freebsd-skq

Author	SHA1	Message	Date
Alan Cox	6745299365	Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon	1999-07-26 06:25:53 +00:00
Poul-Henning Kamp	698bfad7f2	Now a dev_t is a pointer to struct specinfo which is shared by all specdev vnodes referencing this device. Details: cdevsw->d_parms has been removed, the specinfo is available now (== dev_t) and the driver should modify it directly when applicable, and the only driver doing so, does so: vn.c. I am not sure the logic in checking for "<" was right before, and it looks even less so now. An intial pool of 50 struct specinfo are depleted during early boot, after that malloc had better work. It is likely that fewer than 50 would do. Hashing is done from udev_t to dev_t with a prime number remainder hash, experiments show no better hash available for decent cost (MD5 is only marginally better) The prime number used should not be close to a power of two, we use 83 for now. Add new checkalias2() to get around the loss of info from dev2udev() in bdevvp(); The aliased vnodes are hung on a list straight of the dev_t, and speclisth[SPECSZ] is unused. The sharing of struct specinfo means that the v_specnext moves into the vnode which grows by 4 bytes. Don't use a VBLK dev_t which doesn't make sense in MFS, now we hang a dummy cdevsw on B/Cmaj 253 so that things look sane. Storage overhead from all of this is O(50k). Bump __FreeBSD_version to 400009 The next step will add the stuff needed so device-drivers can start to hang things from struct specinfo	1999-07-20 09:47:55 +00:00
Poul-Henning Kamp	3de280c443	[click] Now all dev_t's in the kernel have their char device major. Only know casualy of this is swapinfo/pstat which should be fixes the right way: Store the actual pathname in the kernel like mount does. [Volounteers sought for this task] The road map from here is roughly: expand struct specinfo into struct based dev_t. Add dev_t registration facilities for device drivers and start to use them.	1999-07-19 09:37:59 +00:00
Poul-Henning Kamp	6ca5486476	Introduce the vn_todev(struct vnode*) function, which returns the dev_t corresponding to a VBLK or VCHR node, or NODEV.	1999-07-18 14:30:37 +00:00
Poul-Henning Kamp	c7119ea7dd	Fix 2nd arg to udev2dev().	1999-07-17 19:38:00 +00:00
Poul-Henning Kamp	f008cfcc1a	I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g	1999-07-17 18:43:50 +00:00
Kris Kennaway	e7647e6c20	Correct a couple of spelling errors in comments.	1999-07-12 15:02:51 +00:00
Kirk McKusick	ad8ac923fa	These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding halt. * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems. * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propogate to deeper inlines and should produce better code. * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely. * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal. * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system. * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers. * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers. * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation. * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ). * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ). * removal of extra arguments passed to getnewbuf() that are not necessary. * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c * vn_write() now calls bwillwrite() PRIOR to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently. * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon. Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-08 06:06:00 +00:00
Kirk McKusick	e929c00d23	The buffer queue mechanism has been reformulated. Instead of having QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean and dirty buffers have been separated. Empty buffers with KVM assignments have been separated from truely empty buffers. getnewbuf() has been rewritten and now operates in a 100% optimal fashion. That is, it is able to find precisely the right kind of buffer it needs to allocate a new buffer, defragment KVM, or to free-up an existing buffer when the buffer cache is full (which is a steady-state situation for the buffer cache). Buffer flushing has been reorganized. Previously buffers were flushed in the context of whatever process hit the conditions forcing buffer flushing to occur. This resulted in processes blocking on conditions unrelated to what they were doing. This also resulted in inappropriate VFS stacking chains due to multiple processes getting stuck trying to flush dirty buffers or due to a single process getting into a situation where it might attempt to flush buffers recursively - a situation that was only partially fixed in prior commits. We have added a new daemon called the buf_daemon which is responsible for flushing dirty buffers when the number of dirty buffers exceeds the vfs.hidirtybuffers limit. This daemon attempts to dynamically adjust the rate at which dirty buffers are flushed such that getnewbuf() calls (almost) never block. The number of nbufs and amount of buffer space is now scaled past the 8MB limit that was previously imposed for systems with over 64MB of memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed somewhat. The number of physical buffers has been increased with the intention that we will manage physical I/O differently in the future. reassignbuf previously attempted to keep the dirtyblkhd list sorted which could result in non-deterministic operation under certain conditions, such as when a large number of dirty buffers are being managed. This algorithm has been changed. reassignbuf now keeps buffers locally sorted if it can do so cheaply, and otherwise gives up and adds buffers to the head of the dirtyblkhd list. The new algorithm is deterministic but not perfect. The new algorithm greatly reduces problems that previously occured when write_behind was turned off in the system. The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive P_BUFEXHAUST bit. This bit allows processes working with filesystem buffers to use available emergency reserves. Normal processes do not set this bit and are not allowed to dig into emergency reserves. The purpose of this bit is to avoid low-memory deadlocks. A small race condition was fixed in getpbuf() in vm/vm_pager.c. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-04 00:25:38 +00:00
Poul-Henning Kamp	8947a90a90	Make sure that stat(2) and friends always return a valid st_dev field. Pseudo-FS need not fill in the va_fsid anymore, the syscall code will use the first half of the fsid, which now looks like a udev_t with major 255.	1999-07-02 16:29:47 +00:00
Peter Wemm	9c8b8baa38	Slight reorganization of kernel thread/process creation. Instead of using SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one. One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.	1999-07-01 13:21:46 +00:00
Kirk McKusick	67812eacd7	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
Kirk McKusick	f9c8cab591	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
Kirk McKusick	e4ab40bcb6	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.	1999-06-15 23:37:29 +00:00
Poul-Henning Kamp	2447bec829	Simplify cdevsw registration. The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print an message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.	1999-05-31 11:29:30 +00:00
John Birrell	02013ff886	Remove the test for bdevsw(dev) == NULL from bdevvp() because it fails if there is no character device associated with the block device. In this case that doesn't matter because bdevvp() doesn't use the character device structure. I can use the pointy bit of the axe too.	1999-05-24 00:34:10 +00:00
Luoqi Chen	0ce54cbb0c	Legally acquire a major number for mfs.	1999-05-14 20:40:23 +00:00
Kirk McKusick	eaea7a9e9f	Previously directories were sync'ed every 10 seconds while bitmaps & inodes were synced every 15 seconds. This is now reversed as during directory create, we cannot commit the directory entry until its inode has been written. With this switch, the inodes will be more likely to be written by the time that the directory is written thus reducing the number of directory rollbacks that are needed.	1999-05-14 01:29:21 +00:00
Peter Wemm	cc5881cff5	Fix (?) SPECHASH dev_t/major/minor/etc args	1999-05-12 19:06:40 +00:00
Poul-Henning Kamp	8bee45c44e	Don't peek into dev_t	1999-05-12 11:06:07 +00:00
Poul-Henning Kamp	bfbb9ce670	Divorce "dev_t" from the "major\|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).	1999-05-11 19:55:07 +00:00
Poul-Henning Kamp	1637aa4b1c	Fix some of the places where too much inside knowledge about major/minor layout and dev_t structure is being (ab)used.	1999-05-08 07:02:41 +00:00
Poul-Henning Kamp	4be2eb8c49	I got tired of seeing all the cdevsw[major(foo)] all over the place. Made a new (inline) function devsw(dev_t dev) and substituted it. Changed to the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.	1999-05-08 06:40:31 +00:00
Poul-Henning Kamp	46eede0058	Continue where Julian left off in July 1998: Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)	1999-05-07 10:11:40 +00:00
Bill Fumerola	3d177f465a	Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)	1999-05-03 23:57:32 +00:00
Julian Elischer	4ef2094e45	Reviewed by: Many at differnt times in differnt parts, including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)	1999-03-12 02:24:58 +00:00
Matthew Dillon	155f87daf2	Reviewed by: Julian Elischer <julian@whistle.com> Add d_parms() to {c,b}devsw[]. If non-NULL this function points to a device routine that will properly fill in the specinfo structure. vfs_subr.c's checkalias() supplies appropriate defaults. This change should be fully backwards compatible with existing devices.	1999-02-25 05:22:30 +00:00
Matthew Dillon	42e26d47bd	Protect vn worklist and vn->v_{clean,dirty}blkhd at splbio(). Get rid of extra LIST_REMOVE() Reviewed by: hsu@FreeBSD.ORG (Jeffrey Hsu), mckusick@McKusick.COM Submitted by: hsu@FreeBSD.ORG (Jeffrey Hsu), dillon@backplane.com ( Matthew Dillon )	1999-02-19 17:36:58 +00:00
Matthew Dillon	82b23b5384	vp->v_object must be valid after normal flow of vfs_object_create() completes, change if() to KASSERT(). This is not a bug, we are simplify clarifying and optimizing the code. In if/else in vfs_object_create(), the failure of both conditionals will lead to a NULL object. Exit gracefully if this case occurs. ( this case does not normally occur, but needed to be handled ). Obtained from: Eivind Eklund <eivind@FreeBSD.org>	1999-02-04 18:25:39 +00:00
Matthew Dillon	bc81493155	More const fixes for -Wall, -Wcast-qual	1999-01-29 23:18:50 +00:00
Matthew Dillon	8aef171243	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-28 00:57:57 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Eivind Eklund	219cbf59f2	KNFize, by bde.	1999-01-10 01:58:29 +00:00
Eivind Eklund	5526d2d920	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith	1999-01-08 17:31:30 +00:00
Eivind Eklund	fb1167777a	Remove the 'waslocked' parameter to vfs_object_create().	1999-01-05 18:50:03 +00:00
Eivind Eklund	0df45b5a31	Finish staticization.	1999-01-05 18:12:29 +00:00
Bruce Evans	289bdf33d3	Ifdefed conditionally used simplock variables.	1999-01-02 11:34:57 +00:00
Bruce Evans	4d94881342	Restored rev.1.31 which was clobbered by rev.1.69 (the big Lite2 merge). This fixes at least hanging in revoke(2) when a somewhat active slave pty is revoked. The hang made the window for the null pointer bug in ufsspec_{read,write} much larger. There are many other bugs in this area (revoke of an active fifo at best leaks memory...).	1998-12-24 12:07:16 +00:00
Eivind Eklund	29c98cd8bf	Check return value of tsleep(). I've checked of all call points - there does not seem to be a problem with this. PR: kern/8732 Analysis by: David G Andersen <danderse@cs.utah.edu> Tested by: Alfred Perlstein <bright@hotjobs.com>	1998-12-22 00:44:11 +00:00
Eivind Eklund	db878ba480	Staticize.	1998-12-21 23:38:33 +00:00
Archie Cobbs	2127f26023	Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc. These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>	1998-12-04 22:54:57 +00:00
Peter Wemm	16e9e530cc	Convert lists for bufs attached to vnodes from a LIST to a TAILQ. - Use TAILQ_* macros extensively instead of internal names - use b_xflags instead of the NOLIST magic number hack in the next pointer - clean bufs are inserted at the tail rather than the head. - redo dirty buffer insert so that metadata (negative lbn) goes to the tail directly rather than at the HEAD. This makes a difference when inserting dirty data blocks in lbn sorted order since data block insertion will not have to bypass all the metadata cruft. data is lbn sorted since it makes sense for clustering and writeback ordering, while metadata sorting doesn't help much since the lbn's are meaningless when walking the list for writebacks. Small systems will not notice much (if any) benefit from this, but really busy systems with large dirty block lists should get a lot more. I've tested this with softdep, and it doesn't seem to mind the change of queueing of metadata. Reviewed (in princible) by: dg Obtained from: partly from John Dyson's work-in-progress patches in June.	1998-10-31 14:20:39 +00:00
Peter Wemm	b421db370b	The last argument to vm_object_page_clean() are now bit flags, rather than the old true/false. While here, have vfs_msync() only call vm_object_page_clean() with OBJPC_SYNC if called with MNT_WAIT flags. vfs_msync() is called at unmount time (with MNT_WAIT) and from the syncer process (formerly update). This should make dirty mmap writebacks a little less nasty. I have tested this a little with SOFTUPDATES enabled, but I don't normally use it since I've been badly burned too many times.	1998-10-31 07:42:04 +00:00
Bruce Evans	cbbbd4c330	Oops, rev.1.167 made the device number checking in bdevvp() too strict for mfs root mounts. Don't require major 255 to be in bdevsw[].	1998-10-29 11:50:32 +00:00
Peter Wemm	20f02ef5e9	Remove the V_SAVEMETA flag, nothing uses it any more now that msdosfs and ext2fs call vtruncbuf() directly. This simplifies and cleans up vinvalbuf() a little.	1998-10-29 09:51:28 +00:00
Bruce Evans	885bf0b57a	Updated the major number check in vfs_object_create(). It's not clear if the check is necessary, but vfs_object_create() is called for all vnodes and it was silly to create objects for VBLK vnodes that don't even have a driver.	1998-10-26 08:07:00 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
Bruce Evans	37906c686d	Fixed device number checking in bdevvp(): - dev != NODEV was checked for, but 0 was returned on failure. This was fixed in Lite2 (except the return code was still slightly wrong (ENODEV instead of ENXIO)) but the changes were not merged. This case probably doesn't actually occur under FreeBSD. - major(dev) was not checked to have a valid non-NULL bdevsw entry. This caused panics when the driver for the root device didn't exist. Fixed minor misformattings in bdevvp(). Rev.1.14 consisted mainly of gratuitous reformattings that seem to have caused many Lite2 merge errors. PR: 8417	1998-10-25 16:11:49 +00:00
Dmitrij Tejblum	f74d75a2b6	Backed out rev. 1.164. It caused problems on SMP. PR: 8309	1998-10-14 15:05:52 +00:00
David Greenman	6cde7a165f	Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.	1998-10-13 08:24:45 +00:00

1 2 3 4 5

214 Commits