freebsd-dev

Author	SHA1	Message	Date
Poul-Henning Kamp	aec0fb7b40	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
Poul-Henning Kamp	6fde64c778	Mechanically change prototypes for vnode operations to use the new typedefs.	2004-12-01 12:24:41 +00:00
Poul-Henning Kamp	964ebefd8d	Use system wide no-op vfs_start function.	2004-11-25 09:11:27 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	9c83534dd8	Make VOP_BMAP return a struct bufobj for the underlying storage device instead of a vnode for it. The vnode_pager does not and should not have any interest in what the filesystem uses for backend. (vfs_cluster doesn't use the backing store argument.)	2004-11-15 09:18:27 +00:00
Poul-Henning Kamp	51ac12ab28	Be prepared to accept NULL mountargs as part of root-mounting.	2004-11-13 13:04:31 +00:00
Poul-Henning Kamp	cf5e414960	Put back the vfs_object_create() calls, they do make a difference when my test-setup does what I want it to instead of what I ask it to. Pointed out by: tegge	2004-11-12 10:27:14 +00:00
Poul-Henning Kamp	40ce27cb57	fix some comments	2004-11-10 06:53:31 +00:00
Poul-Henning Kamp	2e6649198a	Use mount flags instead of NULL path to detect root filesystem mount.	2004-11-09 23:38:10 +00:00
Poul-Henning Kamp	5e2ccaff7a	Stop pretending to have a vm_object backing the underlying disk vnode: it isn't used for anything anywhere and the vnode_pager would explode if we attempted to.	2004-11-09 23:12:45 +00:00
Poul-Henning Kamp	5349c79d75	Properly implement a default version of VOP_GETWRITEMOUNT. Remove improper access to vop_stdgetwritemount() which should and will instead rely on the VOP default path.	2004-11-06 11:41:22 +00:00
Poul-Henning Kamp	40c340aa5d	Don't grab the exclusive bit on a root filesystem until we are willing to mount it. Doing so prevented fsck to be run after a refused mount.	2004-11-04 09:11:22 +00:00
Poul-Henning Kamp	4392001125	Move UFS from DEVFS backing to GEOM backing. This eliminates a bunch of vnode overhead (approx 1-2 % speed improvement) and gives us more control over the access to the storage device. Access counts on the underlying device are not correctly tracked and therefore it is possible to read-only mount the same disk device multiple times: syv# mount -p /dev/md0 /var ufs rw 2 2 /dev/ad0 /mnt ufs ro 1 1 /dev/ad0 /mnt2 ufs ro 1 1 /dev/ad0 /mnt3 ufs ro 1 1 Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches things in RAM) this is not possible with read-write mounts, and the system will correctly reject this. Details: Add a geom consumer and a bufobj pointer to ufsmount. Eliminate the vnode argument from softdep_disk_prewrite(). Pick the vnode out of bp->b_vp for now. Eventually we should find it through bp->b_bufobj->b_private. In the mountcode, use g_vfs_open() once we have used VOP_ACCESS() to check permissions. When upgrading and downgrading between r/o and r/w do the right thing with GEOM access counts. Remove all the workarounds for not being able to do this with VOP_OPEN(). If we are the root mount, drop the exclusive access count until we upgrade to r/w. This allows fsck of the root filesystem and the MNT_RELOAD to work correctly. Set bo_private to the GEOM consumer on the device bufobj. Change the ffs_ops->strategy function to call g_vfs_strategy() In ufs_strategy() directly call the strategy on the disk bufobj. Same in rawread. In ffs_fsync() we will no longer see VCHR device nodes, so remove code which synced the filesystem mounted on it, in case we came there. I'm not sure this code made sense in the first place since we would have taken the specfs route on such a vnode. Redo the highly bogus readblock() function in the snapshot code to something slightly less bogus: Constructing an uio and using physio was really quite a detour. Instead just fill in a bio and ship it down.	2004-10-29 10:15:56 +00:00
Poul-Henning Kamp	570a7ddaa3	We only support backing UFS/FFS with disks.	2004-10-28 06:19:28 +00:00
Poul-Henning Kamp	a40a512387	Eliminate unnecessary KASSERTS.	2004-10-27 06:45:06 +00:00
Poul-Henning Kamp	93d244fb1a	KASSERT that we only get to prewrite() on writes.	2004-10-26 20:13:49 +00:00
Poul-Henning Kamp	8dd5650594	White space changes. Add missing static.	2004-10-26 20:13:21 +00:00
Poul-Henning Kamp	53389dd64a	Replace single case switch() with if().	2004-10-26 20:12:25 +00:00
Poul-Henning Kamp	b6e2606155	Vertically align comment.	2004-10-26 20:12:00 +00:00
Poul-Henning Kamp	6e77a04170	The island council met and voted buf_prewrite() home. Give ffs it's own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection. Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS. Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.	2004-10-26 10:44:10 +00:00
Poul-Henning Kamp	58883a1fe5	Fix syntax errors introduced by last commit. Why isn't DIRECTIO in NOTES/LINT ?	2004-10-26 09:04:20 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Poul-Henning Kamp	fae974f156	Degeneralize the per cdev copyonwrite callback. The only possible value is ffs_copyonwrite() and the only place it can be called from is FFS which would never want to call another filesystems copyonwrite method, should one exist, so there is no reason why anything generic should know about this.	2004-10-26 06:25:56 +00:00
Poul-Henning Kamp	156cb26583	Loose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
Poul-Henning Kamp	a76d8f4ec9	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
Robert Watson	60c9762920	Explicitly break out NETA license from Berkeley license to clearly indicate license grant, as well as to indicate that NETA is asserting only two clauses, not four clauses. Requested by: imp	2004-10-20 08:05:02 +00:00
Nate Lawson	894d8d3c03	Fix fsbtodb() for UFS1. This fixes an overflow for file sizes >1 TB, allowing for sizes up to 4 TB. This doesn't affect UFS2 since b is already a 64 bit type, coincidental with daddr_t. Submitted by: bde	2004-10-09 20:16:06 +00:00
Pawel Jakub Dawidek	8d02a378aa	Back out changes which were introduced to delay mounting root file system. Those changes were made on gmirror needs, but now gmirror handles this by itself.	2004-10-05 11:26:43 +00:00
Poul-Henning Kamp	4f116178ba	Remove support for accessing device nodes in UFS/FFS. Device nodes can still be created and exported with NFS.	2004-09-28 13:30:58 +00:00
Poul-Henning Kamp	961da2716b	Give cluster_write() an explicit vnode argument. In the future a struct buf will not automatically point out a vnode for us.	2004-09-27 19:14:10 +00:00
Pawel Jakub Dawidek	5a19f8b0c4	Introduce new /boot/loader.conf variable: root_mount_delay. It can be used to delay mounting root partition to give a chance to GEOM providers to show up. Now, when there is no needed provider, vfs_rootmount() function will look for it every second and if it can't be find in defined time, it'll ask for root device name (before this change it was done immediately). This will allow to boot from gmirror device in degraded mode.	2004-09-23 10:13:18 +00:00
Poul-Henning Kamp	d705e025d0	The getpages VOP was a good stab at getting scatter/gather I/O without too much kernel copying, but it is not the right way to do it, and it is in the way for straightening out the buffer cache. The right way is to pass the VM page array down through the struct bio to the disk device driver and DMA directly in to/out off the physical memory. Once the VM/buf thing is sorted out it is next on the list. Retire most of vnode method. ffs_getpages(). It is not clear if what is left shouldn't be in the default implementation which we now fall back to. Retire specfs_getpages() as well, as it has no users now.	2004-09-19 08:14:55 +00:00
Poul-Henning Kamp	b08c753baa	Do not traverse list of snapshots if there isn't one. Found by: scottl	2004-09-16 17:28:56 +00:00
Poul-Henning Kamp	b85e29f007	Missed a place where snapshots were allocated in my last commit to this file.	2004-09-16 15:58:18 +00:00
Poul-Henning Kamp	67673e6677	Create struct snapdata which contains the snapshot fields from cdev and the previously malloc'ed snapshot lock. Malloc struct snapdata instead of just the lock. Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes). While here, give the private readblock() function a vnode argument in preparation for moving UFS to access GEOM directly.	2004-09-13 07:29:45 +00:00
Poul-Henning Kamp	883d3c0c07	Remove the buffercache/vnode side of BIO_DELETE processing in preparation for integration of p4::phk_bufwork. In the future, local filesystems will talk to GEOM directly and they will consequently be able to issue BIO_DELETE directly. Since the removal of the fla driver, BIO_DELETE has effectively been a no-op anyway.	2004-09-13 06:50:42 +00:00
Poul-Henning Kamp	1affa3adc8	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
Christian S.J. Peron	60088fb7b1	Currently, if the secure level is low enough, system flags can be manipulated by prison root. In 4.x prison root can not manipulate system flags, regardless of the security level. This behavior should remain consistent to avoid any surprises which could lead to security problems for system administrators which give out privileged access to jails. This commit changes suser_cred's flag argument from SUSER_ALLOWJAIL to 0. This will prevent prison root from being able to manipulate system flags on files. This may be a MFC candidate for RELENG_5. Discussed with: cperciva Reviewed by: rwatson Approved by: bmilekic (mentor) PR: kern/70298	2004-08-22 02:03:41 +00:00
John Baldwin	b72ea57f3b	Generalize the UFS bad magic value used to determine when a filesystem has only been partly initialized via newfs(8) so that it applies to both UFS1 and UFS2. Submitted by: "Xin LI" delphij at frontfree dot net MFC: maybe?	2004-08-19 11:09:13 +00:00
David Malone	da126abaf1	When looking for some extra data to include in the hash, use the address of the dirhash, rather than the first sizeof(struct dirhash *) bytes of the structure (which, thankfully, seem to be constant). Submitted by: Ted Unangst <tedu@zeitbombe.org> MFC after: 2 weeks	2004-08-16 10:00:44 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Poul-Henning Kamp	7ac439fec4	use bufdone() not biodone().	2004-08-08 13:23:05 +00:00
Poul-Henning Kamp	5e8c582ac2	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activites on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the namiedata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
Poul-Henning Kamp	d634f69316	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
Alexander Kabaev	b403319b8d	Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper inode field based on UFS version. Use DIP ro read values and DIP_SET to modify them throughout FFS code base.	2004-07-28 06:41:27 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Poul-Henning Kamp	d8d3d4158b	Make sure to update the mnt_stats before UFS1 extattr tried to do I/O on the device. Otherwise the blocksize is undefined in the buffer cache.	2004-07-14 14:19:32 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Marcel Moolenaar	f65de26bf6	Update for the KDB debugger framework: o Make debugging code conditional upon KDB. o Use kdb_backtrace() instead of backtrace(). o Remove inclusion of opt_ddb.h.	2004-07-10 20:45:47 +00:00
Poul-Henning Kamp	c94cd5fc8c	Explicity initialize vp->v_bsize.	2004-07-07 20:04:06 +00:00
Poul-Henning Kamp	e3c5a7a4dd	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
Robert Watson	12ec7658a4	Annotate that we don't check the returned data length from ufs_readdir() because UFS uses fixed-size directory blocks. When using this code with other file systems, such as HFS+, the value of auio.uio_resid will need to be taken into account.	2004-06-24 18:31:23 +00:00
Robert Watson	bb0527fdd3	Remove unnecessary setting of VV_SYSTEM on extended attribute backing files. When this flag is used in our port of this code to Darwin, it caused remarkable pain, and doesn't offer a benefit in FreeBSD.	2004-06-24 18:17:41 +00:00
Robert Watson	00a460dcf4	Protect a non-text comment with a '-'.	2004-06-24 17:45:45 +00:00
Robert Watson	cd39d9b661	White space cleanup: use spaces instead of tabs in variable declarations local to a function. Remove a couple of blank lines in variable declarations. In one case, explicitly test against NULL rather than using a pointer as a boolean directly.	2004-06-24 17:44:14 +00:00
Bruce Evans	8f7c483f5c	Backed out previous commit. The dev_t -> `struct cdev ' changes have lots of errors. Blind substitution of "dev_t foo" by "struct cdev foo" in comments usually just created an English syntax error (e.g., "struct cdev *changes"), but here it did less than that since the dev_t is a user dev_t.	2004-06-20 03:11:19 +00:00
Jun Kuriyama	86030e4a00	Avoid deadlock which is caused by locking VDIR of parent and VREG of snapshot itself in wrong order. We can skip unlink check of that directory because it must have snapshot in it. Reviewed by: mckusick and current@	2004-06-18 14:35:17 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Stefan Farfeleder	1a5ff9285a	Avoid assignments to cast expressions. Reviewed by: md5 Approved by: das (mentor)	2004-06-08 13:08:19 +00:00
Tim J. Robbins	fa2a4d0595	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb	2004-06-03 01:47:37 +00:00
Kirill Ponomarev	b4a1d9299a	- Fix typo Approved by: tobez	2004-05-31 16:55:12 +00:00
Ken Smith	4b14cc0205	Upon further review it was decided this piece of the msync(2) fixes was applicable to HEAD, originally it was thought this should only be done in RELENG_4. Implement IO_INVAL in the vnode op for writing by marking the buffer as "no cache". This fix has already been applied to RELENG_4 as Rev. 1.65.2.15 of ufs/ufs/ufs_readwrite.c. Reviewed by: alc, tegge	2004-05-21 12:05:48 +00:00
Ken Smith	83d8045f16	Style fixup in previous commit. Noticed by: bde (thanks!)	2004-05-19 18:06:21 +00:00
Ken Smith	f7dd67d801	Change ffs_realloccg() to set the valid bits for the extended part of the fragment to zero the valid parts of a VM_IO buffer. RE would like this to be part of 4.10-RC3 so this will be MFC-ed immediately. Reviewed by: alc, tegge	2004-05-14 22:00:08 +00:00
Bosko Milekic	451079d4ab	Revert previous change to this file because it breaks some things which compare /etc/fstab entries to results from getfsstat(). The real way to fix this is to make 'ufs2' a recognized filesystem (for real, no beating around the bush). This should fix things like 'umount -a -t ufs' now. Appologies for the previous breakage.	2004-04-29 15:10:42 +00:00
Bosko Milekic	2aebb586db	The previous change to mount(8) to report ufs or ufs2 used libufs, which only works for Charlie root. This change reverts the introduction of libufs and moves the check into the kernel. Since the f_fstypename is the same for both ufs and ufs2, we check fs_magic for presence of ufs2 and copy "ufs2" explicitly instead. Submitted by: Christian S.J. Peron <maneo@bsdpro.com>	2004-04-26 15:13:46 +00:00
Bruce Evans	f679aa45a7	Record where half the bits in this file came from (from ufs_readwrite.c). Damage to history from moving bits was especially large since a repo copy is not feasible for partial files.	2004-04-07 11:21:18 +00:00
Warner Losh	012d41340a	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights. Approved by: core, rwatson	2004-04-07 03:47:21 +00:00
John Baldwin	255ec151e6	Fix a paste-o from the buf_prewrite() cleanup commit and check for the MNTK_SUSPEND flag on the correct vnode pointer in softdep_disk_prewrite(). Reviewed by: phk Tested by: kensmith	2004-04-06 19:20:24 +00:00
Maxime Henrion	b1fddb236f	Fix the remaining warnings of growfs(8) on my sparc64 box with WARNS=6. I don't change the WARNS level in the Makefile because I didn't tested this on other archs. The fs.h fix was suggested by: marcel Reviewed by: md5(1)	2004-04-03 23:30:59 +00:00
Alexander Kabaev	c355fd5a84	Avoid doing bawrite to initialize inode block while holding cylinder group block locked. If filesystem has any active snapshots, bawrite can come back trying to allocate new snapshot data block from the same cylinder group and cause panic due to recursive lock attempt. PR: 64206 Reviewed by: mckusick Tested by: pjd	2004-03-16 22:06:32 +00:00
Poul-Henning Kamp	ceb58ca58f	When I was a kid my work table was one cluttered mess an cleaning it up were a rather overwhelming task. I soon learned that if you don't know where you're going to store something, at least try to pile it next to something slightly related in the hope that a pattern emerges. Apply the same principle to the ffs/snapshot/softupdates code which have leaked into specfs: Add yet a buf-quasi-method and call it from the only two places I can see it can make a difference and implement the magic in ffs_softdep.c where it belongs. It's not pretty, but at least it's one less layer violated.	2004-03-11 18:50:33 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
Kirk McKusick	ecef42e1eb	A more accurate test in the new ufs_lock than that in 1.235.	2004-02-23 19:05:05 +00:00
Kirk McKusick	546a1660f0	In the function clear_inodedeps(), a FREE_LOCK() should be called AFTER the call to vn_start_write(), not before it. Otherwise, it is possible to unlock it multiple times if the vn_start_write() fails. Submitted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>	2004-02-23 06:56:31 +00:00
Kirk McKusick	6c053cec34	Change UFS from using vop_stdlock to using its own ufs_lock. In ufs_lock, check for attempts to acquire shared locks on snapshot files and change them to be exclusive locks. This change eliminates deadlocks and machine lockups reported in -current since most read requests started using shared lock requests. Submitted by: Jun Kuriyama <kuriyama@imgsrc.co.jp>	2004-02-23 06:40:17 +00:00
Robert Watson	f6a4109212	Update my personal copyrights and NETA copyrights in the kernel to use the "year1-year3" format, as opposed to "year1, year2, year3". This seems to make lawyers more happy, but also prevents the lines from getting excessively long as the years start to add up. Suggested by: imp	2004-02-22 00:33:12 +00:00
David Malone	346180de08	Abstract dirhash's locking using macros. This should make it easier to use the same dirhash code on different branches/platforms. Reviewed by: Ted Unangst <tedu@zeitbombe.org> Reviewed by: iedowse MFC after: 3 weeks	2004-02-15 21:39:35 +00:00
Bruce Evans	e9827c6d93	Fixed some style bugs: - don't unlock the vnode after vinvalbuf() only to have to relock it almost immediately. - don't refer to devices classified by vn_isdisk() as block devices.	2004-02-14 04:41:13 +00:00
Bruce Evans	0efb13948d	MFextfs: backed out secondary changes in rev.1.40 that had become just style bugs (a variable that is used only once, and misformattings).	2004-02-13 03:05:12 +00:00
Jun Kuriyama	df1941fb59	Fix style bugs in previous commit. Submitted by: bde	2004-02-13 02:02:06 +00:00
Bruce Evans	8adff5fc12	Fixed some minor style bugs (English usage and formatting of binary operators) in and near revs.1.169-1.170 (open mode bandaid). This (or better a proper fix) should have been done before cloning the bandaid to many other file systems.	2004-02-12 16:52:24 +00:00
Jun Kuriyama	5580f04ab0	Reverse lock order by using local variable. This will shut up "acquiring duplicate lock of same type" message. Reviewed by: mckusick	2004-02-12 08:52:08 +00:00
Bruce Evans	1723bc36ef	Removed more vestiges of vfs_ioopt: - rev.1.42 of ffs_readwrite.c added a special case in ffs_read() for reads that are initially at EOF, and rev.1.62 of ufs_readwrite.c fixed timestamp bugs in it. Removal of most of vfs_ioopt made it just and optimization, and removal of the vm object reference calls made it less than an optimization. It was cloned in rev.1.94 of ufs_readwrite.c as part of cloning ffs_extwrite() although it was always less than an optimization in ffs_extwrite(). - some comments, compound statements and vertical whitespace were vestiges of dead code.	2004-02-11 15:27:26 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Alan Cox	bfb7317ebf	Remove unnecessary vm object reference and deallocate calls from ffs_read() and ffs_write(). These calls trace their origins to the dead vfs_ioopt code, first appearing in revision 1.39 of ufs_readwrite.c. Observed by: bde Discussed with: tegge	2004-01-31 05:42:58 +00:00
Andrey A. Chernov	a0036d23a6	Turn uio_resid/uio_offset comments into KASSERTs Reviewed by: bde	2004-01-27 11:28:38 +00:00
Andrey A. Chernov	51cf017614	Copy comment about caller check from ffs_read to ffs_extread, don't check for uio_resid < 0 here too.	2004-01-23 06:00:41 +00:00
Andrey A. Chernov	070f8eefb1	Fix various panic() strings to reflect true function name to allow easy grep. Small code reorganization to look more logic. Copy ffs_write check from prev. commit to ffs_extwrite.	2004-01-23 05:52:31 +00:00
Andrey A. Chernov	bd0cc17757	ffs_read: Replace wrong check returned EFBIG with EOVERFLOW handling from POSIX: 36708 [EOVERFLOW] The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file, and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes. ffs_write: Replace u_int64_t cast with uoff_t cast which is more natural for types used. ffs_write & ffs_read: Remove uio_offset and uio_resid checks for negative values, the caller supposed to do it already. Add comments about it. Reviewed by: bde	2004-01-23 05:38:02 +00:00
Alexander Kabaev	6bd39fe978	Spell magic '16' number as IO_SEQSHIFT.	2004-01-19 20:03:43 +00:00
Alexander Kabaev	291027ce9c	Avoid calling vprint on a vnode while holding its interlock mutex. Move diagnostic printf after vget. This might delay the debug output some, but at least it keeps kernel from exploding if DEBUG_VFS_LOCKS is in effect.	2004-01-04 04:08:34 +00:00
Don Lewis	31c81e4bed	Set fs_ronly to the correct value in ffs_reload() when reloading the file system super block after fsck has repaired the file system. The value of fs_ronly was getting overwritten, which caused ffs_update() to attempt to update inode timestamps even though the file system was still mounted read-only. This fixes the "giving up on N buffers" error that is triggered by running fsck on the root file system and then rebooting without mounting the file system read-write.	2003-12-07 05:16:52 +00:00
Wes Peters	ec52df8eb9	Write the UFS2 superblock with a 'BAD' magic number at the beginning of newfs, to signify the newfs operation has not yet completed. Re- write the superblock with the correct magic number once all of the cylinder groups have been created to show the operation has finished. Sponsored by: St. Bernard Software	2003-11-16 07:08:27 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	c78b8dfacf	Call free(9) after the vnode interlock is released, avoiding a lock-order reversal.	2003-11-13 03:56:32 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Alexander Kabaev	45d45c6cde	Use VOP_UNLOCK/vrele instead of vput. td was erecived as a parameter and one cannot be sure it is equal to curthread.	2003-11-03 04:46:19 +00:00
Alexander Kabaev	cb9ddc80ae	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case there td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
Alexander Kabaev	492c1e68fb	Temporarily undo parts of the stuct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff	2003-11-01 05:51:54 +00:00
Don Lewis	9f206707a5	Tweak the calculation of minbfree in ffs_dirpref() so that only those cylinder groups that have at least 75% of the average free space per cylinder group for that file system are considered as candidates for the creation of a new directory. The previous formula for minbfree would set it to zero if the file system was more than 75% full, which allowed cylinder groups with no free space at all to be chosen as candidates for directory creation, which resulted in an expensive search for free blocks for each file that was subsequently created in that directory. Modify the calculation of minifree in the same way. Decrease maxcontigdirs as the file system fills to decrease the likelyhood that a cluster of directories will overflow the available space in a cylinder group. Reviewed by: mckusick Tested by: kmarx@vicor.com MFC after: 2 weeks	2003-10-31 07:25:06 +00:00
John Baldwin	787f162df6	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
Tor Egge	f0da6ec99b	Initialize bp->b_offset to the physical offset in partition so GEOM knows where to read from disk.	2003-10-22 18:57:59 +00:00
Poul-Henning Kamp	2c18019f14	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)	2003-10-18 14:10:28 +00:00
Poul-Henning Kamp	4e1694ecaf	Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY()	2003-10-18 11:16:33 +00:00
Kirk McKusick	bd189c8c3e	When expunging unlinked files from a snapshot, skip over holes in the file rather than panicing with "indiracct: botched params". Submitted by: Mark Santcroos <marks@ripe.net>	2003-10-17 13:57:58 +00:00
Jeff Roberson	a844eb934c	- My last commit to this file is still not safe, I believe that it may be due to the recursion in indir_trunc().	2003-10-06 03:28:03 +00:00
Jeff Roberson	8af6a57099	- Reinstate 1.142 this was fixed by 1.144.	2003-10-06 02:39:37 +00:00
Jeff Roberson	69b609a85d	- The VCHR case in ffs_sync() is an unneccsary optimization especially considering how infrequently we access devices via ffs now that we have devfs. Collapse this case with the other case. Obtained from: bde	2003-10-05 22:56:33 +00:00
Jeff Roberson	ab1f917b53	- Further simplify ffs_sync(). The vnode lock is required for UFS_UPDATE() so make the code slightly more uniform. The vnode lock is acquired in all cases and now the only difference between VCHR and other is we call UFS_UPDATE instead of VOP_FSYNC().	2003-10-05 09:42:24 +00:00
Jeff Roberson	cffa37d466	- In ffs_update() assert that either the vnode lock or the XLOCK is held.	2003-10-05 09:39:02 +00:00
Jeff Roberson	2f05568aa8	- Check the XLOCK before inspecting v_data. - Slightly rewrite the fsync loop to be more lock friendly. We must acquire the vnode interlock before dropping the mnt lock. We must also check XLOCK to prevent vclean() races. - Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean() races. - Use a local variable to store the results of the nvp == TAILQ_NEXT test so that we do not access the vp after we've vrele()d it. - Add an XXX comment about UFS_UPDATE() not being protected by any lock here. I suspect that it should need the VOP lock.	2003-10-05 07:16:45 +00:00
Jeff Roberson	53938b4a86	- Skip over xvp if XLOCK is set.	2003-10-05 06:48:37 +00:00
Jeff Roberson	5c014b9d6d	- Don't cache_purge() in ufs_reclaim. vclean() does it for us so this is redundant.	2003-10-05 02:45:00 +00:00
Alan Cox	ccf78b6895	Synchronize access to a vm page's valid field using the containing vm object's lock.	2003-10-04 20:38:32 +00:00
Jeff Roberson	cac3558da3	- The VI assert in getdirtybuf() is only valid if we're not on a VCHR vnode. VCHR vnodes don't do background writes. Reported by: kan	2003-10-04 15:57:05 +00:00
Jeff Roberson	04a17687ea	- Increase the scope of the interlock in ffs_reload(). Acquire it before we release the mntvnode_mtx. - Call vgonel() directly instead of going through vrecycle() since we own the interlock now. - Remove a few cases where we locked the interlock just so that we could call VOP_UNLOCK with interlock held.	2003-10-04 14:27:49 +00:00
Jeff Roberson	934914d2ef	- Fix an unlocked call to GETATTR by slightly shuffling the code in ffs_snapshot() around. - Acquire the interlock before releasing the mntvnode_mtx. Use the interlock to protect v_usecount access.	2003-10-04 14:25:45 +00:00
Jeff Roberson	90e1659e41	- Use the VI_LOCK macro in two places where we directly called mtx_lock() before. Direct calls indicated places that needed review and these have now been reviewed.	2003-10-04 14:03:28 +00:00
Jeff Roberson	8f2e9e4388	- Properly acquire the vnode interlock before releasing the mntvnode_mtx. - Use a local variable to store the results of the test to see if the next vnode on the mount list has changed. This is so that we no longer acess the vnode after we vput() it.	2003-10-04 14:02:32 +00:00
Jeff Roberson	04c81ad83c	- Remove a mp_fixme() and some locks that weren't necessary. I now understand how this works.	2003-10-04 11:06:43 +00:00
Jeff Roberson	cfd5600c66	- Several of the callers to getdirtybuf() were erroneously changed to pass in a list head instead of a pointer to the first element at the time of the first call. These lists are subject to change, and getdirtybuf() would refetch from the wrong list in some cases. Spottedy by: tegge Pointy hat to: me	2003-09-03 04:08:15 +00:00
Jeff Roberson	23efe6dafc	- Backout rev 1.142. This caused a deadlock that I do not understand. More investigation is required.	2003-08-31 11:26:52 +00:00
Jeff Roberson	d919a11d06	- Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to bail out if the buffer is not already present. - The buffer returned by incore() is not locked and should not be sent to brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the desired semantics.	2003-08-31 08:50:11 +00:00
Jeff Roberson	a0ebaaddef	- Don't acquire the vnode interlock in drain_output(). Instead, require the caller to acquire it. This permits drain_output() to be done atomically with other operations as well as reducing the number of lock operations. - Assert that the proper locks are held in drain_output(). - Change getdirtybuf() to accept a mutex as an argument. This mutex is used to protect the vnode's buf list and the BKGRDWAIT flag. This lock is dropped when we successfully acquire a buffer and held on return otherwise. These semantics reduce the number of cumbersome cases in calling code. - Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF case of interlocked_sleep(). - Change the return value of getdirtybuf() to be the resulting locked buffer or NULL otherwise. This is for callers who pass in a list head that requires a lock. It is necessary since the lock that protects the list head must be dropped in getdirtybuf() so that we don't have a lock order reversal with the buf queues lock in bremfree(). - Adjust all callers of getdirtybuf() to match the new semantics. - Add a comment in indir_trunc() that points at unlocked access to a buf. This may also be one of the last instances of incore() in the tree.	2003-08-31 07:29:34 +00:00
Jeff Roberson	9dbfeb0ae6	- Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field. - Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode interlock. - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write buffers. Check for the BKGRDINPROG flag before recycling or throwing away a buffer. We do this instead because it is not safe for us to move the original buffer to a new queue from the callback on the background write buffer. - Remove the B_LOCKED flag and the locked buffer queue. They are no longer used. - The vnode interlock is used around checks for BKGRDINPROG where it may not be strictly necessary. If we hold the buf lock the a back-ground write will not be started without our knowledge, one may only be completed while we're not looking. Rather than remove the code, Document two of the places where this extra locking is done. A pass should be done to verify and minimize the locking later.	2003-08-28 06:55:18 +00:00
Alan Cox	9cf8f2f707	The previous change necessitates the addition of a new #include. Otherwise, there is a compilation warning.	2003-08-18 17:27:08 +00:00
Poul-Henning Kamp	b103854847	Don't use a VOP_*() function on our own vnodes, go directly to the relevant internal function, in this case ufs_bmaparray().	2003-08-17 19:26:03 +00:00
Alan Cox	f6c098e569	Revision 1.44 of ufs/ufs/inode.h has made it necessary to add two new #includes to this file. Otherwise, it doesn't compile.	2003-08-16 06:15:17 +00:00
Poul-Henning Kamp	5c24d6ee26	Eliminate the i_devvp field from the incore UFS inodes, we can get the same value from ip->i_ump->um_devvp. This saves a pointer in the memory copies of inodes, which can easily run into several hundred kilobytes. The extra indirection is unmeasurable in benchmarks. Approved by: mckusick	2003-08-15 20:03:19 +00:00
John Baldwin	8b149b5131	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
Robert Watson	2495048579	Now that the central POSIX.1e ACL code implements functions to generate the inode mode from a default ACL and creation mask, implement ufs_sync_inode_from_acl() using acl_posix1e_newfilemode(). Since ACL_OVERRIDE_MASK/ACL_PRESERVE_MASK are defined, we no longer need to explicitly pass in a "preserve_mask" field: this is implicit in the use of POSIX.1e semantics. Note: this change contains a semantic bugfix for new file creation: we now intersect the ACL-generated mode and the cmode requested by the user process. This means permissions on newly created file objects will now be more conservative. In the future, we may want to provide alternative semantics (similar to Solaris and Linux) in which the ACL mask overrides the umask, permitting ACLs to broaden the rights beyond the requested umask. PR: 50148 Reported by: Ritz, Bruno <bruno_ritz@gmx.ch> Obtained from: TrustedBSD Project	2003-08-04 03:29:13 +00:00
Robert Watson	7942b925b8	In ufs_chmod(), use privilege only when required in the following cases: - Setting sticky bit on non-directory - Setting setgid on a file with a group that isn't in the effective or extended groups of the authorizing credential I.e., test the requirement first, then do the privilege test, rather than doing the privilege test regardless of the need for privilege. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-04 00:31:01 +00:00
Robert Watson	9080ff25cf	Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the kernel ACL interfaces and system call names. Break out UFS2 and FFS extattr delete and list vnode operations from setextattr and getextattr to deleteextattr and listextattr, which cleans up the implementations, and makes the results more readable, and makes the APIs more clear. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-07-28 18:53:29 +00:00
Poul-Henning Kamp	7c89f162bc	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.	2003-07-27 17:04:56 +00:00
Poul-Henning Kamp	a8d43c90af	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file " since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/ For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
Poul-Henning Kamp	b941a2beb7	We just cached the inode pointer, no need to call VTOI() again.	2003-07-04 12:16:33 +00:00
Alan Cox	4e28b22e35	Lock the vm object when freeing pages.	2003-06-15 21:50:38 +00:00
Poul-Henning Kamp	cefb5754dd	Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.	2003-06-15 18:53:00 +00:00
Robert Watson	44533b1722	Re-implement kernel access control for quotactl() as found in the UFS quota implementation. Push some quite broken access control logic out of ufs_quotactl() into the individual command implementations in ufs_quota.c; fix that logic. Pass in the thread argument to any quotactl command that will need to perform access control. o quotaon() requires privilege (PRISON_ROOT). o quotaoff() requires privilege (PRISON_ROOT). o getquota() requires that: If the type is USRQUOTA, either the effective uid match the requested quota ID, that the unprivileged_get_quota flag be set, or that the thread be privileged (PRISON_ROOT). If the type is GRPQUOTA, require that either the thread be a member of the group represented by the requested quota ID, that the unprivileged_get_quota flag be set, or that the thread be privileged (PRISON_ROOT). o setquota() requires privilege (PRISON_ROOT). o setuse() requires privilege (PRISON_ROOT). o qsync() requires no special privilege (consistent with what was present before, but probably not very useful). Add a new sysctl, security.bsd.unprivileged_get_quota, which when set to a non-zero value, will permit unprivileged users to query user quotas with non-matching uids and gids. Set this to 0 by default to be mostly consistent with the previous behavior (the same for USRQUOTA, but not for GRPQUOTA). Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-15 06:36:19 +00:00
Poul-Henning Kamp	7652131bee	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk	2003-06-12 20:48:38 +00:00
David E. O'Brien	f4636c5959	Use __FBSDID().	2003-06-11 06:34:30 +00:00
Robert Watson	1e9e2eb598	Implement ffs_listextattr() by breaking out that logic and special-cased attribute name of "" from ffs_getextattr(). Invoking VOP_GETETATTR() with an empty name is now no longer supported; user application compatibility is provided by a system call level compatibility wrapper. We make sure to explicitly reject attempts to set an EA with the name "". Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 05:57:39 +00:00
Robert Watson	bd38ab57a1	Don't special-case handling of the empty string in the UFS1 extended attribute retrieval code: it's no longer special-cased, and is caught by the normal UFS1 EA validity checks (and, in fact, returns the same error, EINVAL). Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 04:58:58 +00:00
Robert Watson	e1249def7d	Return EOPNOTSUPP for attempted EA operations on VCHR vnodes in UFS2; if we permit them to occur, the kernel panics due to our performing EA operations using VOP_STRATEGY on the vnode. This went unnoticed previously because there are very for users of device nodes on UFS2 due to the introduction of devfs. However, this can come up with the Linux compat directories and its hard-coded dev nodes (which will need to go away as we move away from hard-coded device numbers). This can come up if you use EA-intensive features such as ACLs and MAC. The proper fix is pretty complicated, but this band-aid would be an excellent MFC candidate for the release.	2003-06-01 02:42:18 +00:00
Poul-Henning Kamp	61301f74d0	Remove unused variable. Found by: FlexeLint	2003-05-31 19:56:09 +00:00
Poul-Henning Kamp	6280ed26af	Remove unused local variables. Found by: FlexeLint	2003-05-31 18:17:32 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Alan Cox	7f758dabbb	Lock the vm object when performing vm_object_page_clean(). Approved by: re (rwatson)	2003-05-18 22:02:51 +00:00
Robert Watson	62d4b85ec1	Jeff added locking assertions that the VV_ flags on vnodes were modified only while holding appropriate vnode locks. This patch slides the lock release for ufs_extattr_enable() to continue to hold the active vnode lock on a backing file until after the flag change; it also acquires a vnode lock when disabling an attribute and hence clearing a flag on the backing vnode. This permits VFS_DEBUG_LOCKS to run UFS1 extended attributes without panicking, as well as preventing a potential race and vnode flag problem. Approved by: re (jhb) Pointed out by: DEBUG_VFS_LOCKS	2003-05-15 21:07:33 +00:00
Alan Cox	ad682c4825	Lock the vm_object on entry to vm_object_vndeallocate().	2003-05-03 20:28:26 +00:00
Tim J. Robbins	3632928957	Do not attempt to free NULL dinodes (i_din1 or i_din2) in ffs_ifree(). These fields can be left as NULL if ffs_vget() allocates an inode but fails before the dinode memory has been allocated. There are two cases when this can occur: when we lose a race and another process has added the inode to the hash, and when reading the inode off disk fails. The bug was observed by Kris on one of the package-building machines. See http://marc.theaimsgroup.com/?l=freebsd-current&m=105172731013411&w=2 In Kris's case, it was the bread() that failed because of a disk error. The alternative to this patch is to ensure that ffs_vget() does not call vput() when the inode that hasn't been properly initialised.	2003-05-01 06:41:59 +00:00
Tim J. Robbins	8d721e877d	Free i_din2 instead of i_din1 in ffs_ifree() on UFS2 filesystems. This is purely a cosmetic change because these members are in a union together.	2003-05-01 06:38:27 +00:00
Mark Murray	51da11a27a	Fix some easy, global, lint warnings. In most cases, this means making some local variables static. In a couple of cases, this means removing an unused variable.	2003-04-30 12:57:40 +00:00
Alexander Kabaev	104a9b7e3e	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
John Baldwin	a15cc35909	Lock both the proc lock and sched_lock when calling sched_nice since kg_nice is now protected by both. Being protected by both means that other places in the kernel that want to read kg_nice only need one of the two locks.	2003-04-22 20:45:38 +00:00
Jeff Roberson	86711bae9b	- Use the sched_nice() api instead of setting the nice value directly. Tested by: Steve Kargl <sgk@troutmask.apl.washington.edu>	2003-04-12 01:05:19 +00:00
Alan Cox	6134838f99	Sufficient access checks are performed by vmapbuf() that calling useracc() is pointless. Remove the call to useracc(). Don't reinitialize fields that are already initialized by getpbuf(). Reviewed by: tegge	2003-04-06 19:26:30 +00:00
Tor Egge	5e2e6a67c4	Check return value from vmapbuf instead of the function address.	2003-03-27 20:48:34 +00:00
Tor Egge	10dccf8ff2	Eliminate a buffer sleep/wakeup race.	2003-03-27 19:28:11 +00:00
Tor Egge	5bbb806004	Add support for reading directly from file to userland buffer when the O_DIRECT descriptor status flag is set and both offset and length is a multiple of the physical media sector size.	2003-03-26 23:40:42 +00:00
John Baldwin	31566c96f4	Use td->td_ucred instead of td->td_proc->p_ucred.	2003-03-20 21:17:40 +00:00
John Baldwin	2a53bfbe62	Minor fixes to ffs_fserr(): - Assume that curthread is not NULL. It never is in -current. - Use td_ucred instead of p_ucred.	2003-03-20 21:15:54 +00:00
Poul-Henning Kamp	b4b138c27f	Including <sys/stdint.h> is (almost?) universally only to be able to use %j in printfs, so put a newsted include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.	2003-03-18 08:45:25 +00:00
Jeff Roberson	09f11da5a3	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
Kirk McKusick	34968037b1	Use the appropriate size when zeroing out the unused portion of a snapshot's copy of a superblock. This patch fixes a panic when taking a snapshot of a 4096/512 filesystem. Reported by: Ian Freislich <ianf@za.uu.net> Sponsored by: DARPA & NAI Labs.	2003-03-07 23:49:16 +00:00
Alan Cox	09c80124a3	Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress. Discussed on: arch@	2003-03-06 03:41:02 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases were we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviwed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Nate Lawson	99648386d3	Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead defering to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.	2003-03-03 19:15:40 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Kirk McKusick	74f3809a19	Change the field used to test whether the superblock has been updated from the filesystem size field to the filesystem maximum blocksize field. The problem is that older versions of growfs updated only the new size field and not the old size field. This resulted in the old (smaller) size field being copied up to the new size field which caused the filesystem to appear to fsck to be badly trashed. This also adds a sanity check to ensure that the superblock is not being updated when the filesystem is mounted read-only. Obviously such an update should never happen. Reported by: Nate Lawson <nate@root.org> Sponsored by: DARPA & NAI Labs.	2003-02-25 23:21:08 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
David Schultz	9cdb2d4d9d	Expand the reference count on struct dquot to 32 bits. This fixes a panic on large systems where a single user may have more than 64K active or inactive vnodes. PR: 48234 Reviewed by: mike (mentor)	2003-02-24 08:49:59 +00:00
Kirk McKusick	3bf0ed940b	When removing the last item from a non-empty worklist, the worklist tail pointer must be updated. Reported by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs.	2003-02-24 07:28:41 +00:00
Kirk McKusick	5bb651cb72	This patch fixes a deadlock between the bufdaemon and a process taking a snapshot. As part of taking a snapshot of a filesystem, the kernel builds up a list of the filesystem metadata (such as the cylinder group bitmaps) that are contained in the snapshot. When doing a copy-on-write check, the list is first consulted. If the block being written is found on the list, then the full snapshot lookup can be avoided. Besides providing an important performance speedup this check also avoids a potential deadlock between the code creating the snapshot and the bufdaemon trying to cleanup snapshot related buffers. This fix creates a temporary list containing the key metadata blocks that can cause the deadlock. This temporary list is used between the time that the snapshot is first enabled and the time that the fully complete list is built. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-22 00:59:34 +00:00
Kirk McKusick	37e2ebfdba	This patch fixes a bug on an active filesystem on which a snapshot is being taken from panicing with either "freeing free block" or "freeing free inode". The problem arises when the snapshot code is scanning the filesystem looking for inodes with a reference count of zero (e.g., unlinked but still open) so that it can expunge them from its view. If it encounters a reclaimed vnode and has to restart its scan, then it will panic if it encounters and tries to free an inode that it has already processed. The fix is to check each candidate inode to see if it has already been processed before trying to delete it from the snapshot image. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:29:51 +00:00
Kirk McKusick	d60682c239	This patch fixes a bug in the logical block calculation macros so that they convert to 64-bit values before shifting rather than afterwards. Once fixed, they can be used rather than inline expanded. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:19:26 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Kirk McKusick	aca3e4974f	Replace use of random() with arc4random() to provide less guessable values for the initial inode generation numbers in newfs and for newly allocated inode generation numbers in the kernel. Submitted by: Theo de Raadt <deraadt@cvs.openbsd.org> Sponsored by: DARPA & NAI Labs.	2003-02-14 21:31:58 +00:00
Kirk McKusick	50bd54e391	Correct lines incorrectly added to the copyright message. Submitted by: Frank van der Linden <fvdl@wasabisystems.com> Sponsored by: DARPA & NAI Labs.	2003-02-14 00:31:06 +00:00
Jeff Roberson	767b9a529d	- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h	2003-02-09 11:28:35 +00:00
Alfred Perlstein	04738e99b5	Catch more uses of MIN().	2003-02-02 13:30:00 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Matthew Dillon	48e3128b34	Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.	2003-01-13 00:33:17 +00:00
Matthew Dillon	cd72f2180b	Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.	2003-01-12 01:37:13 +00:00
Marcel Moolenaar	cc4a858397	o Improve wording of the comment that accompanies fs_pad. The padding is not specific to non-i386 architectures. It is caused by non-i386 specific alignment requirements of fs_swuid, o Add a CTASSERT to catch a change in the size of struct fs at compile-time rather than run-time. Ok'd: gordon Tested on: i386 ia64	2003-01-10 06:59:34 +00:00
Gordon Tetlow	963cae780f	Fix superblock alignment problems on non-i386 platforms. Also change fs_uuid to fs_swuid, making it more descriptive. Submitted by: marcel Reviewed by: peter Pointy hat to: gordon	2003-01-09 23:53:30 +00:00
Gordon Tetlow	291871da9e	Steal some space from fs_fsmnt to create fs_volname and fs_uuid. The volname will be used to support volume names with the help of a GEOM module (to be committed). uuid will be used to deal with conflicting volume names (which doesn't work just yet). Approved by: mckusick@	2003-01-08 22:53:54 +00:00
Kirk McKusick	fa06a012cd	This patch fixes a problem caused by applications that rapidly and repeatedly truncate the same file. Each time the file is truncated, a buffer is grabbed to store the indirect block numbers that need to be freed. Those blocks cannot be freed until the inode claiming them is written to disk. Thus, the number of buffers being held by soft updates explodes and in extreme cases can run the kernel out of buffers. The problem can be avoided by doing an fsync on the file every debug.maxindirdep truncates (currently defaulted to 50). The fsync causes the inode to be written so that the held buffers can be freed. The check for excessive buffers is checked as part of the existing hook for excessive dependencies (softdep_slowdown) in the truncate code. Reported by: David Schultz <dschultz@uclink.Berkeley.EDU> Sponsored by: DARPA & NAI Labs. MFC after: 3 weeks	2003-01-07 18:23:50 +00:00
Poul-Henning Kamp	f5b11b6e2d	Temporarily introduce a new VOP_SPECSTRATEGY operation while I try to sort out disk-io from file-io in the vm/buffer/filesystem space. The intent is to sort VOP_STRATEGY calls into those which operate on "real" vnodes and those which operate on VCHR vnodes. For the latter kind, the call will be changed to VOP_SPECSTRATEGY, possibly conditionally for those places where dual-use happens. Add a default VOP_SPECSTRATEGY method which will call the normal VOP_STRATEGY. First time it is called it will print debugging information. This will only happen if a normal vnode is passed to VOP_SPECSTRATEGY by mistake. Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY does on a VCHR vnode today. Add a new VOP_STRATEGY method in specfs to catch instances where the conversion to VOP_SPECSTRATEGY has not yet happened. Handle the request just like we always did, but first time called print debugging information. Apart up to two instances of console messages per boot, this amounts to a glorified no-op commit. If you get any of the messages on your console I would very much like a copy of them mailed to phk@freebsd.org	2003-01-04 22:10:36 +00:00
Poul-Henning Kamp	c6e3ae999b	Since Jeffr made the std* functions the default in rev 1.63 of kern/vfs_defaults.c it is wrong for the individual filesystems to use the std* functions as that prevents override of the default. Found by: src/tools/tools/vop_table	2003-01-04 08:47:19 +00:00
Poul-Henning Kamp	862702306b	Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.	2003-01-03 06:32:15 +00:00
Jens Schweikhardt	9d5abbddbf	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.	2003-01-01 18:49:04 +00:00
Alfred Perlstein	13438f6823	When compiling the kernel do not implicitly include filedesc.h from proc.h, this was causing filedesc work to be very painful. In order to make this work split out sigio definitions to thier own header (sigio.h) which is included from proc.h for the time being.	2003-01-01 01:56:19 +00:00
Poul-Henning Kamp	aa4d7a8a4b	Use three UMA zones for FFS/UFS inodes instead of malloc space. Since inodes are currently 144 bytes, this will save 112 bytes per inode. This can amount to up to 10MByte on large systems.	2002-12-27 11:05:05 +00:00
Poul-Henning Kamp	de6ba7c016	Move the allocation of the inode contents into ffs_vfsops.c rather than passing malloc types around.	2002-12-27 10:23:03 +00:00
Poul-Henning Kamp	975512a907	Make ffs_mountfs() static. Remove the malloctype from the ufs mount structure, instead add a callback to the storage method for freeing inodes: UFS_IFREE(). Add vfs_ifree() method function which frees an inode. Unvariablelize the malloc type used for allocating inodes.	2002-12-27 10:06:37 +00:00
Kirk McKusick	4c572f6222	Fix corruption introduced in previous delta. Reported by: Aurelien Nephtali <aurelien.nephtali@wanadoo.fr> Sponsored by: DARPA & NAI Labs.	2002-12-18 19:50:28 +00:00
Kirk McKusick	6d967351b4	Keep comments consistent with the code. Minor optimization. Sponsored by: DARPA & NAI Labs.	2002-12-18 07:19:41 +00:00
Kirk McKusick	c021e44776	Cosmetic cleanup of unsigned buglets. Submitted by: Bruce Evans <bde@zeta.org.au> Sponsored by: DARPA & NAI Labs.	2002-12-18 00:53:45 +00:00
Poul-Henning Kamp	120a6d842a	Remove unused lockcnt variable. Approved by: mckusick	2002-12-17 20:23:51 +00:00
Kirk McKusick	8efcd9a794	Update to previous change (1.54) to use an approperly wide inode field so as to work correctly on 64-bit platforms. Reported-by: Jake Burkholder <jake@locore.ca> Sponsored by: DARPA & NAI Labs. Approved by: Ian Dowse <iedowse@maths.tcd.ie>	2002-12-15 19:25:59 +00:00
Ian Dowse	c2ca8e1ce2	Undo the adjustment of the total memory used by dirhash in the case where allocating the dirhash structure fails. Fix a few typos in comments and update copyright. MFC after: 1 week	2002-12-14 17:16:16 +00:00
Kirk McKusick	0db138a6b0	Only the most recent snapshot contains the complete list of blocks that were copied in all of the earlier snapshots, thus its precomputed list must be used in the copyonwrite test. Using incomplete lists may lead to deadlock. Also do not include the blocks used for the indirect pointers in the indirect pointers as this may lead to inconsistent snapshots. Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-14 01:36:59 +00:00
Tom Rhodes	1626155b82	Remove the comment about dump(8) not working properly with snapshots. Discussed with: mckusick Approved by: re (rwatson)	2002-12-12 00:31:45 +00:00
Kirk McKusick	8d6754f289	More tightly verify the preference returned for the new inode. Submitted by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-06 02:08:46 +00:00
Kirk McKusick	0cb652d925	Have to use bread() rather than UFS_BALLOC() when obtaining a previously allocated block as the previous use of the block may have fallen out of the cache. Failure to reread its contents cause zeroed results to be written instead of the proper contents. Conversely, when the block is going to be entirely filled in, it is not necessary reread the old contents. Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-03 18:19:27 +00:00
Kirk McKusick	31574422a3	Add a check to disable the previous patch so that future filesystems that choose to place their superblocks in non-standard locations will not get them smashed. Sponsored by: DARPA & NAI Labs.	2002-11-30 19:04:57 +00:00
Kirk McKusick	c6964d3bc9	Remove a race condition / deadlock from snapshots. When converting from individual vnode locks to the snapshot lock, be sure to pass any waiting processes along to the new lock as well. This transfer is done by a new function in the lock manager, transferlockers(from_lock, to_lock); Thanks to Lamont Granquist <lamont@scriptkiddie.org> for his help in pounding on snapshots beyond all reason and finding this deadlock. Sponsored by: DARPA & NAI Labs.	2002-11-30 19:00:51 +00:00
Kirk McKusick	63cf5b0ee2	Fix two deadlocks in snapshots: 1) Release the snapshot file lock while suspending the system. Otherwise a process trying to read the lock may block on its containing directory preventing the suspension from completing. Thanks to Sean Kelly <smkelly@zombie.org> for finding this deadlock. 2) Replace some bdwrite's with bawrite's so as not to fill all the buffers with dirty data. The buffers could not be cleaned as the snapshot vnode was locked hence the system could deadlock when making snapshots of really massive filesystems. Thanks to Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp> for figuring this out. Sponsored by: DARPA & NAI Labs.	2002-11-30 07:27:12 +00:00
Kirk McKusick	fa5d33e242	Check to make sure that the fs_sblockloc field was properly updated before using it to write the superblock. This is to guard against accidentally trashing the disklabel if the superblock format missed being upgraded by the new kernel. Reported by: Sam Leffler <sam@errno.com> Sponsored by: DARPA & NAI Labs. Approved by: Murray Stokely <murray@FreeBSD.org>	2002-11-29 19:20:15 +00:00
Kirk McKusick	ada981b228	Create a new 32-bit fs_flags word in the superblock. Add code to move the old 8-bit fs_old_flags to the new location the first time that the filesystem is mounted by a new kernel. One of the unused flags in fs_old_flags is used to indicate that the flags have been moved. Leave the fs_old_flags word intact so that it will work properly if used on an old kernel. Change the fs_sblockloc superblock location field to be in units of bytes instead of in units of filesystem fragments. The old units did not work properly when the fragment size exceeeded the superblock size (8192). Update old fs_sblockloc values at the same time that the flags are moved. Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk> Sponsored by: DARPA & NAI Labs.	2002-11-27 02:18:58 +00:00
Kirk McKusick	f5235f70a4	The target for the maximum number of dependencies has been cut in half because of reports that under heavy load the kernel could exhaust its memory pool. The limit is now (desiredvnodes * 4) rather than (desiredvnodes * 8), so it will still scale with larger systems, just not as quickly. Sponsored by: DARPA & NAI Labs.	2002-11-20 05:16:11 +00:00
Kirk McKusick	3374bb5ad6	If an error occurs while writing a buffer, then the data will not have hit the disk and the dependencies cannot be unrolled. In this case, the system will mark the buffer as dirty again so that the write can be retried in the future. When the write succeeds or the system gives up on the buffer and marks it as invalid (B_INVAL), the dependencies will be cleared. Sponsored by: DARPA & NAI Labs.	2002-11-20 05:14:16 +00:00
Peter Wemm	cdf5e9ccb6	Do not assume that time_t is an int. Approved by: re (jhb)	2002-11-15 22:36:57 +00:00
John Baldwin	6db27285f5	Print daddr_t's with %j and intmax_t.	2002-11-08 22:28:35 +00:00
Robert Watson	372360693d	Update licenses and wording: NAI has authorized the removal of clause three of their BSD-style license; also, carry out the NAI Labs -> Network Associates Laboratories renaming in these files.	2002-11-04 02:35:46 +00:00
Garrett Wollman	1d1971ac38	Implement the new 1003.1-2001 pathconf() keys, including the Advisory Information option. Other filesystem implementations should do something similar. With advice from: mckusick, phk	2002-10-27 18:09:49 +00:00
Robert Watson	763bbd2f4f	Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects sematics for shared vnode locks, which were not previously present in the system. This chances the cache coherrency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a work around for this shortly. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-26 14:38:24 +00:00
Kirk McKusick	9ab73fd11a	Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.	2002-10-25 00:20:37 +00:00
Kirk McKusick	c0762674c9	We must be careful to avoid recursive copy-on-write faults when trying to clean up during disk-full senarios. Sponsored by: DARPA & NAI Labs.	2002-10-23 21:47:02 +00:00
Kirk McKusick	2eff16f057	Missplaced FREE_LOCK causes a panic when hit while taking a snapshot. Sponsored by: DARPA & NAI Labs.	2002-10-23 05:14:06 +00:00
Kirk McKusick	0152387ade	This update further fine tunes the locking of snapshot vnodes in the ffs_copyonwrite routine to avoid a deadlock between the syncer daemon trying to sync out a snapshot vnode and the bufdaemon trying to write out a buffer containing the snapshot inode. With any luck this will be the last snapshot race condition. Sponsored by: DARPA & NAI Labs.	2002-10-22 01:23:00 +00:00
Kirk McKusick	127ab960d5	This update is a performance improvement when allocating blocks on a full filesystem. Previously, if the allocation failed, we had to fsync the file before rolling back any partial allocation of indirect blocks. Most block allocation requests only need to allocate a single data block and if that allocation fails, there is nothing to unroll. So, before doing the fsync, we check to see if any rollback will really be necessary. If none is necessary, then we simply return. This update eliminates the flurry of disk activity that got triggered whenever a filesystem would run out of space. Sponsored by: DARPA & NAI Labs.	2002-10-22 01:14:25 +00:00
Kirk McKusick	e03486d198	This checkin reimplements the io-request priority hack in a way that works in the new threaded kernel. It was commented out of the disksort routine earlier this year for the reasons given in kern/subr_disklabel.c (which is where this code used to reside before it moved to kern/subr_disk.c): ---------------------------- revision 1.65 date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0 Comment out Kirks io-request priority hack until we can do this in a civilized way which doesn't cause grief. The problem is that it is not generally safe to cast a "struct bio " to a "struct buf ". Things like ccd, vinum, ata-raid and GEOM constructs bio's which are not entrails of a struct buf. Also, curthread may or may not have anything to do with the I/O request at hand. The correct solution can either be to tag struct bio's with a priority derived from the requesting threads nice and have disksort act on this field, this wouldn't address the "silly-seek syndrome" where two equal processes bang the diskheads from one edge to the other of the disk repeatedly. Alternatively, and probably better: a sleep should be introduced either at the time the I/O is requested or at the time it is completed where we can be sure to sleep in the right thread. The sleep also needs to be in constant timeunits, 1/hz can be practicaly any sub-second size, at high HZ the current code practically doesn't do anything. ---------------------------- As suggested in this comment, it is no longer located in the disk sort routine, but rather now resides in spec_strategy where the disk operations are being queued by the thread that is associated with the process that is really requesting the I/O. At that point, the disk queues are not visible, so the I/O for positively niced processes is always slowed down whether or not there is other activity on the disk. On the issue of scaling HZ, I believe that the current scheme is better than using a fixed quantum of time. As machines and I/O subsystems get faster, the resolution on the clock also rises. So, ten years from now we will be slowing things down for shorter periods of time, but the proportional effect on the system will be about the same as it is today. So, I view this as a feature rather than a drawback. Hence this patch sticks with using HZ. Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>	2002-10-22 00:59:49 +00:00
Robert Watson	be36629d5c	Rename _POSIX_FOO_PRESENT and friends from POSIX.1e to _PC_FOO_PRESENT and related friends. This would have been corrected had POSIX.1e progressed to a standard. Pointed out by: wollman	2002-10-20 22:11:13 +00:00
Robert Watson	6f54838539	Implement _POSIX_ACL_PATH_MAX, which returns the maximum number of ACL entries for a file system node using pathconf(). Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-20 22:08:26 +00:00
Robert Watson	e0c12d4c23	Teach UFS to respond to pathconf() tests for _POSIX_ACL_EXTENDED and _POSIX_MAC_PRESENT based on available mount flags, if the services are available. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-20 21:49:41 +00:00
Robert Watson	f683d75342	Clarify that the UFS1 extended attribute configuration steps do not apply to UFS2 file systems. Submitted by: jedgar Obtained from: TrustedBSD Project	2002-10-19 16:09:16 +00:00
Matthew Dillon	1b7e3dafdf	Fix a file-rewrite performance case for UFS[2]. When rewriting portions of a file in chunks that are less then the filesystem block size, if the data is not already cached the system will perform a read-before-write. The problem is that it does this on a block-by-block basis, breaking up the I/Os and making clustering impossible for the writes. Programs such as INN using cyclic file buffers suffer greatly. This problem is only going to get worse as we use larger and larger filesystem block sizes. The solution is to extend the sequential heuristic so UFS[2] can perform a far larger read and readahead when dealing with this case. (note: maximum disk write bandwidth is 27MB/sec thru filesystem) (note: filesystem blocksize in test is 8K (1K frag)) dd if=/dev/zero of=test.dat bs=1k count=2m conv=notrunc Before: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 14.21 598 8.30 0.00 0 0.00 0.00 0 0.00 0 0 7 1 92 0 76 14.09 813 11.19 0.00 0 0.00 0.00 0 0.00 0 0 9 5 86 0 76 14.28 821 11.45 0.00 0 0.00 0.00 0 0.00 0 0 8 1 91 After: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 63.62 434 26.99 0.00 0 0.00 0.00 0 0.00 0 0 18 1 80 0 76 63.58 424 26.30 0.00 0 0.00 0.00 0 0.00 0 0 17 2 82 0 76 63.82 438 27.32 0.00 0 0.00 0.00 0 0.00 1 0 19 2 79 Reviewed by: mckusick Approved by: re X-MFC after: immediately (was heavily tested in -stable for 4 months)	2002-10-18 22:52:41 +00:00
Robert Watson	61eef6c245	Update extended attribute readme file to note that no special configuration is required to use EAs with UFS2, and that UFS2 is recommend for EA use for a variety of reasons. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-18 21:11:36 +00:00
Robert Watson	f5b1000b8f	Update instructions for ACLs given recent tunefs, mount changes. Also note that UFS2 doesn't require explicit extended attribute configuration, and is recommends for this and other reasons if you plan to use ACLs. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-18 21:09:57 +00:00
Robert Watson	16eac5b95c	Use 'size_t' instead of 'int' for the result of sizeof().	2002-10-18 21:03:30 +00:00
Kirk McKusick	ef6c0bb296	With the revised single-lock method used in snapshots, the BA_NOWAIT flag is no longer needed. Sponsored by: DARPA & NAI Labs.	2002-10-18 01:17:28 +00:00
Kirk McKusick	86aeb27fa2	Change locking so that all snapshots on a particular filesystem share a common lock. This change avoids a deadlock between snapshots when separate requests cause them to deadlock checking each other for a need to copy blocks that are close enough together that they fall into the same indirect block. Although I had anticipated a slowdown from contention for the single lock, my filesystem benchmarks show no measurable change in throughput on a uniprocessor system with three active snapshots. I conjecture that this result is because every copy-on-write fault must check all the active snapshots, so the process was inherently serial already. This change removes the last of the deadlocks of which I am aware in snapshots. Sponsored by: DARPA & NAI Labs.	2002-10-16 00:19:23 +00:00
Robert Watson	9e3bf94fd7	Push most UFS ACL behavior behind a check for MNT_ACLS, permitting ACLs to be administratively disabled as needed on UFS/UFS2 file systems. This also has the effect of preventing the slightly more expensive ACL code from running on non-ACL file systems, avoiding storage allocation for ACLs that may be read from disk. MNT_ACLS may be set at mount-time using mount -o acls, or implicitly by setting the FS_ACLS flag using tunefs. On UFS1, you may also have to configure ACL store. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-15 21:28:24 +00:00
Robert Watson	80830407c6	If the FS_MULTILABEL flag is set in a UFS or UFS2 superblock, automatically set MNT_MULTILABEL in the mount flags. If FS_ACLS is set in a UFS or UFS2 superblock, automatically set MNT_ACLS in the mount flags. If either of these flags is set, but the appropriate kernel option to support the features associated with the flag isn't available, then print a warning at mount-time. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-15 20:00:06 +00:00
Kirk McKusick	48f0495d85	When reading or writing the extended attributes of a special device or fifo in UFS2, the normal ufs_strategy routine needs to be used rather than the spec_strategy or fifo_strategy routine. Thus the ffsext_strategy routine is interposed in the ffs_vnops vectors for special devices and fifo's to pick off this special case. Otherwise it simply falls through to the usual spec_strategy or fifo_strategy routine. Submitted by: Robert Watson <rwatson@FreeBSD.org> Sponsored by: DARPA & NAI Labs.	2002-10-14 23:18:09 +00:00
Robert Watson	baeb8a4774	Fix two memory leaks in error conditions involving the UFS ACL code: if failures occur, make sure that we release both the default ACL and access ACL storage during new object creation. Spotted by: phk and his pet flexelint Sponsored by: DARPA, Network Associates Laboratories	2002-10-14 19:55:49 +00:00
Robert Watson	3ceef565b2	Define two new superblock file system flags: FS_ACLS Administrative enable/disable of extended ACL support FS_MULTILABEL Administrative flag to indicate to the MAC Framework that objects in the file system are individually labeled using extended attributes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: (in principal) mckusick, phk	2002-10-14 17:07:11 +00:00
Kirk McKusick	a5b65058d5	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.	2002-10-14 03:20:36 +00:00
Mike Barcroft	2b7f24d210	Change iov_base's type from `char ' to the standard` void '. All uses of iov_base which assume its type is `char ' (in order to do pointer arithmetic) have been updated to cast iov_base to `char '.	2002-10-11 14:58:34 +00:00
Maxime Henrion	cba63e0291	Fix build of 64 bit platforms.	2002-10-09 12:19:36 +00:00
Kirk McKusick	98d275df37	When creating a snapshot, create a list of initially allocated blocks. Whenever doing a copy-on-write check, first look in the list of initially allocated blocks to see if it is there. If so, no further check is needed. If not, fall through and do the full check. This change eliminates one of two known deadlocks caused by snapshots. Handling the second deadlock will be the subject of another check-in. This change also reduces the cost of the copy-on-write check by speeding up the verification of frequently checked blocks. Sponsored by: DARPA & NAI Labs.	2002-10-09 07:28:35 +00:00
Kirk McKusick	4d533db182	When creating a snapshot, create a list of initially allocated blocks. Whenever doing a copy-on-write check, first look in the list of initially allocated blocks to see if it is there. If so, no further check is needed. If not, fall through and do the full check. This change eliminates one of two known deadlocks caused by snapshots. Handling the second deadlock will be the subject of another check-in. This change also reduces the cost of the copy-on-write check by speeding up the verification of frequently checked blocks. Sponsored by: DARPA & NAI Labs.	2002-10-09 06:13:48 +00:00
Kirk McKusick	b6cef5648d	The appropriate units for disk block addresses are always DEV_BSIZE, even when the underlying device has a larger sector size. Therefore, the filesystem code should not (and with this patch does not) try to use the underlying sector size when doing disk block address calculations. This patch fixes problems in -current when using the swap-based memory-disk device (mdconfig -a -t swap ...). This bugfix is not relevant to -stable as -stable does not have the memory-disk device. Sponsored by: DARPA & NAI Labs.	2002-10-09 04:01:23 +00:00
Jeff Roberson	a2c4ff970b	- Remove LK_INTERLOCK from the vn_lock() in ffs_snapshot(). Pointy hat to: me Found by: green	2002-10-08 21:00:52 +00:00
Poul-Henning Kamp	4f3ee6dcc4	Mark two places where an unsigned number is checked "if (foo < 0)" with an XXX comment. Somebody[TM] should look at this in some detail. Spotted by: FlexeLint	2002-10-02 09:11:18 +00:00
Dima Dorfman	85bba62925	size_t is not a struct (fix mislabelling in a comment).	2002-10-02 05:15:34 +00:00
Poul-Henning Kamp	8d3574c7a4	Fix some harmless mis-indents. Spotted by: FlexeLint	2002-10-01 15:48:31 +00:00
Juli Mallett	85de3147ea	When spamming me with a printf(9), under DIAGNOSTIC, at least be nice enough to include a newline. MFC after: 4 days Sponsored by: Bright Path Solutions	2002-09-28 19:04:49 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Poul-Henning Kamp	a8babca268	Make it a tad easier to deal with struct inode in userland programs which fondle /dev/kmem by using "struct cdev *" instead of "dev_t". Requsted by: jake	2002-09-27 20:03:05 +00:00
Poul-Henning Kamp	993b0567b2	Use our mount-credential if we get a NOCRED when we try to write out EA space back to disk. This is wrong in many ways, but not as wrong as a panic. Pancied on: rwatson & jmallet Sponsored by: DARPA & NAI Labs.	2002-09-27 20:00:03 +00:00
Jeff Roberson	2ee5711e84	- Convert locks to use standard macros. - Lock access to the buflists. - Document broken locking. - Use vrefcnt().	2002-09-25 02:49:48 +00:00
Jeff Roberson	6ef1763407	- Document broken locking. - Use vrefcnt().	2002-09-25 02:47:49 +00:00
Jeff Roberson	d4820f8036	- Lock accesses to v_usecount. - Convert interlock locks to use standard macros.	2002-09-25 02:45:50 +00:00
Jeff Roberson	8823f1b6db	- Don't use the interlock to protect v_writecount.	2002-09-25 02:44:55 +00:00
Poul-Henning Kamp	cf09d67418	We don't need to #include <sys/disklabel.h>. We don't need to #include <sys/disklabel.h> second time either. Sponsored by: DARPA & NAI Labs.	2002-09-20 16:42:33 +00:00
Don Lewis	fa288043e2	VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link() wasn't doing. Rather than just lock and unlock the vnode around the call to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode in kern_link() before calling VOP_LINK(), since the other filesystems also locked the file vnode right away in their link methods. Remove the locking and and unlocking from the leaf filesystem link methods. Reviewed by: rwatson, bde (except for the unionfs_link() changes)	2002-09-19 13:32:45 +00:00
David E. O'Brien	47a561263d	intmax_t is printed with %jd, not %lld.	2002-09-19 03:55:30 +00:00
Nate Lawson	86ed6d45ac	Remove any VOP_PRINT that redundantly prints the tag. Move lockmgr_printinfo() into vprint() for everyone's benefit. Suggested by: bde	2002-09-18 20:42:04 +00:00
Nate Lawson	06be2aaa83	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
Bruce Evans	d3a7b5e70e	vfs_syscalls.c: Changed rename(2) to follow the letter of the POSIX spec. POSIX requires rename() to have no effect if its args "resolve to the same existing file". I think "file" can only reasonably be read as referring to the inode, although the rationale and "resolve" seem to say that sameness is at the level of (resolved) directory entries. ext2fs_vnops.c, ufs_vnops.c: Replaced code that gave the historical BSD behaviour of removing one link name by checks that this code is now unreachable. This fixes some races. All vnodes needed to be unlocked for the removal, and locking at another level using something like IN_RENAME was not even attempted, so it was possible for rename(x, y) to return with both x and y removed even without any unlink(2) syscalls (one process can remove x using rename(x, y) and another process can remove y using rename(y, x)). Prodded by: alfred MFC after: 8 weeks PR: 42617	2002-09-10 11:09:13 +00:00
Poul-Henning Kamp	0e168822b2	Implement the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods. Use extattr_check_cred() to check access to EAs. This is still a WIP. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:59:42 +00:00
Poul-Henning Kamp	190a4963d0	Use canonical extattr_check_cred() instead of private implementation of the same policy. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:39:36 +00:00
Poul-Henning Kamp	04205dc4be	Fix credentials check: do not leak ENOATTR until we know if they're supposed to know. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:28:24 +00:00
Bruce Evans	8f767abf71	Include <sys/malloc.h> instead of depending on namespace pollution 2 layers deep in <sys/proc.h> or <sys/vnode.h>. Include <sys/vmmeter.h> instead of depending on namespace pollution in <sys/pcpu.h>. Sorted includes as much as possible.	2002-09-05 09:43:24 +00:00
Robert Watson	2fc6567e9a	Since we have vp and td cached in local variables, use those instead of derefencing the VOP arguments again when calling the UFS code. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-09-01 16:06:40 +00:00
Poul-Henning Kamp	d0e9b8dbc4	Correctly handle setting, getting and deleting EA's with zero length content. Sponsored by: DARPA & NAI Labs.	2002-08-30 08:57:09 +00:00
Philippe Charnier	93b0017f88	Replace various spelling with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
Alan Cox	fff6062ab6	o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since pmap_zero_page() and pmap_zero_page_area() were modified to accept a struct vm_page * instead of a physical address, vm_page_zero_fill() and vm_page_zero_fill_area() have served no purpose.	2002-08-25 00:22:31 +00:00
Poul-Henning Kamp	7428de69d2	Implement list of EA return functionality. Correctly delete EA's when the content length is set to zero. Sponsored by: DARPA & NAI Labs.	2002-08-20 11:34:58 +00:00
Poul-Henning Kamp	0176455bc8	First snapshot of UFS2 EA support. Sponsored by: DARPA & NAI Labs.	2002-08-19 07:01:55 +00:00
Robert Watson	9ca435893b	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 20:55:08 +00:00
Poul-Henning Kamp	18280bc653	Expand the arguments to ffs_ext{read,write}() to their component parts rather than use vop_{read,write}_args. Access to these functions will ultimately not be available through the "vop_{read,write}+IO_EXT" API but this functionality is retained for debugging purposes for now. Sponsored by: DARPA & NAI Labs.	2002-08-13 11:33:01 +00:00
Poul-Henning Kamp	d6fe88e475	Unravel the UFS_EXTATTR incest between FFS and UFS: UFS_EXTATTR is an UFS only thing, and FFS should in principle not know if it is enabled or not. This commit cleans ffs_vnops.c for such knowledge, but not ffs_vfsops.c Sponsored by: DARPA and NAI Labs.	2002-08-13 10:33:57 +00:00
Poul-Henning Kamp	9bf1a75697	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.	2002-08-13 10:05:50 +00:00
Robert Watson	c08b677fb5	Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent enforcement of MAC policy on the read or write operations: - In ext2fs, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), directory modifications in rename(), directory write operations in mkdir(), symlink write operations in symlink(). - In the NFS client locking code, perform vn_rdwr() on the NFS locking socket without enforcing MAC, since the write is done on behalf of the kernel NFS implementation rather than the user process. - In UFS, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), and symlink write operations in symlink(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-12 16:43:04 +00:00
Poul-Henning Kamp	e179b40f14	Stop pretending that the FFS file ufs_readwrite.c is a UFS file. Instead of #including it, pull it into ffs_vnops.c and name things correctly. Sponsored by: DARPA & NAI Labs.	2002-08-12 10:32:56 +00:00
Poul-Henning Kamp	851da5d6cf	Fix a comment.	2002-08-12 09:22:11 +00:00
Ian Dowse	98caa2e4e9	Don't call softdep_slowdown() if soft updates are not active on the filesystem. This causes a panic for kernels compiled without softupdates. Reported by: luigi	2002-08-05 17:59:20 +00:00
Jeff Roberson	e6e370a7fe	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
Robert Watson	af05e056ec	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument UFS to support per-inode MAC labels. In particular, invoke MAC framework entry points for generically supporting the backing of MAC labels into extended attributes. This ends up introducing new vnode operation vector entries point at the MAC framework entry points, as well as some explicit entry point invocations for file and directory creation events so that the MAC framework can push labels to disk before the directory names become persistent (this will work better once EAs in UFS2 are hooked into soft updates). The generic EA MAC entry points support executing with the file system in either single label or multilabel operation, and will fall back to the mount label if multilabel is not specified at mount-time. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:05:30 +00:00
Poul-Henning Kamp	c3a0d1d4e1	I forgot this bit of uglyness in the fsck_ffs cleanup.	2002-07-31 07:01:18 +00:00
Poul-Henning Kamp	9fbc6a330d	Fix braino in last commit.	2002-07-30 12:02:41 +00:00
Poul-Henning Kamp	17b1994bbe	Move ffs_isfreeblock() to ffs_alloc.c and make it static. Sponsored by: DARPA & NAI Labs.	2002-07-30 11:54:48 +00:00
Alan Cox	1e8fabc097	Lock page queue accesses by vm_page_free().	2002-07-28 08:01:48 +00:00
Benno Rice	683eac8dbb	Add a missing argument to the stub for softdep_setup_freeblocks. Forgotten by: mckusick	2002-07-20 04:07:15 +00:00
Peter Wemm	382f95d332	Fix a warning: ffs_softdep.c:1630: warning: int format, different type arg (arg 2)	2002-07-20 01:09:35 +00:00
Kirk McKusick	7aca6291e3	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags \|= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
Kirk McKusick	fb36a3d847	Change utimes to set the file creation time (for filesystems that support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details. Sponsored by: DARPA & NAI Labs.	2002-07-17 02:03:19 +00:00
Kirk McKusick	faab4e2722	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.	2002-07-16 22:36:00 +00:00
Tom Rhodes	ae76f60046	Fix a type: s/your are/you are/	2002-07-12 19:56:31 +00:00
Bruce Evans	2daf9dc825	Fixed some printf format errors (4 new ones reported by gcc and 5 nearby old ones not reported by gcc). This helps unbreak LINT.	2002-07-08 12:42:29 +00:00
Ian Dowse	6bd521df93	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00
Ian Dowse	5346934fe7	Add the ffs bits necessary to support unloading of the ufs kernel module. This adds an ffs_uninit() function that calls ufs_uninit() and also calls a new softdep_uninitialize() function. Add a stub for softdep_uninitialize() to cover the non-SOFTUPDATES case. Reviewed by: mckusick	2002-07-01 11:00:47 +00:00
Ian Dowse	3423b21c09	Remove the bogus SYSINIT from ufs_dirhash.c and instead add a call to ufsdirhash_init() from ufs_init(). Add uninit() functions corresponding the ufs, dirhash, quota and ihash init() functions.	2002-06-30 02:49:39 +00:00
Ian Dowse	8f42fb8fc9	Remove the kernel file-size limit for UFS2, so that only the limit imposed by the filesystem structure itself remains. With 16k blocks, the maximum file size is now just over 128TB. For now, the UFS1 file size limit is left unchanged so as to remain consistent with RELENG_4, but it too could be removed in the future. Reviewed by: mckusick	2002-06-26 18:34:51 +00:00
Kenneth D. Merry	98cb733c67	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
Kirk McKusick	a7d50c22a6	Force the quota update to be done when an inode is released in ufs_inactive. This avoid a panic when checking a NULL credential in suser_cred().	2002-06-25 01:02:28 +00:00
Jonathan Lemon	c86c4abf99	Prototype fixes (long newinum --> ino_t newinum).	2002-06-24 17:20:19 +00:00
Maxime Henrion	cfbf0a4678	Warning fixes for 64 bits platforms. This eliminates all the warnings I have had in the FFS code on sparc64. Reviewed by: mckusick	2002-06-23 18:17:27 +00:00
Matthew Dillon	10cfbc1978	Rename the BALLOC flags from B_* to BA_* to avoid confusion with the struct buf B_ flags. Approved by: mckusick	2002-06-23 06:12:22 +00:00
Kirk McKusick	5006e77609	This patch fixes a problem whereby filesystems that ran out of inodes in a cylinder group would fail to check for free inodes in other cylinder groups. This bug was introduced in the UFS2 code merge two days ago. An inode is allocated by calling ffs_valloc which calls ffs_hashalloc to do the filesystem scan. Ffs_hashalloc walks around the cylinder groups calling its passed allocator (ffs_nodealloccg in this case) until the allocator returns a non-zero result. The bug is that ffs_hashalloc expects the passed allocator function to return a 64-bit ufs2_daddr_t. When allocating inodes, it calls ffs_nodealloccg which was returning a 32-bit ino_t. The ffs_hashalloc code checked a 64-bit return value and usually found random non-zero bits in the high 32-bits so decided that the allocation had succeeded (in this case in the only cylinder group that it checked). When the result was passed back to ffs_valloc it looked at only the bottom 32-bits, saw zero and declared the system out of inodes. But ffs_hashalloc had really only checked one cylinder group. The fix is to change ffs_nodealloccg to return 64-bit results. Sponsored by: DARPA & NAI Labs. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Reviewed by: Maxime Henrion <mux@freebsd.org>	2002-06-22 21:24:58 +00:00
Kirk McKusick	1c85e6a35d	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>	2002-06-21 06:18:05 +00:00
Matthew Dillon	a37313d234	In rev 1.72 a situation related to write/mmap was fixed which could result in a user process gaining visibility into the 'old' contents of a filesystem block. There were two cases: (1) when uiomove() fails (user process issues illegal write), and (2) when uiomove() overlaps a mmap() of the same file at the same offset (fault -> recursive buffer I/O reads contents of old block). Unfortunately 1.72 also had the unintended effect of forcing the filesystem to do a read-before-write in the case of a full-block-write (non append case), e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'. This destroys performance.. not only is a read forced for every write, but clustering breaks as well. The solution is to clear the buffer manually in the full-block case rather then asking BALLOC to do it (BALLOC issues the read-before-write). In the partial-block case we want BALLOC to do it because the read-before-write is necessary. This patch should greatly improve database and news-feed server performance. Found by: MKI <mki@mozone.net> MFC after: 3 days	2002-06-19 09:39:41 +00:00
Semen Ustimenko	13866b3fd2	Fix a typo in my recently added comment: s/beleived/believed/ Submitted by: keramida	2002-06-06 20:43:03 +00:00
Alfred Perlstein	ba5a4d6c02	Backout/modify previous revision: "empty default cases shouldn't be removed, they should have a break; statement added to them." Requested by: billf	2002-06-01 20:54:21 +00:00
Alfred Perlstein	37e1dd483d	Silence warnings, remove some empty 'default' switch cases.	2002-06-01 20:40:42 +00:00
Semen Ustimenko	f576a00d1b	Remove lock from ffs_vget introduced by v1.24. Instead of locking the vnode creation globaly, we allow processes to create vnodes concurently. In case of concurent creation of vnode for the one ino, we allow processes to race and then check who wins. Assuming that concurent creation of vnode for same ino is really rare case, this is belived to be an improvement, as it just allows concurent creation of vnodes. Idea by: bp Reviewed by: dillon MFC after: 1 month	2002-05-30 22:04:17 +00:00
Robert Watson	2bab796d96	Remove IFS from 5.0-CURRENT. This facilitates introducing UFS2 as IFS had its fingers deep in the belly of the UFS/FFS split. IFS will be reimplemented by the maintainer at a later date. Requested by: adrian (maintainer)	2002-05-19 00:11:08 +00:00
Ian Dowse	ed6ca8732c	Fix two casts to "daddr_t " that should have been "ufs_daddr_t ".	2002-05-18 19:03:00 +00:00
Ian Dowse	e116910b8d	Fix a typo where sizeof(daddr_t) was specified instead of sizeof(doff_t). Now that daddr_t is 64-bit, this caused hash blocks to be allocated twice as large as they need to be.	2002-05-18 18:58:27 +00:00
Ian Dowse	00b162d018	Remove um_i_effnlink_valid, i_spare[] and the ufsmount_u and inode_u unions, since these were only necessary when ext2fs used ufs code. Reviewed by: mckusick	2002-05-18 18:51:14 +00:00
Poul-Henning Kamp	8fdbc99b69	Fix ufs_daddr_t/daddr_t type problems. Sponsored by: DARPA & NAI labs.	2002-05-17 18:59:53 +00:00
Poul-Henning Kamp	c7ffbdd995	Call ufs_bmaparray() with right parameter type. Sponsored by: DARPA & NAI Labs.	2002-05-17 18:53:29 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
Poul-Henning Kamp	98b0c78978	Make daddr_t and u_daddr_t 64bits wide. Retire daddr64_t and use daddr_t instead. Sponsored by: DARPA & NAI Labs.	2002-05-14 11:09:43 +00:00
Poul-Henning Kamp	05f4ff5da1	Remove register keyword. Sponsored by: DARPA & NAI Labs. Submitted by: mckusick	2002-05-13 09:22:31 +00:00
Poul-Henning Kamp	2b2df79fad	Remove two "register" and a blank line. Submitted by: mckusick Sponsored by: DARPA & NAI Labs.	2002-05-12 22:54:48 +00:00
Poul-Henning Kamp	7110af7577	ARGH! SBLOCK is not unused. Try to get this right. BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant). Define SBLOCK again, using the right math. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:21:40 +00:00
Poul-Henning Kamp	7cb71b749c	Remove #define for BBOFF, it is assumed == 0 so many places that we might as well forget about it. In fact the only thing which used it was the SBOFF macro. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:00:21 +00:00
Poul-Henning Kamp	16910634dd	Remove unused BBLOCK and SBLOCK #defines. Sponsored by: DARPA & NAI Labs.	2002-05-12 19:56:31 +00:00
Alan Cox	c0b6bbb80b	o Condition the compilation and use of vm_freeze_copyopts() on ENABLE_VFS_IOOPT.	2002-05-06 05:45:57 +00:00
Poul-Henning Kamp	d08961bec3	Move some UFS related stuff home where it belongs.	2002-05-05 20:04:33 +00:00
Jeff Roberson	5df148630f	Include systm.h so panic(9) is defined when doing DEBUG_ALL_VFS_LOCKS.	2002-05-04 02:40:37 +00:00
Poul-Henning Kamp	afe564a200	Name ufs_vop_[gs]etextattr() consistently with the rest of our VOPs and put then in the ufs_vnops where they belong, rather than in the ffs_vnops. Ok'ed by: rwatson Sponsored by: DARPA & NAI Labs.	2002-05-03 08:40:33 +00:00
Poul-Henning Kamp	d65b3c73d7	Use vop_panic() instead of our home-rolled version.	2002-05-02 19:15:52 +00:00
Alfred Perlstein	5a6ce14c42	Remove support for using soon to be retired "special" poll(2) ops. Replace with kevent(2) ops. This is untested, but the code would rot even further if this wasn't applied. I've chosen to apply this to prompt some cleanup. Submitted by: bde	2002-04-18 14:52:28 +00:00
Jeff Roberson	5dacf95488	Don't peak into the malloc_type structure for limits. The desired vnodes check should be sufficient. This is required for the pending removal of malloc_type limits.	2002-04-15 03:35:35 +00:00
Poul-Henning Kamp	2dd527b3ac	Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>. Sponsored by: DARPA & NAI Labs	2002-04-08 09:20:07 +00:00
John Baldwin	6008862bc2	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
Poul-Henning Kamp	a463023d6d	Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h> Sponsored by: DARPA & NAI Labs.	2002-04-03 20:39:27 +00:00
Poul-Henning Kamp	46a67eaced	Use DIOCGSECTORSIZE instead of the bogus DIOCGPART ioctl.	2002-04-02 11:23:14 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Bruce Evans	0508986cce	In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally provided the latter is nonzero. At this point, the former is a fairly arbitrary default value (DFTPHYS), so changing it to any reasonable value specified by the device driver is safe. Using the maximum of these limits broke ffs clustered i/o for devices whose si_iosize_max is < DFLTPHYS. Using the minimum would break device drivers' ability to increase the active limit from DFTLPHYS up to MAXPHYS. Copied the code for this and the associated (unnecessary?) fixup of mp_iosize_max to all other filesystems that use clustering (ext2fs and msdosfs). It was completely missing. PR: 36309 MFC-after: 1 week	2002-03-30 15:12:57 +00:00
David Malone	527f5ce021	Two minor changes to dirhash, which result in some marginal benchmark improvements. 1) If deleting an entry results in a chain of deleted slots ending in an empty slot, then we can be a bit more aggressive about marking slots as empty. 2) The last stage of the FNV hash is to xor the last byte of data into the hash. This means that filenames which differ only in the last byte will be placed close to one another in the hash table, which forms longer chains. To work around this common case, we also hash in the address of the dirhash structure. news/cancel = news/articles/control/cancel for a tradspool inn server squid2 = squid level 2 directory (dirs called 00->FF) squid3 = squid level 3 directory (files called 00001F00->00001FFF) mean #probes for home dir mh inbox news/cancel tmp squid2 squid3 old successful 1.02 3.19 4.07 1.10 7.85 2.06 new successful 1.04 1.32 1.27 1.04 1.93 1.17 old unsuccessful 1.08 4.50 5.37 1.17 10.76 2.69 new unsuccessful 1.08 1.73 1.64 1.17 2.89 1.37 Reviewed by: iedowse MFC after: 2 weeks	2002-03-20 17:58:02 +00:00
Jeff Roberson	e2f8f8a6b6	Remove references to vm_zone.h and switch over to the new uma API.	2002-03-20 08:48:07 +00:00
Alfred Perlstein	6f1e855112	Remove __P.	2002-03-19 22:40:48 +00:00
Bruce Evans	367b50a28f	Fixed some printf format errors (hopefully all of the remaining daddr64_t ones for GENERIC, and all others on the same line as those). Reformat the printfs if necessary to avoid new long lones or old format printf errors.	2002-03-19 04:09:21 +00:00
Kirk McKusick	a0595d0249	Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems. only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.	2002-03-17 01:25:47 +00:00
Kirk McKusick	0d2af52141	Introduce the new 64-bit size disk block, daddr64_t. Change the bio and buffer structures to have daddr64_t bio_pblkno, b_blkno, and b_lblkno fields which allows access to disks larger than a Terabyte in size. This change also requires that the VOP_BMAP vnode operation accept and return daddr64_t blocks. This delta should not affect system operation in any way. It merely sets up the necessary interfaces to allow the development of disk drivers that work with these larger disk block addresses. It also allows for the development of UFS2 which will use 64-bit block addresses.	2002-03-15 18:49:47 +00:00
David E. O'Brien	f0c8652ed4	Quiet a warning on the Alpha.	2002-03-15 04:06:10 +00:00
Kirk McKusick	9721068f95	This corrects the first of two known deadlock conditions that come from the presence of a snapshot file.	2002-03-14 01:21:13 +00:00
Ian Dowse	23bd68a426	Fix a bug in ufsdirhash_adjfree() that caused it to incorrectly update the free-space statistics in some cases. The problem affected directory blocks when the free space dropped below the size of the maximum allowed entry size. When this happened, the free-space summary information could claim that there are no further blocks that can fit a maximum-size entry, even if there are. The effect of this bug is that the directory may be enlarged even though there is space within the directory for the new entry. This wastes disk space and has a negative impact on performance. Fix it by correctly computing the dh_firstfree array index, adding a helper macro for clarity. Put an extra sanity check into ufsdirhash_checkblock() to detect the situation in future. Found by: dwmalone Reviewed by: dwmalone MFC after: 1 week	2002-03-11 19:13:22 +00:00
Poul-Henning Kamp	063f776327	I missed one VOP_CLOSE in the previous commit. Pointed out by: bde	2002-03-11 16:27:04 +00:00
Poul-Henning Kamp	3dbceccb78	As a XXX bandaid open the mounted device READ/WRITE even if we only mount read-only. The trouble here is that we don't reopen the device in read/write mode when we remount in read/write mode resulting in a filesystem sending write requests to a device which was only opened read/only. I'm not quite sure how such a reopen would best be done and defer the problem to more agile hackers.	2002-03-11 13:53:00 +00:00
Robert Watson	409b188022	Update DBA for NAI. We have several. We used the wrong one. :-)	2002-03-07 17:49:06 +00:00
Brian Feldman	9d9737ecb2	Add new errno ``ENOATTR''.	2002-03-07 15:13:44 +00:00
Matthew Dillon	2cfaf1e315	cleanup readability syntax prior to ongoing b_resid work commits. MFC after: 1 day	2002-03-06 00:44:30 +00:00
John Baldwin	fdcc1cc09f	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.	2002-02-27 19:18:10 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Poul-Henning Kamp	986066d065	Replace bowrite() with BUF_WRITE in ufs. Remove bowrite(), it is now unused. This is the first step in getting entirely rid of BIO_ORDERED which is a generally accepted evil thing. Approved by: mckusick	2002-02-22 09:03:00 +00:00
Robert Watson	15b27e726e	o Minor style fix on #endif, missing '_' in comment.	2002-02-20 15:44:43 +00:00
Poul-Henning Kamp	68edc1b939	Make v_addpollinfo() visible and non-inline. Have callers only call it as needed. Add necessary call in ufs_kqfilter(). Test-case found by: Andrew Gallatin <gallatin@cs.duke.edu>	2002-02-18 16:18:02 +00:00
Poul-Henning Kamp	4b55dbe36b	Move the stuff related to select and poll out of struct vnode. The use of the zone allocator may or may not be overkill. There is an XXX: over in ufs/ufs/ufs_vnops.c that jlemon may need to revisit. This shaves about 60 bytes of struct vnode which on my laptop means 600k less RAM used for vnodes.	2002-02-17 21:15:36 +00:00
Poul-Henning Kamp	e8b26e995e	Collect the VN_KNOTE() macro definitions on vnode.h	2002-02-17 21:07:57 +00:00
Julian Elischer	2c1007663f	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
Robert Watson	cfcd3c783e	Minor style tweaks. Remove an unneeded comment and commented out code that won't be needed.	2002-02-10 04:57:08 +00:00
Robert Watson	41d5a43fa1	Copyright + license update.	2002-02-10 04:50:24 +00:00
Robert Watson	74237f55b0	Part I: Update extended attribute API and ABI: o Modify the system call syntax for extattr_{get,set}_{fd,file}() so as not to use the scatter gather API (which appeared not to be used by any consumers, and be less portable), rather, accepts 'data' and 'nbytes' in the style of other simple read/write interfaces. This changes the API and ABI. o Modify system call semantics so that extattr_get_{fd,file}() return a size_t. When performing a read, the number of bytes read will be returned, unless the data pointer is NULL, in which case the number of bytes of data are returned. This changes the API only. o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t argument so as to return the size, if desirable. If set to NULL, the size will not be returned. o Update various filesystems (pseodofs, ufs) to DTRT. These changes should make extended attributes more useful and more portable. More commits to rebuild the system call files, as well as update userland utilities to follow. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-02-10 04:43:22 +00:00
Poul-Henning Kamp	b6e1c37356	Remove di_inumber since LFS is long gone.	2002-02-10 00:55:49 +00:00
Kirk McKusick	b06051cf7c	Occationally background fsck would cause a spurious ``freeing free inode'' panic. This change corrects that problem by setting the fs_active flag when the inode map changes to notify the snapshot code that the cylinder group must be rescanned. Submitted by: Robert Watson <rwatson@FreeBSD.org>	2002-02-07 22:13:56 +00:00
Kirk McKusick	cfdaa88697	Occationally deleted files would hang around for hours or days without being reclaimed. This bug was introduced in revision 1.95 dealing with filenames placed in newly allocated directory blocks, thus is not present in 4.X systems. The bug is triggered when a new entry is made in a directory after the data block containing the original new entry has been written, but before the inode that references the data block has been written. Submitted by: Bill Fenner <fenner@research.att.com>	2002-02-07 00:54:32 +00:00
Kirk McKusick	c9f96392c7	When taking a snapshot, we must check for active files that have been unlinked (e.g., with a zero link count). We have to expunge all trace of these files from the snapshot so that they are neither reclaimed prematurely by fsck nor saved unnecessarily by dump.	2002-02-02 01:42:44 +00:00
Kirk McKusick	7b60855308	Add a stub for softdep_request_cleanup() so that compilation without SOFTUPDATES option works properly. Submitted by: Benno Rice <benno@jeamland.net>	2002-01-23 02:18:56 +00:00
Kirk McKusick	03a2057a5b	This patch fixes a long standing complaint with soft updates in which small and/or nearly full filesystems would fail with `file system full' messages when trying to replace a number of existing files (for example during a system installation). When the allocation routines are about to fail with a file system full condition, they make a call to softdep_request_cleanup() which attempts to accelerate the flushing of pending deletion requests in an effort to free up space. In the face of filesystem I/O requests that exceed the available disk transfer capacity, the cleanup request could take an unbounded amount of time. Thus, the softdep_request_cleanup() routine will only try for tickdelay seconds (default 2 seconds) before giving up and returning a filesystem full error. Under typical conditions, the softdep_request_cleanup() routine is able to free up space in under fifty milliseconds.	2002-01-22 06:17:22 +00:00
Kirk McKusick	99bef8782b	Fix a bug introduced in ffs_snapshot.c -r1.25 and fs.h -r1.26 which caused incomplete snapshots to be taken. When background fsck would run on these snapshots, the result would be files being incorrectly released which would subsequently panic the kernel with ``handle_workitem_freefile: inodedep survived'', ``handle_written_inodeblock: live inodedep'', and ``handle_workitem_remove: lost inodedep'' errors.	2002-01-17 08:33:32 +00:00
Kirk McKusick	8af31e7b46	Put write on read-only filesystem panic after we have weeded out block and character devices, fifo's, etc. Submitted by: Bruce Evans <bde@zeta.org.au>	2002-01-16 04:59:09 +00:00
Kirk McKusick	cd6005961f	When downgrading a filesystem from read-write to read-only, operations involving file removal or file update were not always being fully committed to disk. The result was lost files or corrupted file data. This change ensures that the filesystem is properly synced to disk before the filesystem is down-graded. This delta also fixes a long standing bug in which a file open for reading has been unlinked. When the last open reference to the file is closed, the inode is reclaimed by the filesystem. Previously, if the filesystem had been down-graded to read-only, the inode could not be reclaimed, and thus was lost and had to be later recovered by fsck. With this change, such files are found at the time of the down-grade. Normally they will result in the filesystem down-grade failing with `device busy'. If a forcible down-grade is done, then the affected files will be revoked causing the inode to be released and the open file descriptors to begin failing on attempts to read. Submitted by: "Sam Leffler" <sam@errno.com>	2002-01-15 07:17:12 +00:00
Alfred Perlstein	426da3bcfb	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file fp); / increments reference count on a file / struct file fhold_locked(struct file fp); / like fhold but expects file to locked / struct file ffind_hold(struct thread , int fd); / finds the struct file in thread, adds one reference and returns it unlocked / struct file ffind_lock(struct thread , int fd); / ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
Kirk McKusick	0bc7a833ec	When going to sleep, we must save our SPL so that it does not get lost if some other process uses the lock while we are sleeping. We restore it after we have slept. This functionality is provided by a new routine interlocked_sleep() that wraps the interlocking with functions that sleep. This function is then used in place of the old ACQUIRE_LOCK_INTERLOCKED() and FREE_LOCK_INTERLOCKED() macros. Submitted by: Debbie Chu <dchu@juniper.net>	2002-01-12 20:57:36 +00:00
Kirk McKusick	794ef3471f	Must call drain_output() before checking the dirty block list in softdep_sync_metadata(). Otherwise we may miss dependencies that need to be flushed which will result in a later panic with the message ``vinvalbuf: dirty bufs''. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week	2002-01-11 19:59:27 +00:00
Poul-Henning Kamp	9c643340bb	Do not pull quota entries of the cache-list if they have already been removed from the cache-list as part of a previous unmount. This would result in panics (page fault in dqflush()) during subsequent umounts provided that enough distinct UID's to actually make the hash do something are active. This can probably explain a number of weird quota related behaviours. PR: 32331 maybe more. Reproduced by: Søren Schrørder <sch@cybercity.dk>	2002-01-10 15:02:57 +00:00
Mike Smith	b9a4338d29	Initialise the bioops vector hack at runtime rather than at link time. This avoids the use of common variables. Reviewed by: mckusick	2002-01-08 19:32:18 +00:00
Matthew Dillon	23b590188f	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release	2001-12-20 22:42:27 +00:00
Kirk McKusick	f305c5d199	Change the atomic_set_char to atomic_set_int and atomic_clear_char to atomic_clear_int to ease the implementation for the sparc64. Requested by: Jake Burkholder <jake@locore.ca>	2001-12-18 18:05:17 +00:00
Ian Dowse	143a5346c9	Make sure we ignore the value of `fs_active' when reloading the superblock, and move the initialisation of it to beside where other pointer fields are initialised.	2001-12-16 18:54:09 +00:00
Ian Dowse	3fa4044e34	Move the new superblock field `fs_active' into the region of the superblock that is already set up to handle pointer types. This fixes an accidental change in the superblock size on 64-bit platforms caused by revision 1.24.	2001-12-16 18:51:11 +00:00
Kirk McKusick	cc5a92334f	Minimize the time necessary to suspend operations on a filesystem when taking a snapshot. The two time consuming operations are scanning all the filesystem bitmaps to determine which blocks are in use and scanning all the other snapshots so as to be able to expunge their blocks from the view of the current snapshot. The bitmap scanning is broken into two passes. Before suspending the filesystem all bitmaps are scanned. After the suspension, those bitmaps that changed after being scanned the first time are rescanned. Typically there are few bitmaps that need to be rescanned. The expunging of other snapshots is now done after the suspension is released by observing that we can easily identify any blocks that were allocated to them after the suspension (they will be maked as `not needing to be copied' in the just created snapshot). For all the gory details, see the ``Running fsck in the Background'' paper in the Usenix BSDCon 2002 Conference Proceedings, pages 55-64.	2001-12-14 00:15:06 +00:00
Kirk McKusick	9db12e5108	When a file is partially truncated, we first check to see if the new file end will land in the middle of a file hole. Since the last block of a file must always be allocated, the hole is filled by allocating a block at that location. If the hole being filled is a direct block, then the truncation may eventually reduce the full sized block down to a fragment. When running with soft updates, it is necessary to FSYNC the file after allocating the block and before creating the fragment to avoid triggering a soft updates inconsistency when the block unexpectedly shrinks. Found by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week	2001-12-13 05:07:48 +00:00
Robert Watson	24373ce6ed	Use 'mkdir -p /.attribute/system' instead of breaking it into two seperate mkdir targets. Submitted by: jedgar	2001-11-30 15:32:07 +00:00
Robert Watson	cff9580525	Use 'mkdir -p /.attribute/system' instead of breaking it into two seperate mkdir targets.	2001-11-30 15:21:20 +00:00
Robert Watson	15f1c8d3d2	README.extattr incorrectly specified sample command lines for UFS_EXTATTR_AUTOSTART. Insert the missing 'initattr' arguments to extattrctl. Noticed by: green	2001-11-30 15:15:27 +00:00
Guido van Rooij	40e294f796	When mkdir()-ing, the parent dir gets is linkcount increased. Fix VN_KNOTE to reflect that. Found by: tobez@freebsd.org MFC after: 2 days	2001-11-22 15:33:12 +00:00
Ian Dowse	4202b366fc	Oops, when trying the dirhash sequential-access optimisation, compare the slot offset against the predicted offset, not a boolean flag. This typo effectively disabled the sequential optimisation, but was otherwise harmless. Not surprisingly, fixing this improves performance in the sequential access case. I am seeing a 7% speedup on one machine here; using dirhash when sequentially looking up directory entries is now about 5% faster instead of 2% slower than the non-dirhash case. Submitted by: KOIE Hidetaka <koie@suri.co.jp> MFC after: 1 week	2001-11-14 15:08:07 +00:00
Matthew Dillon	7e76bb562e	Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whos strategy routines turn-around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer of buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week	2001-11-05 18:48:54 +00:00
Robert Watson	6d8785434f	o Update copyright dates. o Add reference to TrustedBSD Project in license header. o Update dated comments, including comment in extattr.h claiming that no file systems support extended attributes. o Improve comment consistency.	2001-11-01 21:37:07 +00:00
Robert Watson	b6e0472987	o Althought this is not specified in POSIX.1e, the UFS ACL implementation coerces the deletion of a default ACL on a directory when no default ACL EA is present to success. Because the UFS EA implementation doesn't disinguish the EA failure modes "that EA name has not been administratively enabled" from "that EA name has no defined data", there's a potential conflict in error return values. Normally, the lack of administratively configured EA support is coerced to EOPNOTSUPP to indicate that ACLs are not available; in this case, it is possible to get a successful return, even if ACLs are not available because EA support for them has not been enabled. Expand the comment in ufs_setacl() to identify this case. Obtained from: TrustedBSD Project	2001-10-27 05:39:17 +00:00
Robert Watson	ac8b3dd7dc	o Clarify a comment about the locking condition of the vnode upon exit from ufs_extattr_enable_with_open(). o Print auto-start notifications if (bootverbose). This was previously commented out since it didn't know how to check for bootverbose. o Drop in comments throughout indicating where ENOENT should be replaced with ENOATTR once that is available. Obtained from: TrustedBSD Project	2001-10-27 05:19:14 +00:00
Robert Watson	29543004bd	o The comment about ordering the destruction of the lock and the removal of the flag indicating that the structure was initialized didn't need an XXX, since it didn't need fixing. Obtained from: TrustedBSD Project	2001-10-27 05:05:39 +00:00
Robert Watson	9444746795	o Wrap a number of long lines of code, many of which were introduced due to KSE-related (p) expansions. Obtained from: TrustedBSD Project	2001-10-27 05:03:05 +00:00
Robert Watson	ce5ddec25f	Since namespace support was added to the UFS extended attribute implementation to replace single-character namespace prefixes, '$' is no longer an invalid attribute name, and the namespace is relevant to validity determination. o Remove '$' case from ufs_extattr_valid_attrname() o Add attrnamespace argument to ufs_extattr_valid_attrname(), and fill out appropriately. Currently no decisions are made based on the namespace argument, but may be in the future. Obtained from: TrustedBSD Project	2001-10-27 04:58:28 +00:00
Matthew Dillon	245df27cee	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
Ian Dowse	71fc5e11c7	Default to not performing ufs_dirhash's extensive directory-block sanity check after every directory modification. This check can be re-enabled at any time by setting the sysctl "vfs.ufs.dirhash_docheck" to 1. This group of sanity tests was there to ensure that any UFS_DIRHASH bugs could be caught by a panic before a potentially corrupted directory block would be written to disk. It has served its main purpose now, so disable it in the interest of performance. MFC after: 1 week	2001-10-25 22:55:59 +00:00
Matthew Dillon	c72ccd014d	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days	2001-10-23 01:21:29 +00:00
John Baldwin	bd78cece5d	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.	2001-10-11 23:38:17 +00:00
John Baldwin	7106ca0d1a	Add missing includes of sys/lock.h.	2001-10-11 17:52:20 +00:00
Matthew Dillon	962922dcd2	Remove panics for rename() race conditions. The panics are inappropriate because the IN_RENAME flag only fixes a few of the huge number of race conditions that can result in the source path becoming invalid even prior to the VOP_RENAME() call. The panics created a serious security issue whereby an attacker could fairly easily cause the panic to occur, crashing the machine. The correct solution requires a great deal of work in the namei path cache code. MFC after: 0 days	2001-10-08 00:37:54 +00:00
Robert Watson	ab66aa1468	o Replace two direct uid!=0 comparisons with suser_xxx() calls. Obtained from: TrustedBSD Project	2001-10-02 14:41:43 +00:00
Robert Watson	b73d2870cd	o Replace two direct uid!=0 comparisons with suser_td() calls. Obtained from: TrustedBSD Project	2001-10-02 14:34:22 +00:00
Matthew Dillon	4c94c7bfb9	Backout the last commit. The problem is actually much worse then I first thought and may require serious work to the VOP_RENAME() api itself. Basically, by the time the VOP_RENAME() function is called, it's already too late.	2001-10-02 04:26:58 +00:00
Matthew Dillon	be2a975a9f	IN_RENAME should only be cleared by the routine that set it. This fixes a rename/rmdir race that has been shown to cause a panic. Bug reported by: Yevgeniy Aleynikov <eugenea@infospace.com> MFC after: 3 days	2001-10-02 02:58:48 +00:00
John Baldwin	eb46fac565	- Fix some minor whitespace nits. - Move the SPECIAL_FLAG #define up next to the NOHOLDER #define and fix a little nit that caused it to be defined as -(sizeof (struct thread) + 1) instead of -2.	2001-09-27 21:04:13 +00:00
Robert Watson	57358f1e93	o Re-enable support of system file flags in jail() by adding back the PRISON_ROOT to the suser_xxx() check. Since securelevels may now be raised in specific jails, use of system flags can still be restricted in jail(), but in a more configurable way. o Users of jail() expecting system flags (such as schg) to restrict jail()'s should be sure to set the securelevel appropriately in jail()'s. o This fixes activities involving automated system flag removal in jail(), including installkernel and friends. Obtained from: TrustedBSD Project	2001-09-26 20:44:41 +00:00
Robert Watson	6748bcc51e	o Modify ufs_setattr() so that it uses securelevel_gt() instead of direct variable access. Obtained from: TrustedBSD Project	2001-09-26 20:31:37 +00:00
Robert Watson	aaef1c3934	o Further clarify comment: ad Udo's request, re-insert the 'if' refering to securelevels; also, update the unprivileged process text to better indicate the scope of actions permittable when any system flags are already set (limited). Submitted by: Udo Schweigert <udo.schweigert@siemens.com>	2001-09-25 12:02:44 +00:00
Robert Watson	82e83c60b3	o Parallelize the comment on the relationship between privileged un-jailed processes and the actual securelevel check: make the comment use '> 0' instead of inverted '<= 0'.	2001-09-25 02:26:10 +00:00
Ian Dowse	5d76690a7f	The addition of i_dirhash to struct inode pushed RELENG_4's sizeof(struct inode) into a new malloc bucket on the i386. This didn't happen in -current due to the removal of i_lock, but it does no harm to apply the workaround to -current first. Reduce the size of the i_spare[] array in struct inode from 4 to 3 entries, and change ext2fs to use i_din.di_spare[1] so that it does not need i_spare[3]. Reviewed by: bde MFC after: 3 days	2001-09-24 18:29:20 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Ian Dowse	4691e9ead0	The "dirpref" directory layout preference improvements make use of an array "fs_contigdirs[]" to avoid too many directories getting created in each cylinder group. The memory required for this and two other arrays (fs_csp[] and fs_maxcluster[]) is allocated with a single malloc() call, and divided up afterwards. However, the 'space' pointer is not advanced correctly, so fs_contigdirs and fs_maxcluster end up pointing to the same address. Add the missing code to advance the 'space' pointer, and remove an unnecessary update of the pointer that follows. This is likely to fix the "ffs_clusteralloc: map mismatch" panics that have been reported recently. Submitted by: Luke Mewburn <lukem@wasabisystems.com>	2001-09-09 23:48:28 +00:00
Chris D. Faulhaber	dac4a67ce7	Use ACL_PERM_NONE instead of hardcoding 0 when initializing ACL entry permissions. Reviewed by: rwatson	2001-09-01 23:18:15 +00:00
Robert Watson	7df97b6117	o At some point, unmounting a non-EA file system with EA's compiled in got a bit broken, when ufs_extattr_stop() was called and failed, ufs_extattr_destroy() would panic. This makes the call to destroy() conditional on the success of stop(). Submitted by: Christian Carstensen <cc@devcon.net> Obtained from: TrustedBSD Project	2001-09-01 20:11:05 +00:00
Peter Wemm	0f7289022b	If a file has been completely unlinked, stop automatically syncing the file. ffs will discard any pending dirty pages when it is closed, so we may as well not waste time trying to clean them. This doesn't stop other things from writing it out, eg: pageout, fsync(2) etc.	2001-08-27 06:09:56 +00:00
Ian Dowse	be70fc04ce	Stop using dirhash when a directory is removed, and ensure that we never attempt to hash directories once they are deleted. This fixes a problem where operations on a deleted directory could trigger dirhash sanity panics.	2001-08-26 20:47:19 +00:00
Ian Dowse	2ed42812bd	When compacting directories, ufs_direnter() always trusted DIRSIZ() to supply the number of bytes to be bcopy()'d to move an entry. If d_ino == 0 however, DIRSIZ() is not guaranteed to return a sensible length, so ufs_direnter could end up corrupting a directory during compaction. In practice I believe this can only happen after fsck_ffs has fixed a previously-corrupted directory. We now deal with any mid-block unused entries specially to avoid using DIRSIZ() or bcopy() on such entries. We also ensure that the variables 'dsize' and 'spacefree' contain meaningful values at all times. Add a few comments to describe better this intricate piece of code. The special handling of mid-block unused entries makes the dirhash- specific bugfix in the previous revision (1.53) now uncecessary, so this change removes it. Reviewed by: mckusick	2001-08-26 01:25:12 +00:00
Ian Dowse	7dfb550e0c	When compressing directory blocks, the dirhash code didn't check that the directory entry was in use before attempting to find it in the hash structures to change its offset. Normally, unused entries do not need to be moved, but fsck can leave behind some unused entries that do. A dirhash sanity panic resulted when the entry to be moved was not found. Add a check that stops entries with d_ino == 0 from being passed to ufsdirhash_move().	2001-08-22 01:35:17 +00:00
Peter Wemm	61a4237001	Sigh. ufs_lookup() calls ffs_snapgone(), meaning that 'options EXT2FS' without 'options FFS' would fail to link.	2001-08-18 03:08:48 +00:00
Ian Dowse	9e27954de1	Two recent commits in sys/ufs/ufs interacted badly with ext2fs because it shares ufs code. In ufs_fhtovp(), the test on i_effnlink is invalid because ext2fs does not maintain this field. In ufs_close(), i_effnlink is also tested, to determines whether or not to call vn_start_write(). The ufs_fhtovp issue breaks NFS exporting of ext2fs filesystems; I believe the other is harmless. Fix both cases by checking um_i_effnlink_valid in the ufsmount struct, and use i_nlink if necessary. Noticed by: bde Reviewed by: mckusick, bde	2001-07-29 22:26:01 +00:00
Ian Dowse	54d6d2dfaf	Disable the dirhash sanity check that panics if an unused directory entry (d_ino == 0) is found in a position that is not the start of a DIRBLKSIZ block. While such entries cannot occur normally (ufs always extends the previous entry to cover the free space instead), they do not cause problems and fsck does not fix them, so panicking is bad.	2001-07-27 18:45:41 +00:00
Peter Wemm	815d14ddab	Use a fixed type for times in on-disk structures for ufs rather than something that could potentially change like time_t.	2001-07-16 00:55:27 +00:00
Ian Dowse	50c7c3a7c8	Return a locked struct buf from ufsdirhash_lookup() to avoid one extra getblk/brelse sequence for each lookup. We already had this buf in ufsdirhash_lookup(), so there was no point in brelse'ing it only to have the caller immediately reaquire the same buffer. This should make the case of sequential lookups marginally faster; in my tests, sequential lookups with dirhash enabled are now only around 1% slower than without dirhash.	2001-07-13 20:50:38 +00:00
Ian Dowse	9b5ad47fb7	Bring in dirhash, a simple hash-based lookup optimisation for large directories. When enabled via "options UFS_DIRHASH", in-core hash arrays are maintained for large directories. These allow all directory operations to take place quickly instead of requiring long linear searches. For now anyway, dirhash is not enabled by default. The in-core hash arrays have a memory requirement that is approximately half the size of the size of the on-disk directory file. A number of new sysctl variables allow control over which directories get hashed and over the maximum amount of memory that dirhash will use: vfs.ufs.dirhash_minsize The minimum on-disk directory size for which hashing should be used. The default is 2560 (2.5k). vfs.ufs.dirhash_maxmem The system-wide maximum total memory to be used by dirhash data structures. The default is 2097152 (2MB). The current amount of memory being used by dirhash is visible through the read-only sysctl variable vfs.ufs.dirhash_maxmem. Finally, some extra sanity checks that are enabled by default, but which may have an impact on performance, can be disabled by setting vfs.ufs.dirhash_docheck to 0. Discussed on: -fs, -hackers	2001-07-10 21:21:29 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
John Baldwin	ed87274d16	Fix more mntvnode and vnode interlock order reversals.	2001-06-28 22:21:33 +00:00
John Baldwin	49d2d9f4a4	- Fix a mntvnode and vnode interlock reversal. - Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.	2001-06-28 04:12:56 +00:00
Peter Wemm	78236790cd	Fix warning: 1973: warning: int format, long int arg (arg 5)	2001-06-15 07:44:39 +00:00
Kirk McKusick	eb87cd754f	Build on the change in revision 1.98 by Tor.Egge@fast.no. The symptom being treated in 1.98 was to avoid freeing a pagedep dependency if there was still a newdirblk dependency referencing it. That change is correct and no longer prints a warning message when it occurs. The other part of revision 1.98 was to panic when a newdirblk dependency was encountered during a file truncation. This fix removes that panic and replaces it with code to find and delete the newdirblk dependency so that the truncation can succeed.	2001-06-13 23:13:13 +00:00
Thomas Moestl	1fbcf0ac65	Call vn_close on the backing file vnode if ufs_extattr_enable failed to avoid leaking it. Reviewed by: rwatson	2001-06-07 00:11:32 +00:00
Jonathan Lemon	3b6e32b01e	Add a wrapper for the fifo kqfilter which falls through to the ufs routine. This permits the fifo to inherit the ufs VNODE kqfilter.	2001-06-06 17:40:57 +00:00
Jonathan Lemon	4e92ec9dd0	Add a kqueue filter for writing to ufs filesystems which always returns true. This permits better interoperability with programs which register filters on their stdin/stdout handles. Submitted by: Niels Provos <provos@citi.umich.edu>	2001-06-05 13:52:37 +00:00
David E. O'Brien	1239674238	There seems to be a problem that the order of disk write operation being incorrect due to a missing check for some dependency. This change avoids the freelist corruption (but not the temporarily inconsistent state of the file system). A message is printed as a reminder of the under lying problem when a pagedep structure is not freed due to the NEWBLOCK flag being set. Submitted by: Tor.Egge@fast.no	2001-06-05 01:49:37 +00:00
John Baldwin	1c11b01562	Revert the previous commit in favor of the fix in rev 1.42 of ufs/ffs/ffs_extern.h instead. Requested by: bde	2001-05-30 23:09:19 +00:00
John Baldwin	55d132317c	Forward declare struct cg to quiet a warning. Submitted by: bde	2001-05-30 23:08:40 +00:00
John Baldwin	59718ee556	Include <ufs/ffs/fs.h> to get the definition of struct cg to quiet a warning.	2001-05-29 23:53:16 +00:00
Poul-Henning Kamp	c7a3e2379c	Remove last vestiges of MFS.	2001-05-29 21:21:53 +00:00
Poul-Henning Kamp	870b4959b7	Remove MFS from the kernel.	2001-05-29 18:50:30 +00:00
Thomas Moestl	3c436f07b7	Add a check to determine whether extended attributes have been initialized on the file system before trying to grab the lock of the per-mount extattr structure, as this lock is unitialized in that case. This is needed because ufs_extattr_vnode_inactive is called from ufs_inactive, which is also used by EA-unaware file systems such as ext2fs. Reviewed by: rwatson	2001-05-25 18:24:52 +00:00
Robert Watson	b1fc0ec1a7	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit	2001-05-25 16:59:11 +00:00
Matthew Dillon	ac8f990bde	This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon	2001-05-24 07:22:27 +00:00
Alfred Perlstein	1752ee59ba	ufs_bmaparray() may block on IO, drop vm mutex and aquire Giant when calling it from the pager routine	2001-05-23 10:30:25 +00:00
Ruslan Ermilov	99d300a1ec	- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs. - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs. - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS. - Install header files for the above file systems. - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.	2001-05-23 09:42:29 +00:00
Kirk McKusick	57042c7f72	Update softdep_setup_directory_add prototype to reflect changes in actual function. Obtained from: Jim Bloom <bloom@jbloom.jbloom.org>	2001-05-20 15:59:55 +00:00
Kirk McKusick	dc01275be9	Must ensure that all the entries on the pd_pendinghd list have been committed to disk before clearing them. More specifically, when free_newdirblk is called, we know that the inode claims the new directory block. However, if the associated pagedep is still linked onto the directory buffer dependency chain, then some of the entries on the pd_pendinghd list may not be committed to disk yet. In this case, we will simply note that the inode claims the block and let the pd_pendinghd list be processed when the pagedep is next written. If the pagedep is no longer on the buffer dependency chain, then all the entries on the pd_pending list are committed to disk and we can free them in free_newdirblk. This corrects a window of vulnerability introduced in the code added in version 1.95.	2001-05-19 19:24:26 +00:00

... 7 8 9 10 11 ...

1624 Commits