freebsd-skq

Author	SHA1	Message	Date
alfred	7b7fa13344	remove syscallarg(). Suggested by: peter	2002-12-14 02:07:32 +00:00
alfred	d070c0a52d	SCARGS removal take II.	2002-12-14 01:56:26 +00:00
alfred	4f48184fb2	Backout removal SCARGS, the code freeze is only "selectively" over.	2002-12-13 22:41:47 +00:00
alfred	d19b4e039d	Remove SCARGS. Reviewed by: md5	2002-12-13 22:27:25 +00:00
iedowse	092b51aeec	Fix a case in kern_rename() where a vn_finished_write() call was missed. This bug has been present since the vn_start_write() and vn_finished_write() calls were first added in revision 1.159. When the case is triggered, any attempts to create snapshots on the filesystem will deadlock and also prevent further write activity on that filesystem.	2002-10-27 23:23:51 +00:00
wollman	7e9d4df21f	Change the way support for asynchronous I/O is indicated to applications to conform to 1003.1-2001. Make it possible for applications to actually tell whether or not asynchronous I/O is supported. Since FreeBSD's aio implementation works on all descriptor types, don't call down into file or vnode ops when [f]pathconf() is asked about _PC_ASYNC_IO; this avoids the need for every file and vnode op to know about it.	2002-10-27 18:07:41 +00:00
rwatson	91dee1ecbb	Hook up most of the MAC entry points relating to file/directory/node creation, deletion, and rename. There are one or two other stray cases I'll catch in follow-up commits (such as unix domain socket creation); this permits MAC policy modules to limit the ability to perform these operations based on existing UNIX credential / vnode attributes, extended attributes, and security labels. In the rename case using MAC, we now have to lock the from directory and file vnodes for the MAC check, but this is done only in the MAC case, and the locks are immediately released so that the remainder of the rename implementation remains the same. Because the create check takes a vattr to know object type information, we now initialize additional fields in the VATTR passed to VOP_SYMLINK() in the MAC case. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-19 20:25:57 +00:00
rwatson	683f085f65	Incremental style improvements: more consistently avoid assignments in conditionals; remove some excess vertical whitespace; remove a bug in the return handling of the delete_vp() case for MAC. Spotted by: bde	2002-10-10 13:59:58 +00:00
rwatson	f1296e0875	Explore new heights in alphabetization for _file and _fd variations on the extended attribute system calls.	2002-10-10 00:32:08 +00:00
rwatson	48070e93ee	Implement extattr_{delete,get,set}_link() system calls: extended attribute operations that do not follow links. Sync to MAC tree. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-09 21:48:22 +00:00
iedowse	cbd79f434a	Add back a fdrop() call at the end of kern_open() that got lost in revision 1.218. This bug caused a "struct file" reference to be leaked if VOP_ADVLOCK(), vn_start_write(), or mac_check_vnode_write() failed during the open operation. PR: kern/43739 Reported by: Arne Woerner <woerner@mediabase-gmbh.de>	2002-10-07 20:49:22 +00:00
rwatson	abda58cc1e	Merge support for mac_check_vnode_link(), a MAC framework/policy entry point that instruments the creation of hard links. Policy implementations to follow. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-05 18:11:36 +00:00
phk	76d8452fbf	Fix mis-indentation. Spotted by: FlexeLint	2002-10-02 09:09:25 +00:00
jeff	881a59ab9e	- Properly lock v_vflags in getdirents().	2002-09-25 02:13:38 +00:00
truckman	f280782003	VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link() wasn't doing. Rather than just lock and unlock the vnode around the call to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode in kern_link() before calling VOP_LINK(), since the other filesystems also locked the file vnode right away in their link methods. Remove the locking and and unlocking from the leaf filesystem link methods. Reviewed by: rwatson, bde (except for the unionfs_link() changes)	2002-09-19 13:32:45 +00:00
bde	8aa3df4eb2	vfs_syscalls.c: Changed rename(2) to follow the letter of the POSIX spec. POSIX requires rename() to have no effect if its args "resolve to the same existing file". I think "file" can only reasonably be read as referring to the inode, although the rationale and "resolve" seem to say that sameness is at the level of (resolved) directory entries. ext2fs_vnops.c, ufs_vnops.c: Replaced code that gave the historical BSD behaviour of removing one link name by checks that this code is now unreachable. This fixes some races. All vnodes needed to be unlocked for the removal, and locking at another level using something like IN_RENAME was not even attempted, so it was possible for rename(x, y) to return with both x and y removed even without any unlink(2) syscalls (one process can remove x using rename(x, y) and another process can remove y using rename(y, x)). Prodded by: alfred MFC after: 8 weeks PR: 42617	2002-09-10 11:09:13 +00:00
iedowse	be17b12cb6	Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropiate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel	2002-09-01 20:37:28 +00:00
jeff	a9972cd35a	- Hold the vnode lock across unlink() so that the v_vflag check is safe. - Fix the long broken error handling for VV_ROOT and VDIR.	2002-08-21 03:55:35 +00:00
rwatson	a1cb1e3bed	Pass active_cred and file_cred into the MAC framework explicitly for mac_check_vnode_{poll,read,stat,write}(). Pass in fp->f_cred when calling these checks with a struct file available. Otherwise, pass NOCRED. All currently MAC policies use active_cred, but could now offer the cached credential semantic used for the base system security model. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-19 19:04:53 +00:00
rwatson	1a7cd1a210	Break out mac_check_vnode_op() into three seperate checks: mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write(). This improves the consistency with other existing vnode checks, and allows policies to avoid implementing switch statements to determine what operations they do and do not want to authorize. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-19 16:43:25 +00:00
rwatson	2b82cd24f1	Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential. Trickle this change down into fo_stat/poll() implementations: - badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): moidfy arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintian current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics. - fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here. Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-16 12:52:03 +00:00
jeff	02517b6731	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
rwatson	eac603fb18	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke appropriate MAC framework entry points to authorize readdir() operations in the native ABI. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 20:44:52 +00:00
rwatson	a5dcc1fd3d	Include file cleanup; mac.h and malloc.h at one point had ordering relationship requirements, and no longer do. Reminded by: bde	2002-08-01 17:47:56 +00:00
rwatson	7af111191c	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke appropriate MAC entry points to authorize the following operations: truncate on open() (write) access() (access) readlink() (readlink) chflags(), lchflags(), fchflags() (setflag) chmod(), fchmod(), lchmod() (setmode) chown(), fchown(), lchown() (setowner) utimes(), lutimes(), futimes() (setutimes) truncate(), ftrunfcate() (write) revoke() (revoke) fhopen() (open) truncate on fhopen() (write) extattr_set_fd, extattr_set_file() (setextattr) extattr_get_fd, extattr_get_file() (getextattr) extattr_delete_fd(), extattr_delete_file() (setextattr) These entry points permit MAC policies to enforce a variety of protections on vnodes. More vnode checks to come, especially in non-native ABIs. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 15:37:12 +00:00
rwatson	fff16f04c3	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument chdir() and chroot()-related system calls to invoke appropriate MAC entry points to authorize the two operations. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 03:50:08 +00:00
rwatson	6d0d48759b	Improve formatting and variable use consistency in extattr system calls. Submitted by: green Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 01:29:03 +00:00
rwatson	14cc38f1e8	Simplify the logic to enter VFS_EXTATTRCTL(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 01:26:07 +00:00
rwatson	4d5d66e7e4	Introduce support for Mandatory Access Control and extensible kernel access control. Implement MAC framework access control entry points relating to operations on mountpoints. Currently, this consists only of access control on mountpoint listing using the various statfs() variations. In the future, it might also be desirable to implement checks on mount() and unmount(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 01:27:33 +00:00
rwatson	5f36208df2	When referencing nd_cnp after namei(), always pass SAVENAME into NDINIT() operation flags. Submitted by: green Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 18:48:25 +00:00
rwatson	fcb9022bbd	Set VAPPEND in open mode when O_APPEND is specified as an argument to open() of fhopen(). Currently this has no actual affect due to the treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of VWRITE, but when MAC comes in, MAC will distinguish the two. Note: if any file systems are cutting their own permission models, they may wish to now take this into account. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-22 12:51:06 +00:00
mckusick	3abb526f86	Change utimes to set the file creation time (for filesystems that support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details. Sponsored by: DARPA & NAI Labs.	2002-07-17 02:03:19 +00:00
mckusick	a3ff90696f	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.	2002-07-16 22:36:00 +00:00
jhb	11a8bfb923	- Change chroot_refuse_vdir_fds() to require that the passed in struct filedesc is already locked rather than having chroot() unlock the filedesc so chroot_refuse_vdir_fds() can immediately relock it. - Reorder chroot() a bitso that we do the namei lookup before checking the process's struct filedesc. This closes at least one potential race and allows us to only acquire the filedsec lock once in chroot(). - Push down Giant slightly into chroot().	2002-07-13 04:07:12 +00:00
mux	eb5a0f4a7e	Move every code related to mount(2) in a new file, vfs_mount.c. The file vfs_conf.c which was dealing with root mounting has been repo-copied into vfs_mount.c to preserve history. This makes nmount related development easier, and help reducing the size of vfs_syscalls.c, which is still an enormous file. Reviewed by: rwatson Repo-copy by: peter	2002-07-02 17:09:22 +00:00
iedowse	4416f82706	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00
alfred	ae44f24f87	Remove unneeded casts to caddr_t.	2002-06-28 23:02:38 +00:00
iedowse	792737cf4d	In vn_mkdir(), use vrele() instead of vput() on the parent directory vnode in the case that the target exists and is the same vnode as the parent (i.e. "mkdir ."). The namei() call does not leave the vnode locked in this case even though you might expect it to. This bug was mostly harmless in practice because unlocking an already unlocked vnode currently does not trigger any panics or warnings. Reviewed by: jeff	2002-06-28 20:06:47 +00:00
mckusick	7d70f5926f	Use proper size in bzero of stat structure. Submitted by: Jake Burkholder <jake@locore.ca> Sponsored by: DARPA & NAI Labs.	2002-06-24 07:14:44 +00:00
mckusick	f09d0a42d9	This patch fixes a size problem with the stat structure for 64-bit architectures that was introduced in the UFS2 code merge two days ago. The stat structure change that caused the problem was the addition of the file create time. Submitted by: Bruce Evans <bde@zeta.org.au> Sponsored by: DARPA & NAI Labs.	2002-06-22 22:01:13 +00:00
mux	24aca74f2d	o Remove the initialization of unused fields in the struct uio now that we don't use uiomove() anymore. o Enforce stricter checks on the length of the iov's in nmount(2) since we now malloc() them individually and corrupted iov's could make the kernel crash in malloc() with "kmem_map too small". Reviewed by: phk	2002-06-22 18:07:05 +00:00
mckusick	88d85c15ef	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>	2002-06-21 06:18:05 +00:00
mux	3770ca4156	Change the way we internally store the mount options to a linked list. This is to allow the merging of the mount options in the MNT_UPDATE case, as the current data structure is unsuitable for this. There are no functional differences in this commit. Reviewed by: phk	2002-06-20 20:03:42 +00:00
mux	8ba975c439	Remove a duplicated vfs_freeopts() that I introduced in last revision.	2002-05-28 13:27:55 +00:00
mux	334d1908ec	Style nit, no functional changes.	2002-05-23 23:22:22 +00:00
mux	67080508a8	Slightly change the way we pass mount options to the filesystem VFS_NMOUNT operations. Reviewed by: phk	2002-05-23 23:02:19 +00:00
mux	85aa3f836d	Change two vput() that should have been vrele(). Submitted by: iedowse	2002-05-20 14:59:43 +00:00
trhodes	28d42899b7	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
jeff	ba85b0e087	Disable the shared locking namei() code for now. It breaks several stacking filesystems. This is on hold until the rest of VFS Locking is reviewed and deemed safe. It can be enabled with 'options LOOKUP_SHARED'.	2002-05-14 21:59:49 +00:00
mux	b2f5ccfa53	Add the lchflags(2) syscall. Reviewed by: rwatson	2002-05-05 23:47:41 +00:00
jeff	4323e678da	Move a KASSERT() in open() prior to unlocking the vnode. It's not safe to call VOP_GETVOBJECT without a lock.	2002-05-05 23:17:13 +00:00
mux	7856f6d21c	Fix a typo. Submitted by: dwmalone	2002-05-04 19:50:09 +00:00
rwatson	780f32f693	Slightly restructure extattr_get_vp() so that there's only one entry point to VOP_GETEXTATTR(). This simplifies code flow when inserting MAC hooks. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-04-23 01:27:38 +00:00
rwatson	30744d9c56	Improve style consistency of vfs_syscalls.c by converting the style used in various extattr_*() calls to match the rest of the file. Originally, these bits at the end looked more like style(9). This patch was submitted by green by way of the TrustedBSD MAC tree, and I fixed a few problems with it on the way through. Someone with more time on their hands should convert the entire file to style(9); this commit is for diff reduction purposes. Submitted by: green Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-04-20 01:37:08 +00:00
iedowse	64322dabea	The recent NFS forced unmount improvements introduced a side-effect where some client operations might be unexpectedly cancelled during an unsuccessful non-forced unmount attempt. This causes problems for amd(8), because it periodically attempts a non-forced unmount to check if the filesystem is still in use. Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set only during the operation of a forced unmount. Use this instead of MNTK_UNMOUNT to trigger the cancellation of hung NFS operations. Also correct a problem where dounmount() might inadvertently clear the MNTK_UNMOUNT flag. Reported by: simokawa MFC after: 1 week	2002-04-17 01:07:29 +00:00
jeff	0b5e15cef7	Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this behavior by default. Also, change the options line to reflect this. If there are no problems reported this will become the only behavior and the knob will be removed in a month or so. Demanded by: obrien	2002-04-09 05:14:17 +00:00
mux	bf7d877bcd	The fourth parameter to copystr() is a size_t, not an int. Approved by: peter	2002-04-08 21:14:19 +00:00
mux	00b6b34450	o Change kernel_vmount() interface to be more convenient : pass two separate strings instead of passing "foo=bar". o Don't forget to clear the VMOUNT flag on the vnode when vfs_nmount() fails because the fs doesn't implement VFS_NMOUNT (and in vfs_mount() when the fs doesn't implement VFS_MOUNT) ; also decrement the vfs refcount in the !MNT_UPDATE case.	2002-04-07 13:22:47 +00:00
mux	9effffd331	Add two forgotten vfs_unbusy() calls, in vfs_mount() and vfs_nmount(). Reviewed by: phk	2002-04-03 12:19:03 +00:00
jhb	dc2e474f79	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
mux	ac4c018837	- Properly sync vfs_nmount() with changes that have be already done in vfs_mount(), in particular revisions 1.215, 1.227 and 1.240. - flag2 is a low quality variable name, change it to kern_flag. - strncpy NUL-terminates f_fstypename and f_mntonname since the strings have length <= <buffer length> - 1, so the explicit NUL-termination is bogus. - M_ZERO'ing space for fstype and fspath is stupid since we never use the space beyond the end of the string. - Do various style(9) cleanups in both functions. Submitted by: bde Reviewed by: phk	2002-03-28 13:47:32 +00:00
arr	da9c75ac68	- Fixup a few style nits: - return error -> return (error); - move a declaration to the top of the function. - become bug for bug compatible with if (error) lines. Submitted by: bde	2002-03-26 18:07:10 +00:00
mux	124c6d3a26	As discussed in -arch, add the new nmount(2) system call and the new vfs_getopt()/vfs_copyopt() API. This is intended to be used later, when there will be filesystems implementing the VFS_NMOUNT operation. The mount(2) system call will disappear when all filesystems will be converted to the new API. Documentation will be committed in a while. Reviewed by: phk	2002-03-26 15:33:44 +00:00
arr	db4f882c76	- Recommit the securelevel_gt() calls removed by commits rev. 1.84 of kern_linker.c and rev. 1.237 of vfs_syscalls.c since these are not the source of the recent panics occuring around kldloading file system support modules. Requested by: rwatson	2002-03-25 18:26:34 +00:00
arr	fc49faf982	- Back out the commit to make the linker_load_file() securelevel check made aware in jail environments. Supposedly something is broken, so this should be backed out until further investigation proves otherwise, or a proper fix can be provided.	2002-03-22 04:56:09 +00:00
arr	68e226a99e	- Fix a logic error in checking the securelevel that was introduced in the previous commit. Pointy hats to: arr, rwatson	2002-03-21 15:27:39 +00:00
arr	fc9167c193	- Change a check of securelevel to securelevel_gt() call in order to help against users within a jail attempting to load kernel modules. - Add a check of securelevel_gt() to vfs_mount() in order to chop some low hanging fruit for the repair of securelevel checking of linking and unlinking files from within jails. There is more to be done here. Reviewed by: rwatson	2002-03-20 16:03:42 +00:00
jeff	318cbeeecf	Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.	2002-03-20 04:09:59 +00:00
alfred	357e37e023	Remove __P.	2002-03-19 21:25:46 +00:00
alfred	21fc25cfdf	Close a race when vfs_syscalls.c:checkdirs() runs. To do this protect the filedesc pointer in the proc with PROC_LOCK in both checkdirs() and kern_descrip.c:fdfree().	2002-03-19 04:30:04 +00:00
jeff	e6d26e8880	This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs. The stat() and open() calls have been changed to make use of this new functionality. Using shared locks in these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes. Also, this reduces the number of exclusive locks that are floating around in the system, which helps reduce the number of deadlocks that occur. A new kernel option "LOOKUP_SHARED" has been added. It defaults to off so this patch can be turned on for testing, and should eventually go away once it is proven to be stable. I have personally been running this patch for over a year now, so it is believed to be fully stable. Reviewed by: jake, obrien Approved by: jake	2002-03-12 04:00:11 +00:00
rwatson	8d5b7b21f3	Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to be safe.	2002-03-05 19:45:45 +00:00
rwatson	191712b565	The change from td->td_proc->p_ucred to td->td_ucred has shortened some lines: more agressively line wrap under those circumstances.	2002-03-05 19:31:25 +00:00
jhb	56094fdb72	- Change namei() to use td_ucred instead of p_ucred. - Change the hack in access() that uses a temporary credential to set td_ucred to the temp cred instead of p_ucred.	2002-02-27 19:15:29 +00:00
jhb	3706cd3509	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
rwatson	2bbae54d18	Make sure to hold vnode lock when calling into VOP_GETATTR(). Discussed with: mckusick, phk	2002-02-10 21:44:30 +00:00
rwatson	aa54f85939	Make sure to grab vnode lock on a vnode before calling VOP_GETATTR() to perform an ownership test in revoke(). This is also required for MAC hooks so that the vnode lock is held during a call to the MAC framework. Release the lock before calling VOP_REVOKE(). Discussed with: phk, mckusick	2002-02-10 20:45:43 +00:00
rwatson	eef638ac93	Remove a stray 'const' that slept into extattr_set_vp(), and could result in compiler warnings.	2002-02-10 05:31:55 +00:00
rwatson	6ca91a055c	Part I: Update extended attribute API and ABI: o Modify the system call syntax for extattr_{get,set}_{fd,file}() so as not to use the scatter gather API (which appeared not to be used by any consumers, and be less portable), rather, accepts 'data' and 'nbytes' in the style of other simple read/write interfaces. This changes the API and ABI. o Modify system call semantics so that extattr_get_{fd,file}() return a size_t. When performing a read, the number of bytes read will be returned, unless the data pointer is NULL, in which case the number of bytes of data are returned. This changes the API only. o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t argument so as to return the size, if desirable. If set to NULL, the size will not be returned. o Update various filesystems (pseodofs, ufs) to DTRT. These changes should make extended attributes more useful and more portable. More commits to rebuild the system call files, as well as update userland utilities to follow. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-02-10 04:43:22 +00:00
rwatson	d45677811a	o Merge various recent fixes from the MAC branch relating to extattrctl(): - Fix null-pointer dereference introduced when snapshotting was introduced. This occured because unlike the previous code, vn_start_write() doesn't always return a non-NULL mp, as filesystems may not support the VOP_GETWRITEMOUNT() call. For now, rely on two pointers, so that vn_finished_write() works properly. - Fix locking problems on exit, introduced at some past time, some when snapshots came in, where a vnode might not be unlocked before being vrele'd in various error situations. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-02-08 05:58:41 +00:00
julian	b5eb64d6f0	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,	2002-02-07 20:58:47 +00:00
alfred	bfbf894c82	Don't recurse on filedesc lock in chroot_refuse_vdir_fds(). Noticed by: Michael Nottebrock <michaelnottebrock@gmx.net>	2002-02-01 18:27:16 +00:00
alfred	1f82bc18d1	Replace ffind_* with fget calls. Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push giant down in fpathconf().	2002-01-14 00:13:45 +00:00
alfred	844237b396	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file fp); / increments reference count on a file / struct file fhold_locked(struct file fp); / like fhold but expects file to locked / struct file ffind_hold(struct thread , int fd); / finds the struct file in thread, adds one reference and returns it unlocked / struct file ffind_lock(struct thread , int fd); / ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
iedowse	83b07d10e7	Change dounmount() to return EBUSY in the non-MNT_FORCE case if we can't acquire the mnt_lock without blocking. Normally non-forced unmount attempts return EBUSY quickly if any vnodes are active, so this just extends that behaviour to cover the per-mount mnt_lock too.	2002-01-10 01:59:30 +00:00
se	4f45ba2c53	Return EBADF in case some vnode field has been reset to a NULL pointer. (There has been some discussion, whether ENOENT or EBADF is more appropriate. I choose the latter, since the operation is not supported on the file descriptor at that time, even if it was, immediately before.) PR: 32681 Reviewed by: dillon, iedowse, ... Approved by: nectar MFC after: 3 days (pending RE approval)	2002-01-03 09:54:24 +00:00
phk	235f3ed483	Define a new mount flag "MNT_JAILDEVFS" Collect the magic combination of flags which can be updated into a macro in sys/mount.h rather than inlining them (twice!) in vfs_syscalls.c	2001-11-05 10:33:45 +00:00
dillon	c9a56085ce	Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount structure changes now rather then piecemeal later on. mnt_nvnodelist currently holds all the vnodes under the mount point. This will eventually be split into a 'dirty' and 'clean' list. This way we only break kld's once rather then twice. nvnodelist will eventually turn into the dirty list and should remain compatible with the klds.	2001-11-04 18:55:42 +00:00
rwatson	339957cbbd	o Remove the local temporary variable "struct proc *p" from vfs_mount() in vfs_syscalls.c. Although it did save some indirection, many of those savings will be obscured with the impending commit of suser() changes, and the result is increased code complexity. Also, once p->p_ucred and td->td_ucred are distinguished, this will make vfs_mount() use the correct thread credential, rather than the process credential.	2001-11-02 21:11:41 +00:00
phk	9682ea2877	Argh! patch added the nmount at the bottom first time around. Take 3!	2001-11-02 19:12:06 +00:00
phk	ea1c496f9a	Add empty shell for nmount syscall (take 2!)	2001-11-02 18:35:54 +00:00
phk	a7cf9dee8f	Add nmount() stub function and regenerate the syscall-glue which should not need to check in generated files.	2001-11-02 17:59:23 +00:00
dillon	f421c786e5	unwind v_writecount in fhopen() if we are unable to allocate the descriptor. MFC after: 3 days	2001-10-24 18:32:17 +00:00
dillon	45a6fabe87	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days	2001-10-23 01:21:29 +00:00
rwatson	75a3f18a2a	o Complete the migration from suser error checking in the following form in vfs_syscalls.c: if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid && (error = suser_td(td)) != 0) { unwrap_lots_of_stuff(); return (error); } to: if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) { error = suser_td(td); if (error) { unwrap_lots_of_stuff(); return (error); } } This makes the code more readable when complex clauses are in use, and minimizes conflicts for large outstanding patchsets modifying the kernel authorization code (of which I have several), especially where existing authorization and context code are combined in the same if() conditional. Obtained from: TrustedBSD Project	2001-10-01 20:01:07 +00:00
rwatson	f12cf1d430	o vpaccess() -> vn_access() -- Peter reminds me that there is already a convention for vnop helper routines of this sort. Submitted by: Mr Wemm <peter>	2001-09-22 03:07:41 +00:00
rwatson	e925d3281c	o Introduce eaccess(2), a version of access(2) that uses the effective credentials rather than the real credentials. This is useful for implementing GUI's which need to modify icons based on access rights, but where use of open(2) is too expensive, use of stat(2) doesn't reflect the file system's real protection model, and use of access() suffers from real/effective credential confusion. This implementation provides the same semantics as the call of the same name on SCO OpenServer. Note: using this call improperly can leave you subject to some of the same races present in the access(2) call. o To implement this, break out the basic logic of access(2) into vpaccess(), which accepts a passed credential to perform the invocation of VOP_ACCESS(). Add eaccess(2) to invoke vpaccess(), and modify access(2) to use vpaccess(). Obtained from: TrustedBSD Project	2001-09-21 21:33:22 +00:00
julian	5596676e6c	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
ache	2718e60089	lseek: simplify overflow checks	2001-08-29 18:35:53 +00:00
ache	c71ba5eea8	Cosmetique & style fixes from bde	2001-08-26 10:23:49 +00:00
ache	1dabef8845	lseek: fix check for vattr.va_size overflow. Check suggested by bde simple not works with unsigned types.	2001-08-23 17:01:25 +00:00
ache	29e4dde471	Cosmetique: more <sys/*> into one group, separate include families by blank line	2001-08-23 13:51:17 +00:00
ache	5b25e70c26	Make lseek() POSIXed: for non character special files 1) handle off_t overflow with EOVERFLOW 2) handle negative offsets with EINVAL Reviewed by: arch discussion	2001-08-21 21:20:42 +00:00
iedowse	22613ae234	Avoid sleeping while holding a mutex in dounmount(). This problem has existed for a long time, but I made it worse a few months ago by by adding calls to VFS_ROOT() and checkdirs() in revision 1.179. Also, remove the LK_REENABLE flag in the lockmgr() call; this flag has been ignored by the lockmgr code for 4 years. This was the only remaining mention of it apart from its definition. Reviewed by: jhb	2001-08-20 19:16:31 +00:00
iedowse	b3168db212	Arbitrarily limit to 64k the number of bytes that can be read at a time using the ogetdirentries() compatibility syscall. This is a hack to ensure that rediculous values don't get passed to MALLOC(). Reviewed by: kris	2001-08-10 22:14:18 +00:00
des	1b82a02868	Constify the fstype argument to vfs_mount(). This eliminates at least one "call discards qualifier" warning (in sys/compat/linux/linux_file.c).	2001-07-09 19:11:51 +00:00
dillon	e028603b7e	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
tmm	0e4c21f7c1	Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed to the operation parameter, not to the flags as it should be. Reviewed by: rwatson	2001-06-06 23:34:38 +00:00
rwatson	f504530d9f	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit	2001-05-25 16:59:11 +00:00
jhb	2441ff245e	Don't release Giant around vm_oject_page_clean() in fsync() as the pager putpages called will need Giant.	2001-05-23 22:55:13 +00:00
ru	35437d86aa	- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs. - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs. - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS. - Install header files for the above file systems. - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.	2001-05-23 09:42:29 +00:00
alfred	a3f0842419	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
grog	4b9d9cbaac	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
grog	1f5de30718	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
rwatson	cb4f33ea31	o Introduce extattr_{delete,get,set}_fd() to allow extended attribute operations on file descriptors, which complement the existing set of calls, extattr_{delete,get,set}_file() which act on paths. In doing so, restructure the system call implementation such that the two sets of functions share most of the relevant code, rather than duplicating it. This pushes the vnode locking into the shared code, but keeps the copying in of some arguments in the system call code. Allowing access via file descriptors reduces the opportunity for race conditions when managing extended attributes. Obtained from: TrustedBSD Project	2001-03-31 16:20:05 +00:00
jhb	79cf991a6b	Convert the allproc and proctree locks from lockmgr locks to sx locks.	2001-03-28 11:52:56 +00:00
bde	a18da66f92	Fixed breakage of access() in rev.1.164. Wrong credentials were used for the final path component.	2001-03-20 09:38:05 +00:00
rwatson	0012887962	o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word. Submitted by: jkh Obtained from: TrustedBSD Project	2001-03-19 05:44:15 +00:00
rwatson	f773ff5a87	o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests. o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to advoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly. o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace. o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front. o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files. o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates. Obtained from: TrustedBSD Project	2001-03-15 02:54:29 +00:00
jhb	c7d1d65499	Check to see if p_fd is NULL before derferencing it in checkdirs(). It's possible for us to see a process in the early stages of fork before p_fd has been initialized. Ideally, we wouldn't stick a process on the allproc list until it was fully created however.	2001-03-07 02:25:13 +00:00
adrian	bf6a51f986	Mismatched MFSNAMELEN and MNAMELEN with fstype / fspath. Submitted by: Naoki Kobayashi <shibata@geo.titech.ac.jp>	2001-03-02 14:05:49 +00:00
adrian	4018955334	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since its a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is intiailised with "/" in the case of a root mount.	2001-03-01 21:00:17 +00:00
iedowse	63324ae546	The kernel did not hold a vnode reference associated with the `rootvnode' pointer, but vfs_syscalls.c's checkdirs() assumed that it did. This bug reliably caused a panic at reboot time if any filesystem had been mounted directly over /. The checkdirs() function is called at mount time to find any process fd_cdir or fd_rdir pointers referencing the covered mountpoint vnode. It transfers these to point at the root of the new filesystem. However, this process was not reversed at unmount time, so processes with a cwd/root at a mount point would unexpectedly lose their cwd/root following a mount-unmount cycle at that mountpoint. This change should fix both of the above issues. Start_init() now holds an extra vnode reference corresponding to `rootvnode', and dounmount() releases this reference when the root filesystem is unmounted just before reboot. Dounmount() now undoes the actions taken by checkdirs() at mount time; any process cdir/rdir pointers that reference the root vnode of the unmounted filesystem are transferred to the now-uncovered vnode. Reviewed by: bde, phk	2001-02-28 20:54:28 +00:00
rwatson	ab5676fc87	o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project	2001-02-21 06:39:57 +00:00
jlemon	c7ba1f9694	Introduce copyinfrom and copyinstrfrom, which can copy data from either user or kernel space. This will allow layering of os-compat (e.g.: linux) system calls. Apply the changes to mount.	2001-02-16 14:31:49 +00:00
bmilekic	f364d4ac36	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
jake	a4ad237eaa	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.	2000-12-13 00:17:05 +00:00
dwmalone	dd75d1d73b	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
jake	0c0be4e826	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd	2000-11-22 07:42:04 +00:00
dillon	15a44d16ca	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>	2000-11-18 21:01:04 +00:00
phk	4e063f5534	Take VBLK devices further out of their missery. This should fix the panic I introduced in my previous commit on this topic.	2000-11-02 21:14:13 +00:00
jhb	d944886e4d	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h	2000-10-20 07:58:15 +00:00
jasone	4e290e67b7	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.	2000-10-04 01:29:17 +00:00
eivind	e6b9704a88	Add function comments for functions missing them	2000-09-14 19:13:59 +00:00
eivind	957c331f7f	Blow away COMPAT_43 support for mount	2000-09-14 18:11:44 +00:00
bp	a7bc78c86d	Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon	2000-09-12 09:49:08 +00:00
rwatson	15609dd6ef	o Remove commented out code which modified return values from extattr_{get,set} syscalls in the face of partial reads or writes. Obtained from: TrustedBSD Project	2000-09-05 02:13:14 +00:00
truckman	bc29ebd97d	access() shouldn't diddle with the contents of a potentially shared credential. Create a temporary copy of the current credential and modify the copy. Submitted by: tegge	2000-09-02 12:31:55 +00:00
tegge	31d32eb7b1	Don't set flags on the mount structure before all permission checks have been done. Don't allow multiple mount operations with MNT_UPDATE at the same time on the same mount point. When the first mount operation completed, MNT_UPDATE was cleared in the mount structure, causing the second to complete as if it was a no-update mount operation with the following bad side effects: - mount structure inserted multiple times onto the mountlist - vp->v_mountedhere incorrectly set, causing next namei operation walking into the mountpoint to crash with a locking against myself panic. Plug a vnode leak in case vinvalbuf fails.	2000-08-09 01:57:11 +00:00
rwatson	bf8742e138	o Modify extattr_{set,get}() syscalls so that partial reads and writes with an error condition such as EINTR, EWOULDBLOCK, and ERESTART, are reported to the application, not silently conceal. This behavior was copied from the {read,write}v() syscalls, and is appropriate there but not here. o Correct a bug in extattr_delete() wherein the LOCKLEAF flag is passed to the wrong argument in namei(), resulting in some unexpected errors during name resolution, and passing in an unlocked vnode. Obtained from: TrustedBSD Project	2000-07-28 19:52:38 +00:00
rwatson	97118d11db	o Lock vnode before calling extattr_* VOP's, and modify vnode spec to allow for that. o Remember to call NDFREE() if exiting as a result of a failed vn_start_write() when snapshotting. Reviewed by: mckusick Obtained from: TrustedBSD Project	2000-07-26 20:29:20 +00:00
mckusick	d439206673	Do not need vrele(nd.ni_vp) as that is done by NDFREE(&nd, 0); Submitted by: Peter Holm <pho@freebsd.org>	2000-07-25 05:38:54 +00:00
mckusick	a3d0c189ea	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).	2000-07-11 22:07:57 +00:00
mckusick	040e64cd97	Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired. Obtained from: BSD/OS	2000-07-04 03:34:11 +00:00
phk	2a91a9dd04	Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES, that is way cleaner than using the softupdates_stub stunt, which should be killed when convenient. Discussed with: mckusick	2000-07-03 13:26:54 +00:00
archie	0e6c8a1f1b	Move the securelevel check before loading KLD's into linker_load_file(), instead of requiring every caller of linker_load_file() to perform the check itself. This avoids netgraph loading KLD's when securelevel > 0, not to mention any future code that may call linker_load_file(). Reviewed by: dfr	2000-06-29 17:57:04 +00:00
phk	cb90cb2b60	Revert part of my bioops change which implemented panic(8).	2000-06-16 14:32:13 +00:00
phk	4ec91666fa	Virtualizes & untangles the bioops operations vector. Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@	2000-06-16 08:48:51 +00:00
phk	36c3965ff9	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
dillon	689641c1ea	Commit major SMP cleanups and move the BGL (big giant lock) in the syscall path inward. A system call may select whether it needs the MP lock or not (the default being that it does need it). A great deal of conditional SMP code for various deadended experiments has been removed. 'cil' and 'cml' have been removed entirely, and the locking around the cpl has been removed. The conditional separately-locked fast-interrupt code has been removed, meaning that interrupts must hold the CPL now (but they pretty much had to anyway). Another reason for doing this is that the original separate-lock for interrupts just doesn't apply to the interrupt thread mechanism being contemplated. Modifications to the cpl may now ONLY occur while holding the MP lock. For example, if an otherwise MP safe syscall needs to mess with the cpl, it must hold the MP lock for the duration and must (as usual) save/restore the cpl in a nested fashion. This is precursor work for the real meat coming later: avoiding having to hold the MP lock for common syscalls and I/O's and interrupt threads. It is expected that the spl mechanisms and new interrupt threading mechanisms will be able to run in tandem, allowing a slow piecemeal transition to occur. This patch should result in a moderate performance improvement due to the considerable amount of code that has been removed from the critical path, especially the simplification of the spl*() calls. The real performance gains will come later. Approved by: jkh Reviewed by: current, bde (exception.s) Some work taken from: luoqi's patch	2000-03-28 07:16:37 +00:00
mckusick	2f9951ffbd	Add bwillwrite to all system calls that create things in the filesystem. Benchmarks that create huge trees of empty files overwhelm the buffer cache.	2000-01-10 00:08:53 +00:00
rwatson	4b6baecfc7	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind	1999-12-19 06:08:07 +00:00
eivind	87724eb673	Introduce NDFREE (and remove VOP_ABORTOP)	1999-12-15 23:02:35 +00:00
dillon	7cbe0b8f16	Remove accidental pollution unrelated to previous commit. The issue here is real but has not yet been discussed with Eivind.	1999-12-12 03:28:14 +00:00
dillon	b66fb2c648	Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to madvise(). This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis. MAP_NOSYNC allows one to use mmap() to share memory between processes without incuring any significant filesystem overhead, putting it in the same performance category as SysV Shared memory and anonymous memory. Reviewed by: julian, alc, dg	1999-12-12 03:19:33 +00:00
phk	1adcecffd9	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
dillon	13bf2dbc72	Ensure that garbage from the kernel stack does not wind up being returned to user mode in the spare fields of the stat structure. PR: kern/14966 Reviewed by: dillon@freebsd.org Submitted by: Kelly Yancey kbyanc@posi.net	1999-11-18 08:14:20 +00:00
phk	ec4e24bd52	Commit the remaining part of PR14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914	1999-11-16 16:28:58 +00:00
eivind	4ce73d7096	Remove WILLRELE from VOP_SYMLINK Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.	1999-11-13 20:58:17 +00:00
eivind	6d6289a5a6	Fix style bugs from last commit	1999-11-13 14:35:50 +00:00
eivind	21fff7b1c2	Remove WILLRELE from VOP_RENAME	1999-11-12 03:34:28 +00:00
julian	88e6664e72	Most modern OSs have the ability to flag certain mounts as ones to be ignored by default by the df(1) program. This is used mostly to avoid stat()-ing entries that do not represent "real" disk mount points (such as those made by an automounter such as amd.) It is also useful not to have to stat() these entries because it takes longer to report them that for other file systems, being that these mount points are served by a user-level file server and resulting in several context switches. Worse, if the automounter is down unexpectedly, a causal df(1) will hang in an interruptible way. PR: kern/9764 Submitted by: Erez Zadok <ezk@cs.columbia.edu>	1999-11-01 04:57:43 +00:00
peter	787140aa42	Trim unused options (or #ifdef for undoc options). Submitted by: phk	1999-10-11 15:19:12 +00:00
phk	322edeeaa9	Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.	1999-10-03 12:18:29 +00:00
phk	073b941095	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde	1999-09-29 20:05:33 +00:00
phk	3588b9beb9	Fix a hole in jail(2). Noticed by: Alexander Bezroutchko <abb@zenon.net>	1999-09-25 14:14:21 +00:00
alfred	b9136a6115	Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open\|stat\|stafs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD	1999-09-11 00:46:08 +00:00
peter	3b842d34e8	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
phk	591c94d4c6	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;	1999-08-26 14:53:31 +00:00
jdp	545b4031fc	Go back to using microtime() to get the timestamps for {f,l,}utimes(path, NULL) for now. Bruce says I jumped the gun with my change in revision 1.131, or maybe it should use nanotime(), or maybe it shouldn't be decided in the VFS layer at all. I'm leaving it with the old behavior until the Trans-Pacific Internet Vulcan Mind Meld yields fuller understanding.	1999-08-22 16:50:30 +00:00
jdp	36ba7111cd	Use the new vfs_timestamp() function to create the timestamps used by utimes(path, NULL). This gives them the same precision as the timestamps produced by write operations. Do likewise for lutimes() and futimes(). Suggested by bde.	1999-08-22 01:46:57 +00:00
alfred	e7ab2e043a	Replace a redundant vfs_object_create() call (already done in vn_open) with a KASSERT. Reviewed by: Eivind, Alan Cox	1999-08-12 20:38:32 +00:00
green	c03366a55d	Fix fd race conditions (during shared fd table usage.) Badfileops is now used in f_ops in place of NULL, and modifications to the files are more carefully ordered. f_ops should also be set to &badfileops upon "close" of a file. This does not fix other problems mentioned in this PR than the first one. PR: 11629 Reviewed by: peter	1999-08-04 18:53:50 +00:00
imp	54b72818fb	o Typo in prior version kept it from compiling (blush). Noticed by: Nobody! o Add comment about why we restrict chflags to root for devices. o nit noticed by bde wrt return values.	1999-08-04 04:52:18 +00:00
imp	9e1a007dbf	brucify: o use suser_xxx rather than suser to support JAIL code. o KNF comment convention o use vp->type rather than vaddr.type and eliminate call to VOP_GETATTR. Bruce says that vp->type is valid at this point. Submitted by: bde. Not fixed: o return (value) o Comment needs to be longer and more explicit. It will be after the advisory.	1999-08-03 17:07:04 +00:00
imp	b9c832cd1f	Only allow root to set file flags on devices.	1999-08-02 21:34:46 +00:00
green	7c9726279a	lutimes() bug: FOLLOW should be NOFOLLOW for this one. Submitted by: Dan Nelson <dnelson@emsphone.com>	1999-07-29 17:02:56 +00:00
alc	743eeab638	Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon	1999-07-26 06:25:53 +00:00
phk	ca21a25f17	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/	1999-04-28 11:38:52 +00:00
phk	16e3fbd2c1	Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc ), prototyped in <sys/proc.h>. 3: s/suser_xxx($[a-zA-Z0-9_]$->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.	1999-04-27 11:18:52 +00:00
phk	e1c9acedca	Add a sysctl variable which can help stop chroot(2) escapes. kern.chroot_allow_open_directories = 0 chroot(2) fails if there are open directories. kern.chroot_allow_open_directories = 1 (default) chroot(2) fails if there are open directories and the process is subject of a previous chroot(2). kern.chroot_allow_open_directories = anything else filedescriptors are not checked. (old behaviour). I'm very interested in reports about software which breaks when running with the default setting.	1999-03-23 14:26:40 +00:00
julian	caaaa6363b	Slight cleanup of code resurected for union mounts.. Submitted by: Tony Finch <dot@dotat.at>	1999-03-03 02:35:51 +00:00
julian	51809f03eb	Fix code for union mounts Accidentally deleted by peter when he extracted the unionfs stuff in 1.109 Submitted by: Tony Finch <dot@dotat.at>	1999-02-27 07:06:05 +00:00
bde	ae4dfde7bc	Added a used #include (don't depend on "vnode_if.h" including <sys/buf.h>).	1999-02-25 15:54:06 +00:00
dfr	22ceb237f0	* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime. * Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded. Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)	1999-02-16 10:49:55 +00:00
phk	c43b148bf4	Use suser() to determine super-user-ness. Collapse some duplicated checks. Reviewed by: bde	1999-01-30 12:27:00 +00:00
dillon	ca558df378	Fix warnings related to -Wall -Wcast-qual	1999-01-28 17:32:05 +00:00
dillon	a40e0249d4	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-27 21:50:00 +00:00
bde	ee544bdf09	Go back to only supporting revoke() for bdevs and cdevs. It is very buggy for fifos, and no one seems to have investigated its behaviour on other types of files. It has been broken since the Lite2 merge in rev.1.54. Nagged about by: Brian Feldman (green@unixhelp.org)	1999-01-24 06:28:37 +00:00
eivind	ffaaca5874	Remove the 'waslocked' parameter to vfs_object_create().	1999-01-05 18:50:03 +00:00
dillon	127655f6dd	PR: kern/8965 Obtained from: Stephen Clawson <sclawson@cs.utah.edu> Wakeup anyone waiting on a mount point prior to returning from umount, whether an error occurs or not. Fixes a stat/NFS-umount race and other potential future problems. Fix taken from bug/pr which also indicated that the same fix has already been applied to OpenBSD and NetBSD.	1998-12-12 21:07:09 +00:00
peter	43589c7042	make mount(2) automatically kldload modules if the requested filesystem isn't present.	1998-11-03 14:29:09 +00:00
peter	7819a9450e	Change the #ifdef UNION code into a callable hook. Arrange to have this set up when unionfs is present, either statically or as a kld module.	1998-11-03 08:01:48 +00:00
peter	c9ebff45d0	The last argument to vm_object_page_clean() are now bit flags, rather than the old true/false. While here, have vfs_msync() only call vm_object_page_clean() with OBJPC_SYNC if called with MNT_WAIT flags. vfs_msync() is called at unmount time (with MNT_WAIT) and from the syncer process (formerly update). This should make dirty mmap writebacks a little less nasty. I have tested this a little with SOFTUPDATES enabled, but I don't normally use it since I've been badly burned too many times.	1998-10-31 07:42:04 +00:00
luoqi	10b8717849	Eliminate a race in VOP_FSYNC() when softupdates is enabled. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Two minor changes are also included, 1. Remove gratuitious checks for error return from vn_lock with LK_RETRY set, vn_lock should always succeed in these cases. 2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount a little more unstable. It also keeps us in sync with other BSDs. Suggested by: Bruce Evans <bde@zeta.org.au>	1998-09-24 15:02:46 +00:00
tegge	127e6b1d67	Don't keep the underlying directory locked while performing the file system specific VFS_MOUNT operation. PR: 1067	1998-09-10 02:27:52 +00:00
bde	863d5c8b68	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.	1998-07-15 02:32:35 +00:00
dg	4795d888ab	Reset MNT_ASYNC flag if needed if unmount() should fail. Submitted by: Paul Saab <paul@mu.org>	1998-07-03 03:47:24 +00:00
dyson	bfd1d29a9f	Remove some junk left over from a previous commit. Submitted by: phk	1998-06-08 18:18:28 +00:00
dfr	1d5f38ac22	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.	1998-06-07 17:13:14 +00:00
dyson	ee396db7d3	Fix the futimes/undelete/utrace conflict with other BSD's. Note that the only common usage of utrace (the possible problem with this commit) is with malloc, so this should be a real problem. Add the various NetBSD syscalls that allow full emulation of their development environment.	1998-05-11 03:55:28 +00:00
msmith	964ce778b1	In the words of the submitter: --------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly. Quality testing was done with testvn, and lat_fs from the lmbench suite. Some NFS client testing courtesy of Patrik Kudo. vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. --------- Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-05-07 04:58:58 +00:00
des	901c8a6cfa	Backed out lseek changes.	1998-04-19 22:20:32 +00:00
des	959a007c3a	Return EINVAL and do not change file pointer if resulting offset is negative. PR: kern/6184	1998-04-18 19:24:44 +00:00
wosch	e3aed30232	New mount option nosymfollow. If enabled, the kernel lookup() function will not follow symbolic links on the mounted file system and return EACCES (Permission denied).	1998-04-08 18:31:59 +00:00
dyson	7f3f758651	Correct a significant problem with the softupdates port. Allow fsync to work properly within the softupdates framework, and thereby eliminate some unfortunate panics.	1998-03-29 18:23:44 +00:00
julian	10c5ccc30a	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree	1998-03-08 09:59:44 +00:00
dyson	8ceb6160f4	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
dyson	fbe6fe8df6	Make the rootdir handling more consistent. Now, processes always have a root vnode associated with them, and no special checks for the null case are needed. Submitted by: terry@freebsd.org	1998-02-15 04:17:09 +00:00
dyson	9f39ec243f	Fix a problem with vn_lock in fsync.	1998-02-08 01:41:33 +00:00
eivind	4547a09753	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
eivind	c552a9a1c3	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
dyson	cb2800cd94	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.	1998-01-06 05:26:17 +00:00
dyson	cd67bb82fe	Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.	1997-12-29 00:25:11 +00:00
bde	3c1b6940fc	Unspammed nested include of <vm/vm_zone.h>.	1997-12-27 02:56:39 +00:00
eivind	01dd6091ed	Make COMPAT_43 and COMPAT_SUNOS new-style options.	1997-12-16 17:40:42 +00:00
bde	65d8d62f59	Cleaned up __getcwd(). This should be cosmetic except disabled calls are now counted. Reviewed by: phk	1997-12-02 10:32:21 +00:00
bde	241618cb5b	Staticized. Use OID_AUTO instead of a magic number for the debug.syncprt sysctl. (This sysctl doesn't actually work. FreeBSD nuked it, but parts of it were mismerged from Lite2. It is not very good, but better than nothing.)	1997-11-22 06:41:21 +00:00
bde	7b2acd5a85	Fixed rev.1.81. mp->mnt_kern_flag was restored in the non-error case of `mount -u'. This only matters for `mount -u' competing with unmounts. If I understand the locking correctly: if mount() blocks, then unmount() may run and set mp->kern_flag for the same mp. Then unmount() blocks waiting for mount() to finish. When unmount() continues, its MNTK flags (MNTK_UNMOUNT and MNTK_MWAIT) may have been clobbered. Didn't fix old bugs: - restoring mp->mnt_kern_flag is wrong for the same reasons in the error case. - the error case of unmount() seems to be broken too: (a) MNTK_UNMOUNT gets clobbered, although another unmount() may have set it. Perhaps it shouldn't be set until after the full lock is aquired. (b) MNTK_MWAIT isn't honoured. Fixed a nearby style bug.	1997-11-22 06:10:36 +00:00
julian	c931d11d3f	Reviewed by: hackers@freebsd.org in general Obtained from: Whistle Communications tree Add an option to the way UFS works dependent on the SUID bit of directories This changes makes things a whole lot simpler on systems running as fileservers for PCs and MACS. to enable the new code you must 1/ enable option SUIDDIR on the kernel. 2/ mount the filesystem with option suiddir. hopefully this makes it difficult enough for people to do this accidentally. see the new chmod(2) man page for detailed info.	1997-11-13 00:28:51 +00:00
julian	ae22df605c	Reviewed by: various. Ever since I first say the way the mount flags were used I've hated the fact that modes, and events, internal and exported, and short-term and long term flags are all thrown together. Finally it's annoyed me enough.. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. it is not exported to userspace. I have moved some of the non exported flags over to this word. this means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other.. e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.	1997-11-12 05:42:33 +00:00
phk	4c8218a5c7	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.	1997-11-06 19:29:57 +00:00
bde	63cd8d1f1b	Fixed style bugs in open() fix.	1997-10-28 10:29:55 +00:00
kato	e4e4abd8bf	Disallow non-root mount. If you want to allow non-root mount, change vfs.usermount into 1 with sysctl.	1997-10-23 09:29:09 +00:00
joerg	b631d52e4d	Reject attempts to call open() with an illegal combination of O_RDONLY, O_WRONLY, O_RDWR.	1997-10-22 07:28:51 +00:00
phk	36e7a51ea1	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde	1997-10-12 20:26:33 +00:00
phk	93c5769c6c	Fix handling of nested mountpoints in __getcwd() Detected by: Simon Shapiro <Shimon@i-Connect.Net>	1997-09-28 06:37:02 +00:00
kato	fe9b86cf0b	Clustered read and write are switched at mount-option level. 1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread and vfs.foo.doclusterwrite are deleted. Only mount option can control clustered I/O from userland. 2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR and D_CLUSTERW bits of the d_flags member in the block device switch table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR / MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and MNT_NOCLUSTERW cannot be cleared from userland. 3. Vnode driver disables both clustered read and write. 4. Union filesystem disables clutered write. Reviewed by: bde	1997-09-27 13:40:20 +00:00
phk	4d1dd5bc33	A couple of handles to tweak, more statistics.	1997-09-24 07:46:54 +00:00
dyson	e64b1984f9	Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.	1997-09-21 04:24:27 +00:00
phk	9a01d27563	Don't leak memory, from sef. Stylistic nits and a blunder, from bde.	1997-09-16 08:05:09 +00:00
phk	869c60f487	Solve race-condition, return path in normal order. A couple of stylistic nits from Bruce. If your libc contains version 1.11 or 1.12 of getcwd.c, (ie: if you recompiled libc one of the last couple of days): >>> Recompile LIBC before you boot a new kernel <<< A new libc will deal with both old and new kernels.	1997-09-15 19:11:07 +00:00
phk	950d92dde6	Deal more correctly with mountpoints.	1997-09-15 08:25:43 +00:00
phk	b079abc11c	Add a __getcwd() syscall. This is intentionally undocumented, but all it does is to try to figure the pwd out from the vfs namecache, and return a reversed string to it. libc:getcwd() is responsible for flipping it back.	1997-09-14 16:51:31 +00:00
bde	6ffb8bf9af	Removed unused #includes.	1997-09-02 20:06:59 +00:00
dfr	df8d8e5713	Merge WebNFS support from NetBSD Obtained from: NetBSD	1997-07-17 07:17:33 +00:00
dfr	5974d18a75	[Previous comment was incorrect for these files] Added calls to VFS lock debugging macros to make fixing filesystems' locking easier.	1997-04-04 17:47:43 +00:00
dfr	60008c7902	Add a function vop_sharedlock which a copy of vop_nolock without the implementation #ifdef out. This can be used for now by NFS. As soon as all the other filesystems' locking is fixed, this can go away. Print the vnode address in vprint for easier debugging.	1997-04-04 17:46:21 +00:00
peter	db96cd7074	Code to do lchown(2), copied from chown(2) except it's NOFOLLOW in ND_INIT instead of FOLLOW.	1997-03-31 12:21:37 +00:00
peter	760db2332e	Treat symlinks as first class citizens with their own uid/gid rather than as shadows of their containing directory. This should solve the problem of users not being able to delete their symlinks from /tmp once and for all. Symlinks do not have modes though, they are accessable to everything that can read the directory (as before). They are made to show this fact at lstat time (they appear as mode 0777 always, since that's how the the lookup routines in the kernel treat them). More commits will follow, eg: add a real lchown() syscall and man pages.	1997-03-31 12:02:53 +00:00
guido	c337c37259	Add generation number randomization. Newly created filesystems wil now automatically have random generation numbers. The kenel way of handling those also changed. Further it is advised to run fsirand on all your nfs exported filesystems. the code is mostly copied from OpenBSD, with the randomization chanegd to use /dev/urandom Reviewed by: Garrett Obtained from: OpenBSD	1997-03-23 20:08:22 +00:00
bde	0d3591bdbd	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.	1997-03-23 03:37:54 +00:00
msmith	7b27305852	Check that vp->v_mount is non-null in fsync() before dereferencing it to obtain the mountpoint's MNT_ASYNC flag. This is a Very Definite Last-Minute 2.2 Bugfix Candidate. Reviewed by: sef	1997-03-05 01:42:14 +00:00
peter	94b6d72794	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
mpp	6ada6e6681	Don't depend on FIFO being defined to enable mkfifo. It is now always compiled. Submitted by: bde	1997-02-12 16:55:32 +00:00
mpp	11081f2076	Add function protypes for the new Lite2 unionfs functions.	1997-02-12 07:54:22 +00:00
mpp	d05d971648	Comment out a call to the #ifdef DIAGNOSTIC routine vfs_bufstats(). This routine was not imported in the Lite2 merge.	1997-02-12 06:46:11 +00:00
dyson	10f666af84	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>	1997-02-10 02:22:35 +00:00
jkh	808a36ef65	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.	1997-01-14 07:20:47 +00:00
bde	d837023225	Fixed lseek() on named pipes. It always succeeded but should always fail. Broke locking on named pipes in the same way as locking on non-vnodes (wrong errno). This will be fixed later. The fix involves negative logic. Named pipes are now distinguished from other types of files with vnodes, and there is additional code to handle vnodes and named pipes in the same way only where that makes sense (not for lseek, locking or TIOCSCTTY).	1996-12-19 19:42:37 +00:00
nate	45c85d421d	In sys/time.h, struct timespec is defined as: /* * Structure defined by POSIX.4 to be like a timeval. / struct timespec { time_t ts_sec; / seconds / long ts_nsec; / and nanoseconds */ }; The correct names of the fields are tv_sec and tv_nsec. Reminded by: James Drobina <jdrobina@infinet.com>	1996-09-19 18:21:32 +00:00
bde	51ff523803	Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel. Include it directly in the few places where it is used. Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or nothing.	1996-09-03 14:25:27 +00:00
dg	4aa33638eb	Implemented kernel side of MNT_NOATIME mount option. This option disables the file access time update on reads and can be useful in reducing filesystem overhead in cases where the access time is not important (like Usenet news spools).	1996-09-03 07:09:11 +00:00
peter	9453593a7d	Dont allow directories to be link()ed or unlink()ed, even for root (returns EPERM always, the errno is specified by POSIX). If you really have a desperate need to link or unlink a directory, you can use fsdb. :-) This should stop any chance of ftpd, rdist, "rm -rf", etc from bugging out and damaging the filesystem structure or loosing races with malicious users. Reviewed by: davidg, bde	1996-05-24 16:19:23 +00:00
bde	c926fa02cd	Hide options for emulators and static file systems in opt_dontuse.h. These options only apply at config time. Using them at compile time would break the corresponding lkms.	1996-05-11 04:39:53 +00:00
dg	0b3d2c7fe7	Make sure the mountpoint is marked busy before doing operations on it. This fixes a panic that freefall suffered last night. Obtained partially from 4.4-lite2, but minus the new bug that it introduced	1996-01-16 13:07:14 +00:00
wollman	f9dc49400b	convert FDESC, KERNFS, NULLFS, PORTAL, UMAPFS, and UNION to the new style of options.	1996-01-05 17:46:14 +00:00
phk	02cbd66c6d	Staticize. Unstaticize a function in scsi/scsi_base that was used, with an undocumented option. My last count on the LINT kernel shows: Total symbols: 3647 unref symbols: 463 undef symbols: 4 1 ref symbols: 1751 2 ref symbols: 485 Approaching the pain threshold now.	1995-12-17 21:23:44 +00:00
dyson	601ed1a4c0	Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.	1995-12-11 04:58:34 +00:00
dg	c30f46c534	Untangled the vm.h include file spaghetti.	1995-12-07 12:48:31 +00:00
bde	fc2b88c806	Fixed the errno returned by rename("dir1", "dir2/."). It was EISDIR (duh); translate it to EINVAL which is the errno for other renames to ".".	1995-11-18 11:35:05 +00:00
phk	dd42080c0c	Change some of the debug sysctl vars. The semantics of these will change.	1995-11-14 09:19:16 +00:00
bde	bc3a8c9dbf	Fixed a cast in olseek(). Fixed confusing order of declarations of getvnode()'s args.	1995-11-13 08:22:21 +00:00
bde	aa9a60640e	Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.	1995-11-12 06:43:28 +00:00
dyson	cfa6fda252	Make MNT_ASYNC more effective for UFS. It should not be too much more dangerous than the original MNT_ASYNC. There might be some minor security considerations due to data writes not being posted as promptly as before. Meta-data operations are still not quite as fast as Linux, but streaming I/O is still higher.	1995-11-05 21:01:15 +00:00
bde	3c4628f918	Prototype getvnode() in the right place (where ibcs2_stat.c can see it).	1995-11-04 10:35:26 +00:00
dg	b5341559e2	Moved the filesystem read-only check out of the syscalls and into the filesystem layer, as was done in lite-2. Merged in some other cosmetic changes while I was at it. Rewrote most of msdosfs_access() to be more like ufs_access() and to include the FS read-only check. Obtained from: partially from 4.4BSD-lite2	1995-10-22 09:32:48 +00:00
swallace	f89a2d2259	Remove prototype definitions from <sys/systm.h>. Prototypes are located in <sys/sysproto.h>. Add appropriate #include <sys/sysproto.h> to files that needed protos from systm.h. Add structure definitions to appropriate files that relied on sys/systm.h, right before system call definition, as in the rest of the kernel source. In kern_prot.c, instead of using the dummy structure "args", create individual dummy structures named <syscall>_args. This makes life easier for prototype generation.	1995-10-08 00:06:22 +00:00
julian	ebb726ec45	Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. they are: New low-level init code that supports loadbal modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular.. NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases.. certainly mounting root of disk still works just fine.. mfs should work but is untested. (tomorrows task) The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.	1995-08-28 09:19:25 +00:00
bde	36cc023d9c	The `cred' and` proc' args were missing for some VOP_OPEN() and VOP_CLOSE() calls. Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+ missing prototypes. Now I have 9000+ lines of warnings and errors about bogus conversions of function pointers.	1995-08-17 11:53:51 +00:00
dg	5b4b270015	Converted mountlist to a CIRCLEQ. Partially obtained from: 4.4BSD-Lite2	1995-08-11 11:31:18 +00:00
dg	21cc29328e	Removed my special-case hack for VOP_LINK and fixed the problem with the wrong vp's ops vector being used by changing the VOP_LINK's argument order. The special-case hack doesn't go far enough and breaks the generic bypass routine used in some non-leaf filesystems. Pointed out by Kirk McKusick.	1995-08-01 18:51:02 +00:00
bde	3029c788af	Ignore trailing slashes in pathnames that "refer to a directory", as is required to be POSIXLY_CORRECT and "right". I interpret "referring to a directory" as being a directory or becoming a directory. E.g., the trailing slashes in mkdir("/nonesuch/"), rename("/tmp", /nonesuch/") and link("/tmp", "/root_can_like_dirs/") are ignored because the target will become a directory if the syscall succeeds. A trailing slash on a symlink causes the symlink to be followed (this is a bug if the symlink doesn't point to a directory; fix later).	1995-07-31 00:35:58 +00:00
dg	c8b0a7332c	NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!! Much needed overhaul of the VM system. Included in this first round of changes: 1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers". 2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items. 3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed. 4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug. 5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance. 6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain. 7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance. 8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed. 9) Some almost useless debugging code removed. 10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology. 11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended. 12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course). 13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE. 14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes) TODO: 1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size. 2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness. 3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind. 4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems. 5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).	1995-07-13 08:48:48 +00:00
dg	3c7c1dd62f	1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.	1995-06-28 12:01:13 +00:00
dg	5d0d9f974b	Fixed VOP_LINK argument order botch.	1995-06-28 07:06:55 +00:00
dg	2045200a00	Changes to fix the following bugs: 1) Files weren't properly synced on filesystems other than UFS. In some cases, this lead to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping. Reviewed by: David Greenman Submitted by: John Dyson	1995-05-21 21:39:31 +00:00
dg	138edd5273	Fixed incompleteness that would allow dirty filesystems to get mounted when the single user shell was terminated. These changes disallow mounting or R/W upgrading filesystems that are dirty unless "-f" (force) option is used with mount. /etc/rc has been modified to abort the startup if one or more non-nfs partitions fail to mount. Reviewed by: Poul-Henning Kamp, Rod Grimes	1995-05-15 08:39:37 +00:00
dg	bdaf4b444d	Removed unused variable caused by last commit.	1995-05-02 09:06:04 +00:00
dg	381c266547	Fix for sync() to close a potential panic with accessing a mount struct that had been freed. Submitted by: John Dyson	1995-05-02 08:44:31 +00:00
dg	8b14c6a8ea	Added a set of braces to make the compiler happy.	1995-03-29 11:54:02 +00:00
dg	a62ef20881	Moved call to vnode_pager_uncache in rename() to before the VOP_RENAME. It was previously after the VOP_RENAME and the reference and lock on the vnode had already been lost, allowing interesting internel inconsistencies. This is one of the two reasons why freefall was crashing every hour or two (the other being nullfs bugs). Don't call vnode_pager_uncache in revoke(). revoke() is only allowed on VCHR and VBLK vnodes.	1995-03-19 11:16:58 +00:00
bde	289f11acb4	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.	1995-03-16 18:17:34 +00:00
dg	191e07d353	Do a vnode_pager_uncache after the VOP_RENAME to lose the remaining reference to the old vnode. Suggested by: Bruce Evans	1995-02-28 02:52:48 +00:00
dg	a0467b5c60	In sync(), don't dereference the proc pointer if it's NULL. Should fix most or all of the problems with calling sync() without a curproc (which can happen in machdep.c during a panic sync).	1995-02-13 13:45:04 +00:00
dg	e5d78dd23f	From tim@cs.city.ac.uk (Tim Wilkinson): Find enclosed a short bugfix to get the union filesystem up and running in FreeBSD-current. We don't think we've got all the problems yet but these fixes sort out the major ones (which mostly concert bad locking of vnodes), no doubt we'll post others as necessary. Known problems include the inability of the umount command (not the system call) to unmount unions in certain circumstances (this is due the way "realpath" works), and the failure of direntries to always get all available files in unioned subdirectories. We are, as they say, working on it. Submitted by: tim@cs.city.ac.uk (Tim Wilkinson)	1994-11-04 14:41:46 +00:00
wollman	3a9b1c345a	Make my ALLDEVS kernel compile (basically, LINT minus a lot of options). This involves fixing a few things I broke last time.	1994-10-21 01:19:28 +00:00
phk	c4b061eaa0	Fix the problem with panics when mounting on nonexistant directories. Probably my fault in the first place...	1994-10-15 02:53:26 +00:00
sos	45e2713230	Removed static declaration of getvnode() (used in ibcs2)	1994-10-11 20:40:12 +00:00
phk	5f8e9e2c4d	Cosmetics: added ()'s and fixed prinf-formats to make gcc silent.	1994-10-08 22:33:43 +00:00
dg	52b4dc9c30	Stuff object into v_vmdata rather than pager. Not important which at the moment, but will be in the future. Other changes mostly cosmetic, but are made for future VMIO considerations. Submitted by: John Dyson	1994-10-05 09:48:45 +00:00
phk	c3e4945541	All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.	1994-10-02 17:35:40 +00:00
dfr	57b6c0c34c	Make NFS ask the filesystems for directory cookies instead of making them itself.	1994-09-28 16:45:22 +00:00
wollman	900d29807d	More loadable VFS changes: - Make a number of filesystems work again when they are statically compiled (blush) - FIFOs are no longer optional; ``options FIFO'' removed from distributed config files.	1994-09-22 19:38:41 +00:00
wollman	c289ac89a1	Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)	1994-09-21 03:47:43 +00:00
dg	70a7b53e06	Disallow truncating to negative file sizes. Doing so causes ffs_truncate() and perhaps other fs truncate's to go crazy and panic the machine or worse. This fixes the truncate bug reported by Michael Class.	1994-09-02 10:23:43 +00:00
dg	4e8717c691	Make olstat() consistent with lstat() - so they both return the same owner.. Submitted by: Kirk McKusick	1994-09-02 04:14:44 +00:00
dg	f817326b2e	Implemented filesystem clean bit via: machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful. cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level. vfs_conf.c: Removed unused rootfs global. vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted. ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted. ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE(). vfs_syscalls.c: Disallow dismounting root FS via umount syscall.	1994-08-20 16:03:26 +00:00
dg	8d205697aa	Added $Id$	1994-08-02 07:55:43 +00:00
rgrimes	2469c867a1	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman	1994-05-25 09:21:21 +00:00

... 4 5 6 7 8 ...

551 Commits