freebsd-skq

Author	SHA1	Message	Date
Matthew D Fleming	e7ceb1e99b	Based on discussions on the svn-src mailing list, rework r218195: - entirely eliminate some calls to uio_yeild() as being unnecessary, such as in a sysctl handler. - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h - add a slightly more generic kern_yield() that can replace the functionality of uio_yield(). - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching. - fix a logic inversion bug in vlrureclaim(), pointed out by bde@. - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.	2011-02-08 00:16:36 +00:00
Pawel Jakub Dawidek	3297cdd096	Correct arguments order.	2010-06-26 21:44:45 +00:00
Edward Tomasz Napierala	77dda2b96f	Avoid overflow. Submitted by: bde@	2010-05-06 18:52:41 +00:00
Edward Tomasz Napierala	307d88b787	Style fixes and removal of unneeded variable. Submitted by: bde@	2010-05-06 18:43:19 +00:00
Edward Tomasz Napierala	b5f770bd86	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib	2010-05-05 16:44:25 +00:00
Andriy Gapon	364b8a7b33	vn_stat: take into account va_blocksize when setting st_blksize As currently st_blksize is always PAGE_SIZE, it is playing safe to not use any smaller value. For some cases this might not be optimal, but at least nothing should get broken. Generally I don't expect this commit to change much for the following reasons (in case of VREG, VDIR): - application I/O and physical I/O are sufficiently decoupled by filesystem code, buffer cache code, cluster and read-ahead logic - not all applications use st_blksize as a hint, some use f_iosize, some use fixed block sizes I expect writes to the middle of files on ZFS to benefit the most from this change. Silence from: fs@ MFC after: 2 weeks	2010-04-03 08:39:00 +00:00
Ed Schouten	510ea843ba	Rename st_timespec fields to st_tim for POSIX 2008 compliance. A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we already have for a very long time. Unfortunately POSIX uses different names. This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all a less ambiguous. I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.	2010-03-28 13:13:22 +00:00
Ed Schouten	0fef797f4a	Actually make O_DIRECTORY work. According to POSIX open() must return ENOTDIR when the path name does not refer to a path name. Change vn_open() to respect this flag. This also simplifies the Linuxolator a bit.	2010-03-21 20:43:23 +00:00
Edward Tomasz Napierala	9d7031a6d6	Don't add VAPPEND if the file is not being opened for writing. Note that this only affects cases where open(2) is being used improperly - i.e. when the user specifies O_APPEND without O_WRONLY or O_RDWR. Reviewed by: rwatson	2009-12-08 20:47:10 +00:00
Edward Tomasz Napierala	3b1b09809f	Revert r198874, pending further discussion.	2009-11-04 07:14:16 +00:00
Edward Tomasz Napierala	8fafa5cecf	Make sure we don't end up with VAPPEND without VWRITE, if someone calls open(2) like this: open(..., O_APPEND).	2009-11-04 06:48:34 +00:00
Xin LI	82aebf697c	Add two new fcntls to enable/disable read-ahead: - F_READAHEAD: specify the amount for sequential access. The amount is specified in bytes and is rounded up to nearest block size. - F_RDAHEAD: Darwin compatible version that use 128KB as the sequential access size. A third argument of zero disables the read-ahead behavior. Please note that the read-ahead amount is also constrainted by sysctl variable, vfs.read_max, which may need to be raised in order to better utilize this feature. Thanks Igor Sysoev for proposing the feature and submitting the original version, and kib@ for his valuable comments. Submitted by: Igor Sysoev <is rambler-co ru> Reviewed by: kib@ MFC after: 1 month	2009-09-28 16:59:47 +00:00
Konstantin Belousov	579b976090	Fix mount reference leak when V_XSLEEP is specified to vn_start_write(). Submitted by: tegge	2009-09-01 12:05:39 +00:00
Konstantin Belousov	a505c2c72c	Make the mnt_writeopcount and mnt_secondary_writes counters, used by the suspension code, not greater then mnt_ref reference counter value. Increment mnt_ref together with write counter in vn_start_write()/ vn_start_secondary_write(), releasing in vn_finished_write/vn_finished_secondary_write(). Since r186197, unmount code requires that no writers occured after all references are expired. We still could get write counter incremented for freed or reused struct mount, but it seems to be innocent, since corresponding vnode should be referenced and reclaimed then. Reported by: pho (last half a year), erwin Reviewed by: attilio Tested by: pho, erwin MFC after: 1 week	2009-08-31 10:20:52 +00:00
Konstantin Belousov	f1eccd05ec	In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point around the sequence that drop vnode lock and then busies the mount point. Not having vlocked node or direct reference to the mp allows for the forced unmount to proceed, making mp unmounted or reused. Tested by: pho Reviewed by: jeff Approved by: re (kensmith) MFC after: 2 weeks	2009-07-02 18:02:55 +00:00
Konstantin Belousov	e0c161b89c	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson	2009-06-21 13:41:32 +00:00
Paul Saab	27bfb741a0	Simply shared vnode locking and extend it to also include fsync. Also, in vop_write, no longer assert for exclusive locks on the vnode. Reviewed by: jhb, kmacy, jeffr	2009-06-08 21:23:54 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Paul Saab	983b43c5de	When checking for shared writes, use the struct mount returned from vn_start_write. Reviewed by: jhb	2009-06-04 16:50:03 +00:00
Paul Saab	a6d545d8ed	Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS. Reviewed by: jhb, kmacy, jeffr	2009-06-04 16:18:07 +00:00
Attilio Rao	dfd233edd5	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.	2009-05-11 15:33:26 +00:00
Konstantin Belousov	41b72e6e50	Eliminate the loop and the call to pause(9) in vfs_vget_ino(). If vfs_busy(MBF_NOWAIT) failed, unlock the vnode and sleep in vfs_busy(). Suggested and reviewed by: jeff Tested by: pho MFC after: 3 weeks	2009-05-07 18:14:21 +00:00
Kip Macy	5e6a926611	- use a shared lock for reads - remove stale comment Reviewed by: jeffr	2009-04-13 23:09:44 +00:00
Robert Watson	885868cd8f	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
John Baldwin	33fc362512	Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF. Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month	2009-03-11 14:13:47 +00:00
Konstantin Belousov	e9aff35739	Move the code from ufs_lookup.c used to do dotdot lookup, into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in lookup routine. Requested by: jhb Tested by: pho MFC after: 1 month	2009-01-21 14:51:38 +00:00
Pawel Jakub Dawidek	0c6a80e78d	Improve KASSERT() call a bit: - Print flags in hex. - Note that flags can be fine and panic can be due unexpected error condition. - Remove redundant new line character. Eventhough panic message excess 80 characters keep it in one line so it is easier to grep.	2008-11-29 12:40:14 +00:00
Konstantin Belousov	c5f77bf986	Revert r184118. There is actually a code in the kernel, for instance in kern_unlinkat(), that expects that vn_start_write() actually fills the mp even when the call failed. As Tor noted, that pattern relies on the the type stability of the mount points, as well as that suspended mount points are never freed and V_XSLEEP is always passed to vn_start_write() when called on a freed mount point. Reported by: stass Reviewed by: tegge PR: 123768	2008-11-16 21:56:29 +00:00
John Baldwin	21fc02d271	Use shared vnode locks instead of exclusive vnode locks for the access(), chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls. Submitted by: ups (mostly) Tested by: pho MFC after: 1 month	2008-11-03 20:31:00 +00:00
Attilio Rao	83b3bdbc8a	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho	2008-11-02 10:15:42 +00:00
Edward Tomasz Napierala	15bc6b2bd8	Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)	2008-10-28 13:44:11 +00:00
Konstantin Belousov	3ba28ace77	Change vn_start_write() to clear *mpp on all failures when non-NULL vp is supplied, since vm_pageout_scan() expects it to be cleared on error. Submitted by: tegge PR: 123768 MFC after: 1 week	2008-10-21 09:55:49 +00:00
Konstantin Belousov	016f98f947	Assert that v_holdcnt is non-zero before entering lockmgr in vn_lock and ffs_lock. This cannot catch situations where holdcnt is incremented not by curthread, but I think it is useful. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks	2008-10-20 10:11:33 +00:00
Konstantin Belousov	0fbbf2ea56	Initialize va_rdev to NODEV and va_fsid to VNOVAL before the VOP_GETATTR() call in vn_stat(). Thus if a file system doesn't initialize those fields in VOP_GETATTR() they will have a sane default value. Submitted by: Jaakko Heinonen <jh saunalahti fi> Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:48:24 +00:00
Konstantin Belousov	ea60a5f526	Initialize birthtime fields in vn_stat() to prevent stat(2) from returning uninitialized birthtime. Most file systems don't initialize birthtime properly in their VOP_GETTATTR(). Submitted by: Jaakko Heinonen <jh saunalahti fi> Reviewed by: bde Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:43:22 +00:00
Konstantin Belousov	2814d5ba5f	When attempt is made to suspend a filesystem that is already syspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write() resume when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows to bypass the suspension point in the vn_start_write. It is intended for use by VFS in the situations where the suspender want to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 11:51:06 +00:00
Konstantin Belousov	bdb8094763	Garbage-collect vn_write_suspend_wait(). Suggested and reviewed by: tegge Tested by: pho MFC after: 1 month	2008-09-16 11:09:26 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Robert Watson	1d986c5ff1	Remove broken code to replace st_mode value with ACCESSPERMS when lstat(2) is called on symlinks -- this code appears never to have worked. The PR this addresses suggests that the intended original behavior is the right one, but as bde points out in the PR comments, we do actually support storing a mode on symlinks, so returning it seems reasonable. This is consistent with Mac OS X, which despite documentation to the contrary does return the mode set on a symlink, but not some other platforms. The Single Unix Spec requires only that the returned bits be "meaningful", which seems at best unhelpful as advice goes. PR: 25018 MFC after: 3 days	2008-08-03 15:44:56 +00:00
Konstantin Belousov	e314f69fff	Add the support for the O_EXEC open(2) mode, as specified by the POSIX Extended API Set Part 2 extension specification. Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 11:57:18 +00:00
Jeff Roberson	5634d48667	- Don't allow calls to vn_lock() with no lock type requested. Callers which simply want a reference should use vref(). Callers which want to check validity need to hold a lock while performing any action based on that validity. vn_lock() would always release the interlock before returning making any action synchronous with the validity check impossible.	2008-03-29 23:36:26 +00:00
Jeff Roberson	804e60d4cf	- Don't acquire the vnode interlock in _vn_lock() unless no lock type is requested. Handle this case specially before the while loop. - Use the held vnode lock to check for VI_DOOMED. The vnode lock and interlock must both be held to set VI_DOOMED so either one held, even shared, is sufficient to check it. No objection by: kib	2008-03-24 04:17:35 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
John Baldwin	e46502943a	Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson	2008-01-07 20:05:19 +00:00
Bruce Evans	9283848511	In sequential_heuristic(): - spell 16384 as 16384 and not as BKVASIZE. 16384 is (not quite) just a magic size that works well in practice. BKVASIZE should be MAXBSIZE (65536), but is 16384 because i386's don't have enough kva for it to be MAXBSIZE; 16384 works (not so well) for it for much the same reasons that it works well in the heuristic. - expand and/or add comments about this and other details. - don't explicitly inline this function. - fix some other style bugs.	2008-01-05 08:54:51 +00:00
Jeff Roberson	397c19d175	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Pawel Jakub Dawidek	57fd3d5572	When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00

1 2 3 4 5 ...

300 Commits