freebsd-skq

Author	SHA1	Message	Date
jeff	8facfb0569	- Use a lockmgr lock rather than a mtx to protect dirhash. This lock may be held for the duration of the various dirhash operations which avoids many complex unlock/lock/revalidate sequences. - Permit shared locks on lookup. To protect the ip->i_dirhash pointer we use the vnode interlock in the shared case. Callers holding the exclusive vnode lock can run without fear of concurrent modification to i_dirhash. - Hold an exclusive dirhash lock when creating the dirhash structure for the first time or when re-creating a dirhash structure which has been recycled. Tested by: kris, pho	2008-04-11 09:48:12 +00:00
jeff	fef79bb2db	- cache dp->i_offset in the local 'i_offset' variable for use in loop indexes so directory lookup becomes shared lock safe. In the modifying cases an exclusive lock is held here so the commit routine may rely on the state of i_offset. - Similarly handle i_diroff by fetching at the start and setting only once the operation is complete. Without the exclusive lock these are only considered hints. - Assert that an exclusive lock is held when we're preparing for a commit routine. - Honor the lock type request from lookup instead of always using exclusive locking. Tested by: pho, kris	2008-04-11 09:44:25 +00:00
pjd	e83dd39a00	Correct function name in panic(). Reported by: kensmith	2008-04-07 18:12:37 +00:00
attilio	07441f19e1	Optimize lockmgr in order to get rid of the pool mutex interlock, of the state transitioning flags and of msleep(9) callings. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) alredy do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the correspective per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will be probabilly added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wakeup, are in the runqueue they can contend the lock with the exclusive drainer. This is hard to be fixed but the now committed code mitigate this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will be nomore supported soon. In order to avoid namespace pollution, stack.h is splitted into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. Kernel ABI results heavilly changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly. Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007	2008-04-06 20:08:51 +00:00
kib	eff8c6d35e	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
jeff	1968343329	- Since rev 1.142 of ffs_snapshot.c the interlock has not been required to protect the v_lock pointer. Removing the interlock acquisition here allows vn_lock() to proceed without requiring the interlock at all. - If the lock mutated while we were sleeping on it the interlock has been dropped. It is conceivable that the upper layer code was relying on the interlock and LK_NOWAIT to protect the identity or state of the vnode while acquiring the lock. In this case return EBUSY rather than trying the new lock to prevent potential races. Reviewed by: tegge	2008-03-31 07:55:45 +00:00
jeff	5e4f326d87	- Don't free snapdata structures when they are no longer in use. Keeping the lockmgr lock valid allows us to switch the v_lock pointer in snapshot vnodes between the embedded lockmgr lock and snapdata lock without needing the vnode interlock to protect against races - Keep unused snapdata structures in a list. - Add a function to lock the devvp and allocate a snapdata to it or acquire a new one without races. The old function was safe from creation races because we set the mount flag when creating snapshots and thus serializing them. However, it might have been subject to destroying races. Reviewed by: tegge	2008-03-31 07:47:08 +00:00
jhb	20cadd93f0	Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo' (such as 'atime' vs 'noatime'). The filesystems will always see either 'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid mount options should include 'nofoo' instead of 'foo'. With this fix, you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as noatime without getting an error. You can also update a noatime FFS filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime' using nmount(2) (e.g. 7.x /sbin/mount binary). MFC after: 1 week Reviewed by: crodig	2008-03-26 20:48:07 +00:00
dfr	79d2dfdaa6	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
kib	5ddf5664cc	Yield the cpu in the kernel while iterating the list of the vnodes belonging to the mountpoint. Also, yield when in the softdep_process_worklist() even when we are not going to sleep due to buffer drain. It is believed that the ULE fixed the problem [1], but the yielding seems to be needed at least for the 4BSD case. Discussed: on stable@, with bde Reviewed by: tegge, jeff [1] MFC after: 2 weeks	2008-03-23 13:45:24 +00:00
jeff	a9d123c3ab	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)	2008-03-22 09:15:16 +00:00
kib	04661caa35	Reduce the acquisition of the vnode interlock in the ffs_read() and ffs_extread() when setting the IN_ACCESS flag by checking whether the IN_ACCESS is already set. The possible race there is admissible. Tested by: pho Submitted by: jeff	2008-03-21 12:33:00 +00:00
jeff	46f09d5bc3	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
cokane	1dbd762cae	Replace the non-MPSAFE timeout(9) API in ffs_softdep.c with the MPSAFE callout_* API (e.g. callout_init_mtx(9)). This was one of the numerous items on the http://wiki.freebsd.org/SMPTODO list. Reviewed by: imp, obrien, jhb MFC after: 1 week	2008-03-13 20:15:48 +00:00
emaste	747c71e58b	Remove include of opt_quota.h; as of revision 1.205 there is no longer any #ifdef QUOTA conditional code.	2008-03-10 18:44:07 +00:00
kib	e378ea9934	Initialize mnt_stat.f_iosize before autostarting UFS1 extattrs. It is normally initialized by ffs_statfs() after ffs_mount finished. The extattr autostart code calls the ufs_lookup(), that uses value above to iterate over the directory blocks, see bmask initialization in the ufs_lookup() and ufsdirhash. Having the filesystem with root directory spanning more then one block would result in reading a random kernel memory. PR: kern/120781 Test case provided by: rwatson MFC after: 1 week	2008-03-05 16:34:03 +00:00
rwatson	48aef5a3f0	Continue on-going campaign to replace lockmgr locks with sx locks where the specific semantics of ockmgr aren't required: update UFS1 extended attributes to protect its data structures using an sx lock. While here, update comments on lock granularity. MFC after: 2 weeks	2008-03-04 12:50:11 +00:00
rwatson	96767d9090	Move setting of MNTK_MPSAFE flag before UFS1 extended attribute auto-start so that the flag is set before we start performing I/O in the auto-start routine. MFC after: 2 weeks Suggested by: kib	2008-03-04 12:10:03 +00:00
rwatson	ba77105862	Don't auto-start or allow extattrctl for UFS2 file systems, as UFS2 has native extended attributes. This didn't interfere with the operation of UFS2 extended attributes, but the code shouldn't be running for UFS2. MFC after: 2 weeks	2008-03-02 22:52:14 +00:00
keramida	41eb7744b1	Minor typo nit.	2008-02-25 19:31:44 +00:00
attilio	4014b55830	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
attilio	0d54671a48	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.	2008-02-24 16:38:58 +00:00
attilio	265cb5fb91	- Introduce lockmgr_args() in the lockmgr space. This function performs the same operation of lockmgr() but accepting a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is nomore used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-15 21:04:36 +00:00
attilio	7213f4c32b	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo	2008-01-24 12:34:30 +00:00
attilio	caa2ca048b	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)	2008-01-19 17:36:23 +00:00
attilio	71b7824213	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
attilio	18d0a0dd51	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
kib	149cd5b092	ffs_balloc_ufsX() routines, in the case of recovering from the failed allocation, free the indirect blocks before clearing the disk pointers, that could lead to the softupdate inconsistencies in the case of the machine or disk crash at the wrong time. Rearrange the recover code to do the ffs_blkfree() after the second ffs_syncvnode(), that clears the pointers chain. Proposed and reviewed by: tegge Tested by: Peter Holm MFC after: 3 weeks	2008-01-03 12:28:57 +00:00
obrien	51625bf288	style(9)	2008-01-02 01:19:17 +00:00
kib	c9fb56ffc8	The ffs_balloc() routines, whan allocating the indirect blocks for the inode, do the rollback in case the allocation failed (due to insufficient free space or quota limits). But, the code does leaves the buffers corresponding to the inoirect blocks on the vnode bufobj list. This causes several assertion failures (for instance, "ffs_truncate3" in ffs_truncate()) to fail, and could result in the indirect block aliasing problem, like writing the context of such blocks to random disk location. Remove the buffers from the bufobj properly. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 3 weeks	2007-12-29 13:31:27 +00:00
kensmith	66cb6fd44f	Fix a broken check that recently became more annoying because it now gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently nobody uses). From Tor's description: This happens when the block range spans two block maps, the first in the inode (mapping up to NDADDR direct blocks) and the second being the first indirect block. The current check assumes that both block maps are indirect blocks. Work done by: tegge Tested by: kris, kensmith	2007-12-01 13:12:43 +00:00
ru	1f9b2f54c8	Fix build without INVARIANTS and update a comment to match a change made in previous revision.	2007-11-09 11:04:36 +00:00
obrien	1ae16b4e64	Turn most ffs 'DIAGNOSTIC's into INVARIANTS.	2007-11-08 17:21:51 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
alfred	3a60df401c	Get rid of qaddr_t. Requested by: bde	2007-10-16 10:54:55 +00:00
bz	621c3a5b99	Fix a DIV0 in case a large value for fs_avgfilesize or fs_avgfpdir is given (with newfs or tunefs) and dirsize overflows. In case dirsize is <= 0 because of an overflow set maxcontigdirs to 0 so it will be 1 later. This is what would happen for large fs_avgfilesize. [1] Identified with help from: roberto, pjd Submitted by: pjd [1] Approved by: re (rwatson) MFC after: 8 days	2007-09-10 14:12:29 +00:00
rodrigc	85690ea861	Perform range check before allocating memory when reading extended attributes. Reviewed by: kib Approved by: re (hrs) PR: 114389	2007-07-13 18:51:08 +00:00
peter	5c6fefddf1	Fix an annoying pointer/int cast warning that shows up on 64 bit systems. Approved by: re	2007-07-02 01:31:43 +00:00
kib	2f486f25b6	Fix livelock that could occur when snapshoting UFS with quotas, where some quota limit was exceeded. Sequence of UFS_VALLOC()/UFS_VFREE() call there could cause inodeblock to have both freefile and inodedep dependencies without any inode in the block being marked for write. Then, softdep_check_suspend() would return EAGAIN forewer. Force write of inodeblock with allocated freefile softdependency by setting IN_MODIFIED flag in softdep_freefile and unconditionally calling UFS_UPDATE() in ufs_reclaim. Reported by: kris Debug help and tested by: Peter Holm Approved by: re (kensmith) MFC after: 3 weeks	2007-06-22 13:22:37 +00:00
rwatson	00b02345d4	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
jeff	91d1501790	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
kib	17260ba6f1	Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file: part 2. Convert calls missed in the first big commit. Noted by: rwatson Pointy hat to: kib	2007-06-01 14:33:11 +00:00
jeff	a7a8bac81f	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
kib	f13486a222	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
pjd	ee9e45ce7d	- Remove unnecessary vnode internal locking - v_vflag is protect by vnode's lock (not vnode's interlock). - Simplify code a bit.	2007-05-28 00:28:15 +00:00
pjd	1e693d6e52	Eliminate VI_LOCK()/VI_UNLOCK() pair from getattr and close code paths. It's hard to measure performance improvement on my test machine, but the change won't degrade performance for sure. I can measure slight improvement for debugging kernel and it can also be a win for machines where atomic operation is more expensive. Reviewed by: kib	2007-05-23 11:06:09 +00:00
kib	162fa8dc6d	Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.	2007-05-18 13:02:13 +00:00
thompsa	25f570bc48	Add a newline to the printf message.	2007-05-03 22:39:52 +00:00

1 2 3 4 5 ...

1525 Commits