Commit Graph

91 Commits

Author SHA1 Message Date
dfr
79d2dfdaa6 Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
kib
8b513f42f1 Do not assert any locks for VOP_PRINT. In particular, do not assert that
the vnode interlock is not held. vn_printf() already correctly handles
locked and unlocked vnode interlocks, and all the in-tree vop_print
methods are interlock-agnostic.

Some code calls vprintf() with the vnode interlock held, that causes
unjustified panics with INVARIANTS (ffs_syncvnode() as example).

Reported by:	Peter Holm
2008-02-26 12:16:35 +00:00
attilio
4014b55830 Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-25 18:45:57 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
kib
f13486a222 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
kib
162fa8dc6d Since renaming of vop_lock to _vop_lock, pre- and post-condition
function calls are no more generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.
2007-05-18 13:02:13 +00:00
pjd
cb2d7c85a8 Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
kmacy
0c00ea16db change vop_lock handling to allowing tracking of callers' file and line for
acquisition of lockmgr locks

Approved by: scottl (standing in for mentor rwatson)
2006-11-13 05:51:22 +00:00
dds
840cb44e85 Remove two locking assertion entries that:
a) were incorrectly written and therefore never compiled into
assertions, and
b) were incorrectly specified and when compiled resulted in a
failed assertion.
2006-05-31 14:06:06 +00:00
dds
27cc870a3f Assertion code specifications are introduced using special character
sequences that are distinct from comments. %% is used for argument
locks; %! for pre- and post-conditions.
2006-05-30 20:49:54 +00:00
dds
1694caa758 Remove incorrect lock validation specifications that caused
failed assertions with DEBUG_VFS_LOCKS.
We should reinstate them with correct specifications, possibly
after extendng vnode_if.awk

Noted by: truckman@
2006-05-30 20:21:51 +00:00
dds
ac5d85085e Add missing % signs in the lock annotations of the functions:
lookup, rename, strategy, islocked
The missing % sign meant that the lines were processed as plain
comments and the corresponding assertions were never generated.
2006-05-28 07:24:12 +00:00
des
5d3c44687b Eradicate caddr_t from the VFS API. 2005-12-14 00:49:52 +00:00
ssouhlal
0835f7b4a9 Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
jeff
a6b9b8b072 - Mark the VOPs that require exclusive locks. Those that aren't marked
with E may be called with a shared lock held.  This list really could
   be made per filesystem if we had any filesystems which differed from
   ffs in locking guarantees.  VFS itself is not sensitive to this except
   where vgone() etc. are concerned.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 15:19:29 +00:00
jeff
175891c4ff - CLOSE, REVOKE, INACTIVE, and RECLAIM are not L L L, that's a locked vnode
on enter, exit, error.  This allows for the removal of the XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:42:16 +00:00
phk
e0b8a475a8 VOP_DESTROYVOBJECT() is no more. 2005-02-07 09:26:58 +00:00
phk
32b3eaa1c2 Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now. 2005-01-25 00:42:16 +00:00
phk
d0bbbd0881 Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem
for a given vnode to create a vnode_pager object if one is needed.
2005-01-25 00:12:24 +00:00
phk
da2718f1af Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
phk
d8b3df3cb9 Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)
2004-11-15 09:18:27 +00:00
phk
861b10d6de Remove VOP_SPECSTRATEGY() from the system. 2004-10-29 10:59:28 +00:00
phk
2806321da1 Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork.  In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly.  Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
2004-09-13 06:50:42 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
rwatson
d2f7ae9f88 Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-07-28 18:53:29 +00:00
phk
b40be37a17 Call the new argument "fdidx" that is more precise than "fd". 2003-07-27 17:03:20 +00:00
phk
6221ef9078 Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.
2003-07-26 07:32:23 +00:00
rwatson
eb83dc490d Expose vop_rmextattr as an explicit operation at the vnode operation
interface, rather than relying on a NULL uio for the deletion
operation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 22:45:24 +00:00
se
34750b2905 Add comment about **vpp being special-cased in vnode_if.awk (1.38) 2003-06-20 12:24:06 +00:00
rwatson
9c43f2f46e Add vop_listextattr(), similar to vop_getextattr() but without a
specific attribute name.  It will have the same semantics as the
older vop_getextattr() "retrieve the names" hack, returning
a buffer with ASCII nul-seperated names.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-05 05:53:35 +00:00
phk
131885aa2f Temporarily introduce a new VOP_SPECSTRATEGY operation while I try
to sort out disk-io from file-io in the vm/buffer/filesystem space.

The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes.  For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.

Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY.  First time it is called it will print debugging
information.  This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.

Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.

Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened.  Handle
the request just like we always did, but first time called print
debugging information.

Apart up to two instances of console messages per boot, this amounts
to a glorified no-op commit.

If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org
2003-01-04 22:10:36 +00:00
rwatson
d52d1ebbeb Flush vop_refreshlabel() definition, since it is no longer used.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-12-24 19:47:13 +00:00
jeff
3a58c4d63e - We don't need any automated lock checking for vop_islocked. 2002-09-26 00:31:16 +00:00
truckman
f280782003 VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()
wasn't doing.  Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods.  Remove the
locking and and unlocking from the leaf filesystem link methods.

Reviewed by:	rwatson, bde  (except for the unionfs_link() changes)
2002-09-19 13:32:45 +00:00
phk
aa2987768b Introduce the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods.
Together these two implement a simple transcation style grouping for
modifications of extended attributes on a vnode.

VOP_CLOSEEXTATTR() takes a boolean "commit" argument, which determines
if the aggregate changes are attempted written or not.  A commit will
fail if any of the VOP_SETEXTATTR() calls since the VOP_OPENEXTATTR()
have failed to meet their objective or if the flush to disk fails.

The default operations for these two VOP's is to return EOPNOTSUPP.

This API may still be subject to change.

Sponsored by:   DARPA & NAI Labs
2002-09-05 20:56:14 +00:00
jeff
2fc7835d26 - Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED
- Use the new VI asserts in place of the old mtx_assert checks.
 - Add the VI asserts to the automated lock checking in the VOP calls.  The
   interlock should not be held across vops with a few exceptions.
 - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held
   when LK_INTERLOCK is set.
2002-08-21 06:19:29 +00:00
rwatson
4cbda96096 Begin committing support for Mandatory Access Control and extensible
kernel access control.  The MAC framework permits loadable kernel
modules to link to the kernel at compile-time, boot-time, or run-time,
and augment the system security policy.  This commit includes the
initial kernel implementation, although the interface with the userland
components of the operating system is still under work, and not all
kernel subsystems are supported.  Later in this commit sequence,
documentation of which kernel subsystems will not work correctly with
a kernel compiled with MAC support will be added.

Introduce two node vnode operations required to support MAC.  First,
VOP_REFRESHLABEL(), which will be invoked by callers requiring that
vp->v_label be sufficiently "fresh" for access control purposes.
Second, VOP_SETLABEL(), which be invoked by callers requiring that
the passed label contents be updated.  The file system is responsible
for updating v_label if appropriate in coordination with the MAC
framework, as well as committing to disk.  File systems that are
not MAC-aware need not implement these VOPs, as the MAC framework
will default to maintaining a single label for all vnodes based
on the label on the file system mount point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 22:15:09 +00:00
jeff
1a20b66849 - Acknowledge recursive vnode locks in the vop_unlock specification. The
vnode may not be unlocked even if the operation succeeded.
2002-07-30 08:50:52 +00:00
jeff
94c140555c - Use the new vop_lookup_{pre,post} instead of simpler locking specification. 2002-07-09 19:55:06 +00:00
jeff
94ba2259e2 - Require locks for getattr. At some point this could only require shared
locks.
2002-07-07 22:37:45 +00:00
jeff
68e0198612 - Disable original vop_strategy lock specification.
- Switch to the new vop_strategy_pre for lock validation.

VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated.  There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
2002-07-06 05:23:17 +00:00
jeff
6060f5331d Use the new #! directive for vop_rename. Leave the old lock specification
intact but disabled.
2002-07-06 04:41:27 +00:00
phk
8536ea3cdb Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 11:09:43 +00:00
mckusick
e929f2e4f0 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
rwatson
fc479aab62 Per discussion at BSDCon, note that the vop_getattr locking protocol
should require a shared lock, rather than an exclusive lock, which can
improve performance.  No actual code change here, since a number of
VFS locking fixes are in the works.
2002-02-18 00:22:57 +00:00
rwatson
3accefe9ca Add a comment indicating that the locking protocol should be updated
to be 'L L L' for vop_getattr().  Don't update it yet, because there
are still many offenders.
2002-02-10 21:46:16 +00:00
rwatson
6ca91a055c Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
  as not to use the scatter gather API (which appeared not to be used
  by any consumers, and be less portable), rather, accepts 'data'
  and 'nbytes' in the style of other simple read/write interfaces.
  This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
  a size_t.  When performing a read, the number of bytes read will
  be returned, unless the data pointer is NULL, in which case the
  number of bytes of data are returned.  This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
  argument so as to return the size, if desirable.  If set to NULL,
  the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable.  More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-10 04:43:22 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
assar
20223509f9 correct description of `vpp' for mknod/symlink: they are actually
returned locked
2001-07-24 16:16:00 +00:00