Commit Graph

1021 Commits

Author SHA1 Message Date
Rick Macklem
ca27c028d8 Modify the NFS clients and the NLM so that the NLM can be used
by both clients. Since the NLM uses various fields of the
nfsmount structure, those fields were extracted and put in a
separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h.
This structure also has a function pointer for a function that
extracts the required information from the mount point and nfs vnode
for that particular client, for information stored differently by the
clients.

Reviewed by:	jhb
MFC after:	2 weeks
2010-10-19 00:20:00 +00:00
Konstantin Belousov
223073fd1a Do not synchronously start the nfsiod threads at all. The r212506
fixed the issues with file descriptor locks, but the same problems are
present for vnode lock/user map lock.

If the nfs_asyncio() cannot find the free nfsiod, schedule task to
create new nfsiod and return error. This causes fall back to the
synchronous i/o for nfs_strategy(), or does not start read at all in
the case of readahead. The caller that holds vnode and potentially
user map lock does not wait for kproc_create() to finish, preventing
the LORs.

The change effectively reverts r203072, because we never hand off the
request to newly created nfsiod thread anymore.

Reviewed by:	jhb
Tested by:	jhb, pluknet
MFC after:	3 weeks
2010-10-18 19:06:46 +00:00
Konstantin Belousov
57bfe0a9f8 Do not fork nfsiod directly from the vop methods. This causes LORs between
vnode lock and several locks needed during fork, like fd lock.

Instead, schedule the task to be executed in the taskqueue context. We
still waiting for the fork to finish, but the context of the thread
executing the task does not make real LORs with our vnode lock.

Submitted by:	pluknet at gmail com
Reviewed by:	jhb
Tested by:	pho
MFC after:	3 weeks
2010-09-12 19:06:08 +00:00
John Baldwin
8e27c18282 Store the full timestamp when caching timestamps of files and
directories for purposes of validating name cache entries.  This
closes races where two updates to a file or directory within the same
second could result in stale entries in the name cache.  While here,
remove the 'n_expiry' field as it is no longer used.

Reviewed by:	rmacklem
MFC after:	1 week
2010-09-07 14:29:45 +00:00
Rick Macklem
090c02f8bd Modify nfs_diskless.c to recognize the environment variable
boot.nfsroot.nfshandlelen and set the diskless root fs to
use NFSv3 and this file handle length when it is set. If
this environment variable is not set, the diskless root fs
will use NFSv2 and the same defaults as before. This fixes
the problem where the diskless nfs root fs had to be on a
FreeBSD server for NFSv3 to work, because it did not know
the correct file handle length and assumed the size used
by FreeBSD. Until pxeboot and loader are replaced by ones
built from commits coming soon, boot.nfsroot.nfshandlelen
will not be set by them and the diskless root fs will use
NFSv2 unless the /etc/fstab entry has the "nfsv3" option
specified.

Tested by:	danny at cs.huji.ac.il
MFC after:	2 weeks
2010-09-01 23:51:07 +00:00
John Baldwin
3634d5b241 Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created.  Use them to implement macros that
otherwise manipulated the flags directly.  Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe.  This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by:	kib
MFC after:	3 days
2010-08-20 19:46:50 +00:00
Rick Macklem
ffea18bdfa Add some mutex locking on the nfsnode to the regular NFS client.
Reviewed by:	jhb
2010-08-04 01:19:11 +00:00
Rick Macklem
f92bbff248 Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate
module that can be used by both the regular and experimental nfs
clients. This fixes the problem reported by jh@ where /dev/nfslock
would be registered twice when both nfs clients were used.
I also defined the size of the lm_fh field to be the correct value,
as it should be the maximum size of an NFSv3 file handle.

Reviewed by:	jh
MFC after:	2 weeks
2010-07-24 22:11:11 +00:00
John Baldwin
3c497facfb Retire the NFS access cache timestamp structure. It was used in VOP_OPEN()
to avoid sending multiple ACCESS/GETATTR RPCs during a single open()
between VOP_LOOKUP() and VOP_OPEN().  Now we always send the RPC in
VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be
sent.

MFC after:	2 weeks
2010-07-15 19:40:48 +00:00
John Baldwin
72ccfc240b A previous change moved the GETATTR RPC for open() calls that hit in the
name cache up into nfs_lookup() instead of nfs_open().  Continue this
trend by flushing the attribute cache for leaf nodes in nfs_lookup() during
an open() if we do a LOOKUP RPC.  For NFSv3 this should generally be a NOP
as the attributes are flushed before fetching the post-op attributes from
the LOOKUP RPC which most (all?) NFSv3 servers provide, so the post-op
attributes should populate the cache.

Now all NFS open() calls will always clear the cached attributes during the
nfs_lookup() prior to nfs_open() in the !NMODIFIED case to provide CTOC.
As a result, we can remove the conditional flushing of the attribute
cache from nfs_open().

Reviewed by:	rmacklem, bde
MFC after:	2 weeks
2010-07-12 14:27:49 +00:00
John Baldwin
985fda3dfe - Add missing locking around flushing of an NFS node's attribute cache
in the NMODIFIED case of nfs_open().
- Cosmetic tweak to simplify an expression in nfs_lookup().

Reviewed by:	rmacklem, bde
MFC after:	1 week
2010-07-12 14:19:23 +00:00
Konstantin Belousov
b38f7723eb In NFS clients, instead of inconsistently using #ifdef
DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer
KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.

Submitted by:	Mikolaj Golub <to.my.trociny gmail com>
MFC after:	2 weeks
2010-06-13 05:24:27 +00:00
Xin LI
76af820152 Fix build: newnp represents newvp so KDTRACE_NFS_ATTRCACHE_FLUSH_DONE()
on newvp instead of vp here.
2010-05-27 22:59:37 +00:00
John Baldwin
b367632ec2 More gracefully handle stale file handles and attributes when opening a
file via NFS.  Specifically, to satisfy close-to-open-consistency, the NFS
client always performs at least one RPC on a file during an open(2) to see
if the file has changed.  Normally this RPC is an ACCESS or GETATTR RPC
that is forced by flushing a file's attribute cache during nfs_open() and
then requesting new attributes.  However, if the file is noticed to be
stale during nfs_open(), the only recourse is to fail the open(2) call
with ESTALE.  On the other hand, if the ACCESS or GETATTR RPC is sent
during nfs_lookup(), then the NFS client can fall back to a LOOKUP RPC to
obtain the new file handle in the case that a file has been replaced.

This change causes the NFS client to flush the attribute cache during
nfs_lookup() when validating a name cache hit if the attributes fetched
during nfs_lookup() can be reused in nfs_open().  This allows the client
to open a replaced file via the new file handle the first time that it
notices a replaced file rather than failing with ESTALE in some cases.

Reviewed by:	rmacklem, bde
Reviewed by:	mohans (older version)
MFC after:	1 week
2010-05-27 18:07:20 +00:00
Colin Percival
8fd6c56d29 Change the current working directory to be inside the jail created by
the jail(8) command. [10:04]

Fix a one-NUL-byte buffer overflow in libopie. [10:05]

Correctly sanity-check a buffer length in nfs mount. [10:06]

Approved by:	so (cperciva)
Approved by:	re (kensmith)
Security:	FreeBSD-SA-10:04.jail
Security:	FreeBSD-SA-10:05.opie
Security:	FreeBSD-SA-10:06.nfsclient
2010-05-27 03:15:04 +00:00
Alan Cox
03679e2334 Push down the page queues lock into vm_page_activate(). 2010-05-07 15:49:43 +00:00
Alan Cox
eb00b276ab Eliminate page queues locking around most calls to vm_page_free(). 2010-05-06 18:58:32 +00:00
Alan Cox
5ac59343be Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held.  (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements.  It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib
2010-05-05 18:16:06 +00:00
Edward Tomasz Napierala
b5f770bd86 Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().
Reviewed by:	kib
2010-05-05 16:44:25 +00:00
Konstantin Belousov
fc0c3802f0 Lock the page around vm_page_activate() and vm_page_deactivate() calls
where it was missed. The wrapped fragments now protect wire_count with
page lock.

Reviewed by:	alc
2010-05-03 20:31:13 +00:00
Pawel Jakub Dawidek
6619e90994 Simplify code a bit. 2010-02-18 22:10:55 +00:00
Marius Strobl
b06b8fe3a7 Factor out the code shared between NFS client and server into its own
module. With r203732 it became apparent that creating the sysctl nodes
twice causes at least a warning, however the whole code shouldn't be
present twice in the first place.

Discussed with:	rmacklem
2010-02-16 20:00:21 +00:00
Marius Strobl
9ea01fedc0 - Move nfs_realign() from the NFS client to the shared NFS code and
remove the NFS server version in order to reduce code duplication.
  The shared version now uses a second parameter how, which is passed
  on to m_get(9) and m_getcl(9) as the server used M_WAIT while the
  client requires M_DONTWAIT, and replaces the the previously unused
  parameter hsiz.
- Change nfs_realign() to use nfsm_aligned() so as with other NFS code
  the alignment check isn't actually performed on platforms without
  strict alignment requirements for performance reasons because as the
  comment suggests unaligned data only occasionally occurs with TCP.
- Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather
  than M_WAIT because it's called with the RPC sp_lock held.

Reviewed by:	jhb, rmacklem
MFC after:	1 week
2010-02-09 23:45:14 +00:00
Marius Strobl
be03f0b907 Some style(9) fixes 2010-02-09 23:40:07 +00:00
Rick Macklem
651ff543f8 Fix a race that can occur when nfs nfsiod threads are being created.
Without this patch it was possible for a different thread that calls
nfs_asyncio() to snitch a newly created nfsiod thread that was
intended for another caller of nfs_asyncio(), because the nfs_iod_mtx
mutex was unlocked while the new nfsiod thread was created. This patch
labels the newly created nfsiod, so that it is not taken by another
caller of nfs_asyncio(). This is believed to fix the problem reported
on the freebsd-stable email list under the subject:
FreeBSD NFS client/Linux NFS server issue.

Tested by:	to DOT my DOT trociny AT gmail DOT com
Reviewed by:	jhb
MFC after:	2 weeks
2010-01-27 15:22:20 +00:00
Rick Macklem
ff589d0da1 Fix a typo in a comment introduced by r202767.
MFC after:	2 weeks
2010-01-21 21:59:10 +00:00
Rick Macklem
f957b30da2 Add a timeout for the negative name cache entries in the NFS client.
This avoids a bogus negative name cache entry from persisting forever
when another client creates an entry with the same name within the
same NFS server time of day clock tick. The mount option negnametimeo
can be used to override the default timeout interval on a
per-mount-point basis. Setting negnametimeo to 0 disables negative
name caching for the mount point.
I also fixed one obvious typo where args.timeo should be
args.maxgrouplist.

Submitted by:	jhb (earlier version)
Reviewed by:	jhb
MFC after:	2 weeks
2010-01-21 20:57:25 +00:00
Marko Zec
5d005b51e5 Reduce recursions on curvnet and thus spamming the console with warning
messages for kernels built with options VIMAGE and VNET_DEBUG enabled.

Reviewed by:	bz
MFC after:	3 days
2010-01-09 14:56:38 +00:00
Martin Blapp
c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Bjoern A. Zeeb
2254f022a0 Add missing include to make LINT-VIMAGE build as well.
Found by:	test building an MFC candidate
X-MFC with:	r200471
2009-12-27 10:10:38 +00:00
Bjoern A. Zeeb
e65a4ba18b Add a few more V_hacks to nfsclient to allow machines with a VIMAGE
kernel to boot from NFS. [1]

Note: this is not a full virtualization of nfsclient. It is only does
what advertised above and nothing more.

Requested by:	public demand [1]
Tested by:	kris, ..
MFC after:	5 days
2009-12-13 11:06:39 +00:00
John Baldwin
12ac99dc46 Close a race with caching of -ve name lookups in the NFS client.
Specifically, clients only trust -ve cache entries while the directory
remains unchanged and discard any -ve cache entries for a directory when
they notice that the modification time of a directory entry changes.  The
race involves two concurrent lookups as follows:
- Thread A does a lookup for file 'foo' which sends a lookup RPC to the
  server.  The lookup fails and the server replies.
- The 'foo' file is created (either by the same client or a different
  client) updating the modification time on the parent directory of 'foo'.
- Thread B does a lookup for a different file 'bar' which updates the
  cached attributes of the parent directory of 'foo' to reflect the new
  modification time after 'foo' was created.
- Thread A finally resumes execution to parse the reply from the NFS
  server.  It adds a -ve cache entry and sets the cached value of the
  directory's modification time that is used for invalidating -ve cached
  lookups to the new modification time set by thread B.

At this point, future lookups of 'foo' will honor the -ve cached entry
until the cached entry is pushed out of the name cache's LRU or the
modification time of the parent directory is changed again by some other
change.  The fix is to read the directory's modification time before
sending the lookup RPC and use that cached modification time when setting
the directory's cached modification time.  Also, we do not add a -ve cache
entry if another thread has added -ve cache entry that set the directory's
cached modification time to a newer value than the value we read before
sending the lookup RPC.

Reviewed by:	rmacklem
MFC after:	1 week
2009-10-16 19:30:48 +00:00
Robert Watson
4347e9fd66 Add a MODULE_DEPEND() on the NFS client from dtnfsclient so that dtnfsclient
can access NFS client symbols.

MFC after:	3 days
Discussed with:	kib
Reported by:	markm
2009-10-12 18:58:42 +00:00
Qing Li
812777783d Reverting the previous change for now. Some users reports the patch
fixes their issues but one reports a failure in NFS ROOT. Revert
the change for now pending further investigation.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 22:09:42 +00:00
Qing Li
3b208f7ca0 Simply remove the code instead of using "#if 0".
Pointed out by sam
2009-09-15 02:22:57 +00:00
Qing Li
96ed1732bb The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

Reviewed by:    kmacy
2009-09-15 01:01:03 +00:00
Rick Macklem
8f63187ec1 Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs
vnodes, since these nodes are not linked into the mount queue and,
as such, the vn_lock() cannot cause a deadlock so LORs are harmless.

Suggested by:	kib
Approved by:	kib (mentor)
MFC after:	3 days
2009-09-09 20:37:49 +00:00
Marko Zec
0348c661d1 Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by:	rmacklem (in part)
Reviewed by:	rmacklem, rwatson
Discussed with:	dfr, bz
Approved by:	re (rwatson), julian (mentor)
MFC after:	3 days
2009-08-24 10:09:30 +00:00
Robert Watson
77dfcdc445 Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
Konstantin Belousov
48bd6d4a49 In nfs_upgrade_vnlock(), assert that the vnode is locked. It is for all
pathes, as far as I see and testing seems to confirm it. Comparision of
old_lock with LK_SHARED make sense only if vnode is locked by current
thread.

When downgrading, pass LK_RETRY to the vn_lock(), since otherwise
vn_lock() unlocks the doomed vnode, causing extra unlock.

Reported and tested by:	pho
Approved by:	re (rwatson)
MFC after:	3 weeks
2009-08-14 10:59:17 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Rick Macklem
874bb76647 Patch the regular nfs client in a manner analagous to
r195704 for the experimental client. The patch avoids calling vn_lock()
for the case where nfs_nget() has acquired the same vnode as dvp,
since nfs_nget() has already locked the vnode.

Reviewed by:	kib, jhb
Approved by:	re (kensmith), kib (mentor)
2009-07-17 19:38:07 +00:00
Konstantin Belousov
b35687df13 Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns
kernel resources that block other threads, like vnode locks. The SIGSTOP
sent to such thread (process, rather) shall not stop it until thread
releases the resources.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:54:29 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Konstantin Belousov
f1eccd05ec In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point
around the sequence that drop vnode lock and then busies the mount point.
Not having vlocked node or direct reference to the mp allows for the
forced unmount to proceed, making mp unmounted or reused.

Tested by:	pho
Reviewed by:	jeff
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-02 18:02:55 +00:00
Doug Rabson
98c497255b Adjust the internal NFS KPI to avoid the last traces of NFS_LEGACYRPC.
Approved by: re
2009-06-30 19:10:17 +00:00
Doug Rabson
b49a2b39fd Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
2009-06-30 19:03:27 +00:00
John Baldwin
5ed6940d13 Fix build with NFS_LEGACYRPC enabled after the socket upcall locking
changes.

Approved by:	re (kensmith)
2009-06-30 03:18:51 +00:00
Robert Watson
2d9cfabad4 Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
Bjoern A. Zeeb
5736e6fb9d After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
Alan Cox
57a7e73261 Fix some of the style errors in *getpages(). 2009-06-18 05:56:24 +00:00
Konstantin Belousov
b3c5643a25 For dotdot lookup in nfs_lookup, inline the vn_vget_ino() to prevent
operating on the unmounted mount point and freed mount data in case of
forced unmount performed while dvp is unlocked to nget the target vnode.

Add missed calls to m_freem(mrep) there on error exits [1].

Submitted by:	rmacklem [1]
Tested by:	pho
MFC after:	2 weeks
2009-06-17 12:47:27 +00:00
Jamie Gritton
c1f192193d Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
Rick Macklem
5081c8c757 Add a test for VI_DOOMED just after nfs_upgrade_vnlock() in
nfs_bioread_check_cons(). This is required since it is possible
for the vnode to be vgonel()'d while in nfs_upgrade_vnlock() when
a forced dismount is in progress. Also, move the check for VI_DOOMED
in nfs_vinvalbuf() down to after nfs_upgrade_vnlock() and replace the
out of date comment for it.

Submitted by:	jhb
Tested by:	pho
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-10 21:03:57 +00:00
Bjoern A. Zeeb
8d8bc0182e After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
John Baldwin
74fb0ba732 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
Bjoern A. Zeeb
c2c2a7c11e Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by:	julian, rwatson, zec
X-MFC:		not possible
2009-06-01 15:49:42 +00:00
Alan Cox
1f17689408 nfs_write() can use the recently introduced vfs_bio_set_valid() instead of
vfs_bio_set_validclean(), thereby avoiding the page queues lock.

Garbage collect vfs_bio_set_validclean().  Nothing uses it any longer.
2009-05-31 20:18:02 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Alan Cox
3933ec4d15 Make *getpages()s' assertion on the state of each page's dirty bits
stricter.
2009-05-28 18:11:09 +00:00
Doug Rabson
f435931b4a Make sure we feed 32bit align memory to nfsm_dissect otherwise we will fault
on platforms with strict alignment requirements. In particular, this fixes the
problems with the new RPC transport on the arm platform.

Note: this adds yet another copy of nfs_realign(). I will attempt to refactor
after NFS_LEGACYRPC is removed.

Submitted by:	sam
2009-05-24 13:22:00 +00:00
Bjoern A. Zeeb
1849938c8e While r192615 fixed the former problems, make this file VIMAGE
compliant now as well initializing local context variables.
2009-05-23 16:27:42 +00:00
Bjoern A. Zeeb
a43c797788 It seems this file was ignored by MRT, rnh locking changes and new-arpv2.
So let the V_irtualization people finally make the disabled debugging code
compile again.

MFC after:	2 weeks
X-MFC:		MRT and adapt rnh locking
2009-05-23 00:07:55 +00:00
Robert Watson
86ce6a83d1 Remove the unmaintained University of Michigan NFSv4 client from 8.x
prior to 8.0-RELEASE.  Rick Macklem's new and more feature-rich NFSv234
client and server are replacing it.

Discussed with:	rmacklem
2009-05-22 12:35:12 +00:00
Alan Cox
42eb41087c Eliminate unnecessary clearing of the page's dirty mask from various
getpages functions.

Eliminate a stale comment.
2009-05-15 04:33:35 +00:00
Alan Cox
12aa4fdca9 Eliminate gratuitous clearing of the page's dirty mask. 2009-05-12 05:49:02 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Alan Cox
f45cc06eb1 Eliminate stale comments.
Eliminate a case of unnecessary page queues locking.
2009-05-10 17:05:43 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Robert Watson
6fb82ecbde Remove redundant NFSMNT_NFSV3 check in DTrace hooks for NFS RPC.
MFC after:	1 month
2009-05-04 02:19:52 +00:00
Robert Watson
9e4fda10a0 Fix typo in comment.
MFC after:	1 month
2009-05-04 02:06:39 +00:00
Konstantin Belousov
e0769d2dff Remove trailing spaces 2009-04-13 19:54:33 +00:00
Robert Watson
885868cd8f Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
Konstantin Belousov
3f54086eba Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed.
Check the condition and return ENOENT then.

In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused
by dvp reclaim.

Reported and tested by:	pho
2009-04-10 10:22:44 +00:00
John Baldwin
3429118095 When a stale file handle is encountered, purge all cached information about
an NFS node including the access and attribute caches.  Previously the NFS
client only purged any name cache entries associated with the file.

PR:		kern/123755
Submitted by:	Jaakko Heinonen  jh of saunalahti fi
Reported by:	Timo Sirainen  tss of iki fi
Reviewed by:	rwatson, rmacklem
MFC after:	1 month
2009-04-06 21:11:08 +00:00
John Baldwin
a58158ef08 Change the default timeout for caching attributes of a directory in the NFS
client from 30 seconds to 3 seconds.  After the recent changes to add
caching of negative name cache lookups, a negative name cache hit will
persist until the client notices the parent directory has changed.  The
higher the attribute cache timeout on directories, the longer that can take,
so lower the default timeout for directories to match that of regular files.

Suggested by:	bde, mohans
MFC after:	1 month
2009-04-06 19:12:47 +00:00
Robert Watson
455f3aa24f Move dtnfsclient.c in the cddl tree to nfs_kdtrace.c in the nfsclient
directory, since it's under a BSD license, and this keeps NFS internals-
aware tracing parts close to NFS.

MFC after:	1 month
Suggested by:	jhb
2009-03-25 17:47:22 +00:00
Robert Watson
346ef8cd59 Fix two bugs in DTrace tracing of accesscache and attrcache load events:
- Trace non-error loads into the access cache once, not zero times or
  twice.
- Sometimes attr cache loads fail due to a race, in which case they are
  aborted leading to an invalidation; in this case, trace only the flush,
  not a load.

MFC after:	1 month
Sponsored by:	Google, Inc.
2009-03-24 23:16:48 +00:00
Robert Watson
10263f0832 Add DTrace probes to the NFS access and attribute caches. Access cache
events are:

  nfsclient:accesscache:flush:done
  nfsclient:accesscache:get:hit
  nfsclient:accesscache:get:miss
  nfsclient:accesscache:load:done

They pass the vnode, uid, and requested or loaded access mode (if any);
the load event may also report a load error if the RPC fails.

The attribute cache events are:

  nfsclient:attrcache:flush:done
  nfsclient:attrcache:get:hit
  nfsclient:attrcache:get:miss
  nfsclient:attrcache:load:done

They pass the vnode, optionally the vattr if one is present (hit or load),
and in the case of a load event, also a possible RPC error.

MFC after:	1 month
Sponsored by:	Google, Inc.
2009-03-24 17:14:34 +00:00
Robert Watson
47294818f9 Add dtnfsclient, a first cut at an NFSv2/v3 client reuest DTrace
provider.  The NFS client exposes 'start' and 'done' probes for NFSv2
and NFSv3 RPCs when using the new RPC implementation, passing in the
vnode, mbuf chain, credential, and NFSv2 or NFSv3 procedure number.
For 'done' probes, the error number is also available.

Probes are named in the following way:

  ...
  nfsclient:nfs2:write:start
  nfsclient:nfs2:write:done
  ...
  nfsclient:nfs3:access:start
  nfsclient:nfs3:access:done
  ...

Access to the unmarshalled arguments is not easily available at this
point in the stack, but the passed probe arguments are sufficient to
to a lot of interesting things in practice.  Technically, these probes
may cover multiple RPC retransmits, and even transactions if the
transaction ID change as a result of authentication failure or a
jukebox error from the server, but usefully capture the intent of a
single NFS request, such as access, getattr, write, etc.

Typical use might involve profiling RPC latency by system call, number
of RPCs, how often a getattr leads to a call to access, when failed
access control checks occur, etc.  More detailed RPC information might
best be provided by adding a krpc provider.  It would also be useful
to add NFS client probes for events such as the access cache or
attribute cache satisfying requests without an RPC.

Sponsored by:	Google, Inc.
MFC after:	1 month
2009-03-22 22:07:52 +00:00
Robert Watson
06bd99086d In nfs_request(), always exit using the nfsmout label once we're
definitely doing an NFSv2 or NFSv3 RPC, rather than sometimes doing
so and sometimes not.  This makes it easier to add a DTrace return
probe at a single point in the function.

MFC after:	1 week
2009-03-21 21:49:07 +00:00
John Baldwin
2a3f3a09c6 Expand the per-node access cache to cache permissions for multiple users.
The number of entries in the cache defaults to 8 but is easily changed in
nfsclient/nfs.h.  When the cache is filled, the oldest cache entry is
evicted when space is needed.

I mirrored the changes to the NFSv[23] client in the NFSv4 client to fix
compile breakage.  However, the NFSv4 client doesn't actually use the
access cache currently.

Submitted by:	rmacklem
2009-03-20 21:12:38 +00:00
John Baldwin
667c6c197e - Remove code to set SAVENAME for CREATE or RENAME requests that get a -ve
hit in the name cache.  cache_lookup() doesn't actually return ENOENT
  for such requests to force the filesystem to do an explicit lookup, so
  this was effectively dead code.
- Grab the nfsnode mutex while writing to n_dmtime.  We don't grab the lock
  when comparing the time against the cached directory mod time (just as
  we don't when comparing ctime's for +ve name cache hits) since the
  attribute caching is already racy for NFS clients as it is.

Discussed with:	bde
2009-03-10 18:41:06 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
John Baldwin
fdd1a964d1 Bring back the code to prime the ACCESS cache when fetching attributes for
an NFS file.  Now the priming is conditional on  a new
vfs.nfs.prime_access_cache sysctl.  For now I've left the default setting
to disabling the priming.

Requested by:	 scottl
2009-02-24 16:01:56 +00:00
John Baldwin
03964c8e09 Enable caching of negative pathname lookups in the NFS client. To avoid
stale entries, we save a copy of the directory's modification time when
the first negative cache entry was added in the directory's NFS node.
When a negative cache entry is hit during a pathname lookup, the parent
directory's modification time is checked.  If it has changed, all of the
negative cache entries for that parent are purged and the lookup falls
back to using the RPC.  This required adding a new cache_purge_negative()
method to the name cache to purge only negative cache entries for a given
directory.

Submitted by:	mohans, Rick Macklem, Ricardo Labiaga @ NetApp
Reviewed by:	mohans
2009-02-19 22:28:48 +00:00
John Baldwin
5428c66130 When fetching attributes for a file for NFSv3 mounts, do not perform an
opportunistic ACCESS RPC to populate both the access and attribute caches
of the file and instead always use a GETATTR RPC.  On many modern NFS
servers, an ACCESS RPC is much more expensive to service than a GETATTR
RPC.

Submitted by:	mohans
2009-02-19 22:18:00 +00:00
John Baldwin
3e057a2477 Don't clear the attribute cache of a file when it is closed. A subsequent
open() of the same file will load fresh attributes, so they do not need to
be explicitly flushed in close() to guarantee close to open consistency.
However, other file desciptors may still reference this file and clearing
the attributes in close() forces those other file descriptors to fetch
fresh attributes the next time they need them.

Reviewed by:	mohans
MFC after:	1 week
2009-02-19 22:10:39 +00:00
John Baldwin
093e877818 Reindent a small bit of code that was not 8-space indented like the rest
of the nfs_lookup() function.
2009-02-18 16:34:13 +00:00
Ed Schouten
a4611ab612 Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

Ths commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() isn't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.
2009-01-28 17:57:16 +00:00
Craig Rodrigues
e4f9e894d4 Fix parsing of acregmin, acregmax, acdirmin and acdirmax NFS mount options
when passed as strings via nmount().

Submitted by: Jaakko Heinonen <jh saunalahti fi>
2009-01-28 07:46:35 +00:00
John Baldwin
beace17649 Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP:
VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME
can be performed while holding a shared vnode lock (the same functionality
is done internally by VOP_READ which can run with a shared vnode lock).
Add missing locking of the vnode interlock to the ufs implementation and
remove a special note and test from the NFS client about not supporting the
feature.

Inspired by:	ups
Tested by:	pho
2009-01-21 14:42:00 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
Doug Rabson
335317d291 Switch the default rpc implementation for NFS back to the new code. I believe
I have fixed the reported problems - if you still have trouble with it, please
contact me with as much detail as possible so that I can track down any other
issues as quickly as possible.
2008-11-14 11:27:53 +00:00
Doug Rabson
0eec5c87bf Temporarily switch NFS back to the old RPC code while I try to diagnose and
fix the problems a few people have noticed with the new code. People who want
to continue testing the new code or who need RPCSEC_GSS support should use
the new option NFS_NEWRPC to select it.
2008-11-13 11:35:18 +00:00
Doug Rabson
a9148abd9d Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager.  I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by:	Isilon Systems
MFC after:	1 month
2008-11-03 10:38:00 +00:00
Tom Rhodes
8b4acb0cc0 Document a few sysctls in the NFS client and server code.
Minor style(9) where applicable.

Approved by:	alfred (slightly older version)
2008-11-02 17:00:23 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
  Change so its KPI by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
  old version (which was unlinked from the lockmgr alredy) and
  MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
  mnt_ref if running for more than 3 seconds, make it totally useless.
  Remove it as it was thought to work into older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if it the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
Dag-Erling Smørgrav
e11e3f187d Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.
2008-10-23 20:26:15 +00:00