927 Commits

Author SHA1 Message Date
kib
350f96b4bf In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point
around the sequence that drop vnode lock and then busies the mount point.
Not having vlocked node or direct reference to the mp allows for the
forced unmount to proceed, making mp unmounted or reused.

Tested by:	pho
Reviewed by:	jeff
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-02 18:02:55 +00:00
dfr
836bf4cc58 Adjust the internal NFS KPI to avoid the last traces of NFS_LEGACYRPC.
Approved by: re
2009-06-30 19:10:17 +00:00
dfr
5d248bb05f Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
2009-06-30 19:03:27 +00:00
jhb
ccb2608d06 Fix build with NFS_LEGACYRPC enabled after the socket upcall locking
changes.

Approved by:	re (kensmith)
2009-06-30 03:18:51 +00:00
rwatson
ea70a3542d Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
bz
0808d0b1a6 After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
alc
f0cb2be073 Fix some of the style errors in *getpages(). 2009-06-18 05:56:24 +00:00
kib
5a77f36af9 For dotdot lookup in nfs_lookup, inline the vn_vget_ino() to prevent
operating on the unmounted mount point and freed mount data in case of
forced unmount performed while dvp is unlocked to nget the target vnode.

Add missed calls to m_freem(mrep) there on error exits [1].

Submitted by:	rmacklem [1]
Tested by:	pho
MFC after:	2 weeks
2009-06-17 12:47:27 +00:00
jamie
f419891544 Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
rmacklem
3d458aebb2 Add a test for VI_DOOMED just after nfs_upgrade_vnlock() in
nfs_bioread_check_cons(). This is required since it is possible
for the vnode to be vgonel()'d while in nfs_upgrade_vnlock() when
a forced dismount is in progress. Also, move the check for VI_DOOMED
in nfs_vinvalbuf() down to after nfs_upgrade_vnlock() and replace the
out of date comment for it.

Submitted by:	jhb
Tested by:	pho
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-10 21:03:57 +00:00
bz
b7ff2bdc20 After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
jhb
a1af9ecca4 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
bz
c62e99f85d Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by:	julian, rwatson, zec
X-MFC:		not possible
2009-06-01 15:49:42 +00:00
alc
5e539a33da nfs_write() can use the recently introduced vfs_bio_set_valid() instead of
vfs_bio_set_validclean(), thereby avoiding the page queues lock.

Garbage collect vfs_bio_set_validclean().  Nothing uses it any longer.
2009-05-31 20:18:02 +00:00
jamie
572db1408a Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
alc
87880e35a7 Make *getpages()s' assertion on the state of each page's dirty bits
stricter.
2009-05-28 18:11:09 +00:00
dfr
18697420d2 Make sure we feed 32bit align memory to nfsm_dissect otherwise we will fault
on platforms with strict alignment requirements. In particular, this fixes the
problems with the new RPC transport on the arm platform.

Note: this adds yet another copy of nfs_realign(). I will attempt to refactor
after NFS_LEGACYRPC is removed.

Submitted by:	sam
2009-05-24 13:22:00 +00:00
bz
3a3f39d482 While r192615 fixed the former problems, make this file VIMAGE
compliant now as well initializing local context variables.
2009-05-23 16:27:42 +00:00
bz
8fc598097f It seems this file was ignored by MRT, rnh locking changes and new-arpv2.
So let the V_irtualization people finally make the disabled debugging code
compile again.

MFC after:	2 weeks
X-MFC:		MRT and adapt rnh locking
2009-05-23 00:07:55 +00:00
rwatson
ccb17e335a Remove the unmaintained University of Michigan NFSv4 client from 8.x
prior to 8.0-RELEASE.  Rick Macklem's new and more feature-rich NFSv234
client and server are replacing it.

Discussed with:	rmacklem
2009-05-22 12:35:12 +00:00
alc
1af8842f56 Eliminate unnecessary clearing of the page's dirty mask from various
getpages functions.

Eliminate a stale comment.
2009-05-15 04:33:35 +00:00
alc
cb76946a7f Eliminate gratuitous clearing of the page's dirty mask. 2009-05-12 05:49:02 +00:00
attilio
1dcb84131b Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
alc
63529fb14b Eliminate stale comments.
Eliminate a case of unnecessary page queues locking.
2009-05-10 17:05:43 +00:00
zec
d78a1b1a82 Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
rwatson
a200645766 Remove redundant NFSMNT_NFSV3 check in DTrace hooks for NFS RPC.
MFC after:	1 month
2009-05-04 02:19:52 +00:00
rwatson
ae64f087b4 Fix typo in comment.
MFC after:	1 month
2009-05-04 02:06:39 +00:00
kib
bcd3ca9aed Remove trailing spaces 2009-04-13 19:54:33 +00:00
rwatson
fba90f2e03 Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
kib
a5ecda6fd6 Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed.
Check the condition and return ENOENT then.

In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused
by dvp reclaim.

Reported and tested by:	pho
2009-04-10 10:22:44 +00:00
jhb
d6565365e4 When a stale file handle is encountered, purge all cached information about
an NFS node including the access and attribute caches.  Previously the NFS
client only purged any name cache entries associated with the file.

PR:		kern/123755
Submitted by:	Jaakko Heinonen  jh of saunalahti fi
Reported by:	Timo Sirainen  tss of iki fi
Reviewed by:	rwatson, rmacklem
MFC after:	1 month
2009-04-06 21:11:08 +00:00
jhb
54504c4985 Change the default timeout for caching attributes of a directory in the NFS
client from 30 seconds to 3 seconds.  After the recent changes to add
caching of negative name cache lookups, a negative name cache hit will
persist until the client notices the parent directory has changed.  The
higher the attribute cache timeout on directories, the longer that can take,
so lower the default timeout for directories to match that of regular files.

Suggested by:	bde, mohans
MFC after:	1 month
2009-04-06 19:12:47 +00:00
rwatson
e610f292c5 Move dtnfsclient.c in the cddl tree to nfs_kdtrace.c in the nfsclient
directory, since it's under a BSD license, and this keeps NFS internals-
aware tracing parts close to NFS.

MFC after:	1 month
Suggested by:	jhb
2009-03-25 17:47:22 +00:00
rwatson
8f6024ae23 Fix two bugs in DTrace tracing of accesscache and attrcache load events:
- Trace non-error loads into the access cache once, not zero times or
  twice.
- Sometimes attr cache loads fail due to a race, in which case they are
  aborted leading to an invalidation; in this case, trace only the flush,
  not a load.

MFC after:	1 month
Sponsored by:	Google, Inc.
2009-03-24 23:16:48 +00:00
rwatson
f0f3719742 Add DTrace probes to the NFS access and attribute caches. Access cache
events are:

  nfsclient:accesscache:flush:done
  nfsclient:accesscache:get:hit
  nfsclient:accesscache:get:miss
  nfsclient:accesscache:load:done

They pass the vnode, uid, and requested or loaded access mode (if any);
the load event may also report a load error if the RPC fails.

The attribute cache events are:

  nfsclient:attrcache:flush:done
  nfsclient:attrcache:get:hit
  nfsclient:attrcache:get:miss
  nfsclient:attrcache:load:done

They pass the vnode, optionally the vattr if one is present (hit or load),
and in the case of a load event, also a possible RPC error.

MFC after:	1 month
Sponsored by:	Google, Inc.
2009-03-24 17:14:34 +00:00
rwatson
c0055de891 Add dtnfsclient, a first cut at an NFSv2/v3 client reuest DTrace
provider.  The NFS client exposes 'start' and 'done' probes for NFSv2
and NFSv3 RPCs when using the new RPC implementation, passing in the
vnode, mbuf chain, credential, and NFSv2 or NFSv3 procedure number.
For 'done' probes, the error number is also available.

Probes are named in the following way:

  ...
  nfsclient:nfs2:write:start
  nfsclient:nfs2:write:done
  ...
  nfsclient:nfs3:access:start
  nfsclient:nfs3:access:done
  ...

Access to the unmarshalled arguments is not easily available at this
point in the stack, but the passed probe arguments are sufficient to
to a lot of interesting things in practice.  Technically, these probes
may cover multiple RPC retransmits, and even transactions if the
transaction ID change as a result of authentication failure or a
jukebox error from the server, but usefully capture the intent of a
single NFS request, such as access, getattr, write, etc.

Typical use might involve profiling RPC latency by system call, number
of RPCs, how often a getattr leads to a call to access, when failed
access control checks occur, etc.  More detailed RPC information might
best be provided by adding a krpc provider.  It would also be useful
to add NFS client probes for events such as the access cache or
attribute cache satisfying requests without an RPC.

Sponsored by:	Google, Inc.
MFC after:	1 month
2009-03-22 22:07:52 +00:00
rwatson
2f4a716c79 In nfs_request(), always exit using the nfsmout label once we're
definitely doing an NFSv2 or NFSv3 RPC, rather than sometimes doing
so and sometimes not.  This makes it easier to add a DTrace return
probe at a single point in the function.

MFC after:	1 week
2009-03-21 21:49:07 +00:00
jhb
79aaac6e7b Expand the per-node access cache to cache permissions for multiple users.
The number of entries in the cache defaults to 8 but is easily changed in
nfsclient/nfs.h.  When the cache is filled, the oldest cache entry is
evicted when space is needed.

I mirrored the changes to the NFSv[23] client in the NFSv4 client to fix
compile breakage.  However, the NFSv4 client doesn't actually use the
access cache currently.

Submitted by:	rmacklem
2009-03-20 21:12:38 +00:00
jhb
62fc1c4fbf - Remove code to set SAVENAME for CREATE or RENAME requests that get a -ve
hit in the name cache.  cache_lookup() doesn't actually return ENOENT
  for such requests to force the filesystem to do an explicit lookup, so
  this was effectively dead code.
- Grab the nfsnode mutex while writing to n_dmtime.  We don't grab the lock
  when comparing the time against the cached directory mod time (just as
  we don't when comparing ctime's for +ve name cache hits) since the
  attribute caching is already racy for NFS clients as it is.

Discussed with:	bde
2009-03-10 18:41:06 +00:00
bz
df2be82cec For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
jhb
ccc9172c8d Bring back the code to prime the ACCESS cache when fetching attributes for
an NFS file.  Now the priming is conditional on  a new
vfs.nfs.prime_access_cache sysctl.  For now I've left the default setting
to disabling the priming.

Requested by:	 scottl
2009-02-24 16:01:56 +00:00
jhb
5dc7ef7e69 Enable caching of negative pathname lookups in the NFS client. To avoid
stale entries, we save a copy of the directory's modification time when
the first negative cache entry was added in the directory's NFS node.
When a negative cache entry is hit during a pathname lookup, the parent
directory's modification time is checked.  If it has changed, all of the
negative cache entries for that parent are purged and the lookup falls
back to using the RPC.  This required adding a new cache_purge_negative()
method to the name cache to purge only negative cache entries for a given
directory.

Submitted by:	mohans, Rick Macklem, Ricardo Labiaga @ NetApp
Reviewed by:	mohans
2009-02-19 22:28:48 +00:00
jhb
508874f429 When fetching attributes for a file for NFSv3 mounts, do not perform an
opportunistic ACCESS RPC to populate both the access and attribute caches
of the file and instead always use a GETATTR RPC.  On many modern NFS
servers, an ACCESS RPC is much more expensive to service than a GETATTR
RPC.

Submitted by:	mohans
2009-02-19 22:18:00 +00:00
jhb
76ad79ad2c Don't clear the attribute cache of a file when it is closed. A subsequent
open() of the same file will load fresh attributes, so they do not need to
be explicitly flushed in close() to guarantee close to open consistency.
However, other file desciptors may still reference this file and clearing
the attributes in close() forces those other file descriptors to fetch
fresh attributes the next time they need them.

Reviewed by:	mohans
MFC after:	1 week
2009-02-19 22:10:39 +00:00
jhb
5a3fa82570 Reindent a small bit of code that was not 8-space indented like the rest
of the nfs_lookup() function.
2009-02-18 16:34:13 +00:00
ed
a964306db9 Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

Ths commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() isn't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.
2009-01-28 17:57:16 +00:00
rodrigc
89c6c71f34 Fix parsing of acregmin, acregmax, acdirmin and acdirmax NFS mount options
when passed as strings via nmount().

Submitted by: Jaakko Heinonen <jh saunalahti fi>
2009-01-28 07:46:35 +00:00
jhb
47455a7b41 Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP:
VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME
can be performed while holding a shared vnode lock (the same functionality
is done internally by VOP_READ which can run with a shared vnode lock).
Add missing locking of the vnode interlock to the ufs implementation and
remove a special note and test from the NFS client about not supporting the
feature.

Inspired by:	ups
Tested by:	pho
2009-01-21 14:42:00 +00:00
bz
604d89458a Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
dfr
367ae23f20 Switch the default rpc implementation for NFS back to the new code. I believe
I have fixed the reported problems - if you still have trouble with it, please
contact me with as much detail as possible so that I can track down any other
issues as quickly as possible.
2008-11-14 11:27:53 +00:00