Commit Graph

411 Commits

Author SHA1 Message Date
ru
3b1bf8c2e9 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
jeff
a9d123c3ab - Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
 - Create a new lock in the bufobj to lock bufobj fields independently.
   This leaves the vnode interlock as an 'identity' lock while the bufobj
   is an io lock.  The bufobj lock is ordered before the vnode interlock
   and also before the mnt ilock.
 - Exploit this new lock order to simplify softdep_check_suspend().
 - A few sync related functions are marked with a new XXX to note that
   we may not properly interlock against a non-zero bv_cnt when
   attempting to sync all vnodes on a mountlist.  I do not believe this
   race is important.  If I'm wrong this will make these locations easier
   to find.

Reviewed by:	kib (earlier diff)
Tested by:	kris, pho (earlier diff)
2008-03-22 09:15:16 +00:00
dfr
cba668f51c Fix a regression from the last revision - don't edit the ns_rec list while
not holding the lock.
2008-03-19 12:33:25 +00:00
dfr
f46620ae37 Don't call nfs_realign while holding locks.
Reviewed by: kib
2008-03-18 18:42:59 +00:00
kib
02dada141b Fix the Giant leak in the nfsrv_remove().
Reported by:	pluknet <pluknet gmail com>
MFC after:	1 week
2008-03-04 11:05:03 +00:00
remko
c050b3d1bc Use nfsrv_destroycache() only once, else it crashes the server.
PR:		kern/118152
Submitted by:	Bjoern Groenvall <bg at sics dot se>
Approved by:	imp (mentor, a while ago already), jhb
MFC After:	3 days
2008-01-18 17:03:36 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
rwatson
35dc7f2858 Garbage collect now-unused nfsrv_setcred() -- it's not only unused, but
also a purveyor of unfortunate (and now unsupported) direct frobbing of
struct ucred.

MFC after:	3 days
2007-11-04 19:20:33 +00:00
rwatson
8756317538 Rename mac_associate_nfsd_label() to mac_proc_associate_nfsd(), and move
from mac_vfs.c to mac_process.c to join other functions that setup up
process labels for specific purposes.  Unlike the two proc create calls,
this call is intended to run after creation when a process registers as
the NFS daemon, so remains an _associate_ call..

Obtained from:	TrustedBSD Project
2007-10-25 12:34:14 +00:00
jhb
1f6b3a5f2c Add a -z flag to nfsstat which zeros the NFS statistics after displaying
them.

MFC after:	1 week
Requested by:	ps
Submitted by:	ps (6 years ago)
2007-10-18 16:38:07 +00:00
mohans
11057bb00f Set the NFS server sockbuf high watermarks to the system defaults
(up form 32KB). The low highwatermark setting caused UDP fullsock
request drops, throttling thruput greatly.
Reported by: Kris Kennaway
Approved by: re@ (Ken Smith)
2007-10-12 03:56:27 +00:00
rwatson
23574c8673 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
rwatson
c29e74320b First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force
  debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
  debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
  debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
  which keyed off debug_mpsafenet to determine various aspects of their own
  locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
  no-op's, as their entire behavior was determined by the value in
  debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by:	bz, jhb
Approved by:	re (kensmith)
2007-07-27 11:59:57 +00:00
rwatson
f938c62f4e Include priv.h to pick up suser(9) definitions, missed in an earlier
commit.

Warnings spotted by:	kris
2007-06-13 22:42:43 +00:00
mjacob
24a416aad5 Init timespec to zero fo quiesce warnings. 2007-06-10 04:42:20 +00:00
rwatson
d1196975a0 Remove MAC Framework access control check entry points made redundant with
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting.  These entry points exactly aligned with privileges and
provided no additional security context:

- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()

Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies.  These mostly, but not
entirely, align with the set of privileges granted in jails.

Obtained from:	TrustedBSD Project
2007-04-22 15:31:22 +00:00
rwatson
32f12b60cc Attempt to rationalize NFS privileges:
- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.

- Use PRIV_NFS_DAEMON in the NFS server.

- In the NFS client, move the privilege check from nfslockdans(), which
  occurs every time a write is performed on /dev/nfslock, and instead do it
  in nfslock_open() just once.  This allows us to avoid checking the saved
  uid for root, and just use the effective on open.  Use PRIV_NFS_LOCKD.
2007-04-21 18:11:19 +00:00
rwatson
35b5232a25 In nfsrv_rcv(), don't reacquire the nfs server lock until after
nfs_realign() has been called, as it may sleep waiting on memory
allocation.

Reported by:	simon
2007-04-15 15:50:50 +00:00
jhb
1ad11dc4f1 - Split out the part of SYSCALL_MODULE_HELPER() that builds a 'struct
sysent' for a new system call into a new MAKE_SYSENT() macro.
- Use MAKE_SYSENT() to build a full sysent for the nfssvc system call in
  the NFS server and use syscall_register() and syscall_deregister() to
  manage the nfssvc system call entry instead of manually frobbing the
  sysent[] array.
2007-04-02 13:53:26 +00:00
jhb
c2c01f044f Initialize vfslocked to 0 before nfsm_srvmtofh() so that the variable is
not used uninitialized in 'nfsmout' if nfsm_srvmtofh() gets an internal
error.

CID:		1766
Found by:	Coverity Prevent (tm)
2007-03-26 15:14:58 +00:00
jeff
d43d58ff45 - Turn all explicit giant acquires into conditional VFS_LOCK_GIANTs.
Only ops which used namei still remained.
 - Implement a scheme for reducing the overhead of tracking which vops
   require giant by constantly reducing the number of recursive giant
   acquires to one, leaving us with only one vfslocked variable.
 - Remove all NFSD lock acquisition and release from the individual nfs
   ops.  Careful examination has shown that they are not required.  This
   greatly simplifies the code.

Sponsored by:	Isilon Systems, Inc.
Discussed with:	rwatson
Tested by:	kkenn
Approved by:	re
2007-03-17 18:18:08 +00:00
wkoszek
d9c0510dba Change these descriptions of memory types used in malloc(9), as their
current, rather long strings make output from vmstat -m look unpleasant.

Approved by:	cognet (mentor)
2007-03-05 00:21:40 +00:00
rwatson
300d4098cf Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
pjd
cb2d7c85a8 Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
mpp
f66eda706d Get the vfs giant lock before calling nfs_access.
Reviewed by:	mohan
2007-02-13 03:27:45 +00:00
hrs
7c35092b08 The nfsm_srvpathsiz() macro in nfsrv_symlink() in nfs_serv.c should
check length of the pathname in the range 0<=n<=NFS_MAXPATHLEN,
not 0<n<=NFS_MAXPATHLEN.  This fixes a minor interoperability problem
that the FreeBSD NFS server did not allow a symlink pointing the empty
pathname.

MFC after:	1 week
2007-01-02 20:42:08 +00:00
bz
297206ec2a MFp4: 92972, 98913 + one more change
In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
2006-12-12 12:17:58 +00:00
rwatson
65d3526a64 Push Giant a bit further off the NFS server in a number of straight
forward cases by converting from unconditional acquisition of Giant
around vnode operations to conditional acquisition:

- Remove nfsrv_access_withgiant(), and cause nfsrv_access() to now
  assert that Giant will be held if it is required for the vnode.

- Add nfsrv_fhtovp_locked(), which will drop the NFS server lock if
  required, and modify nfsrv_fhtovp() to conditionally acquire
  Giant if required.

- In the VOP's not dealing with more than one vnode at a time (i.e.,
  not involving a lookup), conditionally acquire Giant.

This removes Giant use for MPSAFE file systems for a number of quite
important RPCs, including getattr, read, write.  It leaves
unconditional Giant acquisitions in vnode operations that interact
with the name space or more than one vnode at a time as these
require further work.

Tested by:	kris
Reviewed by:	kib
2006-11-24 11:53:16 +00:00
pjd
62a0bc913e Protect nfsm_srvpathsiz() call with the nfsd_mtx lock.
Reviewed by:	mohans
2006-11-20 07:32:52 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
kib
bf3aa367e2 Fix leak in NAMEI zone caused by nfs server when VOP_RENAME fails.
Submitted by:	Padma Bhooma <pbhooma at panasas com>
Reviewed by:	bde
Approved by:	pjd (mentor)
MFC after:	1 week
2006-10-26 12:41:53 +00:00
rwatson
7beaaf5cd2 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
jhb
4dc640e56e - Add a new function nfsrv_destroycache() to tear down the server request
cache when unloading the nfsserver module.  This fixes a memory leak and
  a stale pointer.
- Use callout_drain() rather than callout_stop() when unloading the
  nfsserver module.

MFC after:	3 days
2006-08-01 16:27:14 +00:00
jhb
dcdaa35dc6 Use TAILQ_FOREACH_SAFE() in a couple of places. 2006-08-01 15:32:25 +00:00
jhb
c62c38439f Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
  syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
2006-07-28 19:05:28 +00:00
rwatson
40868fda8a soreceive_generic(), and sopoll_generic(). Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used univerally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
2006-07-24 15:20:08 +00:00
mohans
798a5b356c Size the NFS server dupreq cache on the basis of nmbclusters. On servers
with low nmbclusters, we tie up too many mbclusters in the NFS duplicate
request cache. This change limits the size of the dupreq cache to 1/2
the nmbclusters (and flaots in a range of [64, 2048]).

MFC after 2 weeks.

Reported by: Steve Kargl, David O'Brien
Tested by:   Steve Kargl
2006-06-23 00:42:26 +00:00
kib
a5b858d3fd Temporary workaround to prevent leak of Giant from nfsd when calling
lookup().

Reviewed by:	tegge
Tested by:	"Arno J. Klaassen" <arno at heho snv jussieu fr>, "Rong-en Fan" <grafan at gmail com>, Dmitriy Kirhlarov <dimma at higis ru>, Dmitry Pryanishnikov <dmitry at atlantis dp ua>
MFC after:	1 week
Approved by:	kan, pjd (mentors)
2006-06-05 14:48:02 +00:00
mohans
38b8fecaba Bump up the NFS server dupreq cache limit to 2K (from 64). With a small
duplicate request cache, under heavy load a lot of non-idempotent requests
were getting served again, resulting in errors.

Found by : Kris Kennaway.
2006-04-25 00:21:56 +00:00
csjp
be495bef58 Introduce a new MAC entry point for label initialization of the NFS daemon's
credential: mac_associate_nfsd_label()

This entry point can be utilized by various Mandatory Access Control policies
so they can properly initialize the label of files which get created
as a result of an NFS operation. This work will be useful for fixing kernel
panics associated with accessing un-initialized or invalid vnode labels.

The implementation of these entry points will come shortly.

Obtained from:	TrustedBSD
Requested by:	mdodd
MFC after:	3 weeks
2006-04-06 23:33:11 +00:00
cel
08249d49bf rick says:
The following bug was just identified in OpenBSD and it looks like the same
bug exists in the other BSDen NFS servers.

A Linux client (don't know which version, but you can look at
	http://bugzilla.kernel.org/show_bug.cgi?id=6256)
does a Setattr of mtime to the server's time, where the file is mode 0664 and
the client user has group access (ie. caller is not the file owner).

The BSD servers fail the Setattr with EPERM, since the VA_UTIMES_NULL flag
isn't set before doing the VOP_SETATTR.

It seems to me that this should be allowed, since it is allowed for a local
utimes(2). If so, the fix is to set VA_UTIMES_NULL for the
"set-time-to-server-time" cases of setting atime and/or mtime.

Submitted by:	rick@snowhite.cis.uoguelph.ca
Reviewed by:	cel
Approved by:	silby
MFC after:	1 week
2006-04-02 04:24:57 +00:00
jeff
32b1878006 - Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().
Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:54:20 +00:00
jeff
52c1783c83 - Reorder vrele calls after vput calls to prevent lock order reversals
between leaf and directory locks.

Found by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-12 04:59:04 +00:00
simon
edc000b320 When parsing an RPC request in nfsrv_dorec(), KASSERT that there
actually is an mbuf to process.  This catches the missing mbuf before it
would otherwise causes a NULL pointer dereference, which could be
triggered by a 0 length RPC record before the check for such records was
added in rev 1.97.

Approved by:	cperciva (mentor)
2006-03-08 20:21:15 +00:00
simon
1b31e5fc10 Correct a remote kernel panic when processing zero-length RPC records
via TCP. [06:10]

Security:	FreeBSD-SA-06:10.nfs
Approved by:	cperciva
2006-03-01 14:17:32 +00:00
jeff
30a231055b - Reorder calls to vrele() after calls to vput() when the vrele is a
directory.  vrele() may lock the passed vnode, which in these cases would
   give an invalid lock order of child -> parent.  These situations are
   deadlock prone although do not typically deadlock because the vrele
   is typically not releasing the last reference to the vnode.  Users of
   vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 	1 week
Sponsored by:	Isilon Systems, Inc.
Tested by:	kkenn
2006-02-01 00:25:26 +00:00
csjp
34b8c6a440 Manage the ucred for the NFS server using the crget/crfree API defined in
kern_prot.c. This API handles reference counting among many other things.
Notably, if MAC is compiled into the kernel, it will properly initialize the
MAC labels when the ucred is allocated.

This work is in preparation for a new MAC entry point which will be responsible
for properly initializing policy specific labels for the NFS server credential.
Utilization of the crfree/crget APIs reduce the complexity associated with
this label's management.

Submitted by:	green (with changes) [1]
Obtained from:	TrustedBSD Project
Discussed with:	rwatson, alfred

[1] I moved the ucred allocation outside the scope of the NFS server lock to
    prevent M_WAIKOK allocations from occurring with non-sleep-able locks held.
    Additionally, to reduce complexity, the ucred persist as long as the NFS
    server descriptor.
2006-01-28 19:24:40 +00:00
trhodes
80610803f5 Revert my previous commit.
Proved I'm not that bright at times:	jhb
2006-01-23 21:06:22 +00:00
trhodes
f927a72593 Fix indentation.
Prodded by:	stefanf, ru, njl (in that order)
2006-01-23 17:41:43 +00:00