Commit Graph

743 Commits

Author SHA1 Message Date
jeff
4d0b3883a4 - Remove interlock protection around VI_XLOCK. The interlock is not
sufficient to guarantee that this race is not hit.  The XLOCK will likely
   have to be redesigned due to the way reference counting and mutexes work
   in FreeBSD.  We currently can not be guaranteed that xlock was not set
   and cleared while we were blocked on the interlock while waiting to check
   for XLOCK.  This would lead us to reference a vnode which was not the
   vnode we requested.
 - Add a backtrace() call inside of INVARIANTS in the hopes of finding out if
   this condition is ever hit.  It should not, since we should be retaining
   a reference to the vnode in these cases.  The reference would be sufficient
   to block recycling.
2003-09-19 23:37:49 +00:00
phk
0e80d17900 Name the vnode method vectors consistently with the rest of the filesystems.
This improves the output of src/tools/tools/vop_table
2003-09-12 16:44:40 +00:00
phk
af43a08ef8 Remove now unused BOOTP tags related to NFS swap device. 2003-09-05 11:12:55 +00:00
dds
a07778264c KNF: parentheses around return values.
Suggested by:	bde
Approved by:	schweikh (mentor - blanket)
MFC after:	6 weeks
2003-09-04 11:27:13 +00:00
dds
c5e451a8b7 Fix errno return values to better represent failure reasons for
read and open.

Approved by:	schweikh (mentor)
Agreed:		bde
MFC after:	6 weeks
2003-09-02 16:46:31 +00:00
phk
8eb928cd77 Remove the magic way of configuring NFS backed swap.
This code dates back to the very first diskless support on FreeBSD,
back when swapon(8) couldn't simply be run on a NFS backed file.

Suggested replacement command sequence on the client:

        dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000
        swapon /swapfile
        rm -f /swapfile

For whatever value of 100000 you want.
2003-08-15 12:04:02 +00:00
billf
08d78e9b49 0) preallocate per-interface context structures without the ifnet lock held
1) avoid immediately calling bzero() after malloc() by passing M_ZERO
2) do not initialize individual members of the global context to zero
3) remove an unused assignment of ifctx in bootpc_init()

Reviewed by:	tegge
2003-08-07 21:27:17 +00:00
tjr
f4b299adc0 Fix a problem that occurs when truncating files on NFSv3 mounts: we need
to set np->n_size back to the desired size again after calling
nfs_meta_setsize(), since it could end up in nfs_loadattrcache() getting
called, which would change n_size back to the value it had before the
truncate request was issued. The result of this bug is that the size info
cached in the nfsnode becomes incorrect, lseek(fd, ofs, SEEK_END) seeks
past the end of the file, stat() returns the wrong size, etc.

PR:		41792
MFC after:	2 weeks
2003-07-29 00:17:29 +00:00
phk
d4d7ca154a Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
phk
b99b564c20 Change idle sleep indentifier to "-" for nfsiod 2003-07-02 08:09:20 +00:00
alc
636a482b8d Lock the vm object when freeing a page. 2003-06-17 05:17:00 +00:00
phk
24cc9156fe Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations
to check that the buffer points to the correct vnode.
2003-06-15 18:53:00 +00:00
phk
fd139fd7d0 Initialize struct vfsops C99-sparsely.
Submitted by:   hmp
Reviewed by:	phk
2003-06-12 20:48:38 +00:00
iedowse
2c04f19896 When removing a sillyrename file, make sure that the directory vnode
has not been cleaned in the meantime, since this can happen during
a forced unmount. Also add a comment that nfs_removeit() should
really be locking the directory vnode before calling nfs_removerpc().

Reported by:	mbr
Tested by:	mbr
MFC after:	1 week
2003-06-12 15:41:20 +00:00
obrien
8b64eb1925 Use __FBSDID(). 2003-06-11 05:37:42 +00:00
rwatson
decffe6132 Add the comment I meant to add about not passing in PCATCH to the
tsleep().  Note the XXX.
2003-06-11 03:32:42 +00:00
hsu
d5ee1a976b On a socket creation error, don't close the socket. 2003-06-09 03:44:34 +00:00
phk
174a772296 Remove unsed variables.
Add explicit breaks to switch

Found by:       FlexeLint
2003-05-31 20:05:25 +00:00
phk
0129a20107 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
rwatson
c264d8171d rpc.lockd stability workaround: remove PCATCH from the tsleep() in
nfs_lock.c.  Right now, if we permit a signal to interrupt the sleep,
we will slip the lock and no process on that client, the server, or
any other client will be able to acquire the lock.  This can happen,
for example, if a user hits Ctrl-C or Ctrl-T while a process is
waiting for the lock.  By removing PCATCH, we prevent that from
happening, at the cost of not permitting a user-requested lock abort:
also nasty.  However, a user interface bug might be preferable to a
serious semantic bug, so we go with that for now.

We need to teach the rpc.lockd/kernel protocol how to abort lock
requests, and rpc.lockd how to handle aborted lock requests; patches
for the kernel bit are floating around, but no rpc.lockd bit yet.

Approved by:	re (scottl)
2003-05-30 17:15:56 +00:00
peter
da1b9f9f88 Deal with the possibility of negative available space from the file server
to avoid Bad Things(TM) happening (eg: df crashing with a floating point
exception).

Submitted by:	Harold Gutch <logix@foobar.franken.de>
Approved by:	re (scottl)
2003-05-19 22:35:00 +00:00
rwatson
94ff93f449 This change grabs the vnode lock for NFS client vnodes when calling
VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS
will panic, and (b) it may be possible to corrupt entries in the cached
vnode attributes in the nfsnode, since nfsnode attribute cache data is
also protected by the vnode lock.

Approved by:	re (jhb)
Pointed out by:	VFS_DEBUG_LOCKS
2003-05-15 21:12:08 +00:00
jhb
89a4eb17de - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
des
8ed712ead1 Instead of recording the Unix time in a process when it starts, record the
uptime.  Where necessary, convert it back to Unix time by adding boottime
to it.  This fixes a potential problem in the accounting code, which would
compute the elapsed time incorrectly if the Unix time was stepped during
the lifetime of the process.
2003-05-01 16:59:23 +00:00
kan
9468fdaf14 Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
truckman
79b9eea6e8 VOP_FSYNC() expects to be called with the vnode locked, so lock fvp in
nfs_rename() before calling VOP_FSYNC() and unlock fvp immediately after.

Reviewed by:	bde
2003-04-24 20:39:40 +00:00
peter
88151c4f8c Fix a bug with df on large (>1TB) nfsv3 file servers on 32 bit client
machines where the 'long' number of blocks in struct statfs wont fit.
Instead of chosing an artificial 512 byte block size, simply scale it up
until we avoid an overflow.  NFSv3 reports the sizes in bytes, and the
blocksize is a figment of nfsclient's imagination.
2003-04-24 20:36:32 +00:00
truckman
b8272feca3 Release the vnode interlock in nfs_flush() before calling nfs_sigintr(),
and grab it again later if necessary.  This prevents a lock order reversal
because nfs_sigintr() calls PROC_LOCK().
2003-04-23 02:58:26 +00:00
thomas
7e134f95f3 Revert change 1.201 (removing mapping of VAPPEND to VWRITE).
Instead, use the generic vaccess() operation to determine whether
an operation is permitted. This avoids embedding knowledge on
vnode permission bits such as VAPPEND in the NFS client.

PR:		kern/46515
vaccess() patch submitted by:	"Peter Edwards" <pmedwards@eircom.net>
Approved by:	tjr, roberto (mentor)
2003-03-31 23:26:10 +00:00
jeff
46e6ba39f1 - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
rwatson
109543a3e5 Add O_NONBLOCK to the vn_open_cred() flags for NFS client locking when
opening the POSIX fifo; convert ENXIO error returns to EOPNOTSUPP.

This improves handling of the case where the /var/run/lock fifo exists
but there is no listener: we immediately return EOPNOTSUPP rather
than blocking until a listener turns up.  This could occur during a
diskless boot before rpc.lockd is loaded, or if the lock file persists
across a reboot following the disabling of rpc.lockd.  This may have
suddenly started to occur due to fifo blocking fixes--previously it
looks like attempts to read on a fifo with no listener would time out
due to insufficient resources.

Reviewed by:	alfred
2003-03-26 19:21:34 +00:00
alfred
5fb77f7c70 req can not be NULL or we'd die.
Sponsored by: RED
2003-03-26 01:46:11 +00:00
tjr
874c219fad Map VAPPEND to VWRITE in nfsspec_access() - VAPPEND is never set in the
mode returned by VOP_GETATTR. This fixes incorrect "Permission denied"
errors when trying to append to a file on an NFSv2 mount.
2003-03-21 05:13:23 +00:00
jeff
f500ebe3c4 - Add a forgotten BUF_LOCK()
Most sincere apologies to:	jake
2003-03-14 05:13:19 +00:00
jeff
4b8b33db8a - Lock the buf before inspecting its contents. 2003-03-13 07:04:11 +00:00
jeff
4de0ae322c - Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
   flag to the initial BUF_LOCK().  This will eventually be used in cases
   were we want to use a buffer only if it is not currently in use.
 - Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by:	arch
Not objected to by:	mckusick
2003-03-04 00:04:44 +00:00
njl
5a225ad933 Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
2003-03-03 19:15:40 +00:00
des
2756b6c964 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
jeff
3c4fe935b7 - The interlock was not being droped in nfs_flush() if the first part of
an if clause was true.  Break the two clauses out into seperate statements
   since they require different actions.

Reported/Tested by:	jake
Spotted by:	jhb
2003-02-26 00:24:19 +00:00
jeff
e28e3bf81c - Properly handle the vnode interlock in nfs_fsync.
Reported by:	phk
2003-02-25 08:50:21 +00:00
jeff
9e4c9a6ce9 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
peter
20bddb2776 Get rid of a silly message I added back in Sept 2001 (1.68). 2003-02-18 23:45:01 +00:00
tjr
a7cd813d68 Lock proc while accessing p_siglist, p_sigmask and p_sigignore
in nfs_sigintr().
2003-02-15 08:25:57 +00:00
dillon
0a8a44e0b5 Provide a sysctl to allow defaulting of the connectionless (-c) feature
to mount_nfs.  The sysctl defaults to 1 (paranoid mode).  Setting it to 0
will allow an NFS client to receive replies on a different IP then they
were sent to by default.

Submitted by:	Sean Eric Fagan <sef@kithrup.com>
2003-01-22 19:57:31 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
phk
157437ec08 Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by:       src/tools/tools/vop_table
2003-01-04 08:47:19 +00:00
phk
daf6948653 Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
2003-01-03 06:32:15 +00:00
schweikh
86f7487fb6 Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
2002-12-30 21:18:15 +00:00
dillon
fd92c8a195 Abstract-out the constants for the sequential heuristic.
No operational changes.

MFC after:	1 day
2002-12-28 20:37:50 +00:00
hsu
32436a25c0 SMP locking for radix nodes. 2002-12-24 03:03:39 +00:00
alc
1b398f04d9 Avoid holding the vnode interlock around malloc() or free() to prevent a
lock order reversal.

Reviewed by:	jeff
2002-12-23 06:20:41 +00:00
hsu
82e1e3bab0 SMP locking for ifnet list. 2002-12-22 05:35:03 +00:00
dillon
63da09d1e6 do not try to free a mountpoint that we did not allocate.
X-MFC after:	immediately
2002-12-21 20:55:34 +00:00
alfred
8f7431caeb reapply 1.26 through 1.28.
Approved by: re
2002-11-20 15:21:06 +00:00
alfred
e398b8022d forgot about 5.x freeze, backout 1.26 through 1.28 pending re@ appoval. 2002-11-20 10:53:06 +00:00
alfred
8f8a40cefe remove useless casts, unused macros and cleanup a line wrap. 2002-11-20 10:13:04 +00:00
alfred
22ecc18d19 comment and untwist error return logic 2002-11-20 10:06:51 +00:00
alfred
f4e72b4767 Remove an outdated comment complaining about exporting struct ucred
to userspace, I fixed it a while ago.
2002-11-20 10:00:04 +00:00
phk
b9ad3f37bf Don't examine an un-initialized variable.
Spotted by:	FlexeLint.
2002-10-20 21:52:05 +00:00
phk
9b7c9e2c4d Remove extern declarations of stuff which is static in nfs_node.c
Move related macro to nfs_node.c

Spotted by:	FlexeLint
2002-10-20 21:40:55 +00:00
mckusick
25230d4c6a Regularize the vop_stdlock'ing protocol across all the filesystems
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).

In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.

Sponsored by:	DARPA & NAI Labs.
2002-10-14 03:20:36 +00:00
mike
8630abe45f Change iov_base's type from char *' to the standard void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
2002-10-11 14:58:34 +00:00
scottl
3a150bca9c Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create.  Passing the
value 0 prevents the alternate kstack from being created.  Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by:	jake, peter, jhb
2002-10-02 07:44:29 +00:00
jmallett
7a693db242 Back our kernel support for reliable signal queues.
Requested by:	rwatson, phk, and many others
2002-10-01 17:15:53 +00:00
jmallett
068343413c Lock access to the signal queue, and related structures, with PROC_LOCK.
Submitted by:	jhb
2002-09-30 21:15:33 +00:00
jmallett
7bf6052470 Convert use of p_siglist and old SIG*() macros to use <sys/ksiginfo.h>
prototyped functions to get a sigset_t, and further to check for any
queued signals, rather than an empty signal set, to go with the move
to signal queues rather than signal sets.
2002-09-30 20:48:29 +00:00
phk
1dfc2c167f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
rwatson
c583effb16 Remove an errant debugging printf that got left in during my last
commit.

Pointed out by:	guido
2002-09-27 00:25:54 +00:00
rwatson
2fed42cd45 Apparently pxeboot passes in a mygateway of non-zero sin length
from DHCP in the event that no gateway is returned from DHCP, breaking
the assumption that we skip the routing insertion of the gateway
if the sin length is zero.  Check also for s_addr of 0 to avoid the
"Oh no, adding my default route failed" panic, making it possible
to pxeboot machines on segments without default routes.  Arguably
this could be a bug in pxeboot, or in the TUNABLE code, but this
makes my boxes boot.
2002-09-26 19:56:43 +00:00
jeff
5c7f8a426d - Lock access to the buf lists.
- Use vrefcnt() where appropriate.
 - Add some locking asserts.
2002-09-25 02:38:43 +00:00
jake
be3bee9396 Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c
so that it is MI.  Allow nfs_mountroot to return an error if the nfs_diskless
struct is not valid, rather than panicing later on.  Call nfs_setup_diskless()
from nfs_mountroot if NFS_ROOT is defined, like bootpc_init().  Removed legacy
root mount support for sparc64, and enabled NFS_ROOT by default.
2002-09-22 00:59:02 +00:00
phk
63d87674c8 Use m_length() instead of home-rolled versions. 2002-09-18 19:44:14 +00:00
njl
0590c43070 Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by:   phk
Reviewed by:    bde, rwatson (earlier version)
2002-09-14 09:02:28 +00:00
phk
549c71a099 Now that we have a cached mount credential in struct mount, use it istead
of a private cached copy.
2002-09-08 15:11:18 +00:00
bde
c1c3f72703 Use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't
a prerequisite.
2002-09-05 14:04:34 +00:00
sobomax
f6cebc0606 Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by:	-hackers, -net
2002-08-18 07:05:00 +00:00
alfred
6d7e27aceb Remove a case of exposing 'struct ucred' to userspace. Use a struct xucred
for LOCKD_MSG instead.

Requested by: rwatson
2002-08-15 21:52:22 +00:00
rwatson
44404e4547 In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
phk
e4f487f25e Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems.  This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by:	DARPA and NAI Labs.
2002-08-13 10:05:50 +00:00
rwatson
b0388fc24a Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent
enforcement of MAC policy on the read or write operations:

- In ext2fs, don't enforce MAC on loop-back reads and writes supporting
  directory read operations in lookup(), directory modifications in
  rename(), directory write operations in mkdir(), symlink write
  operations in symlink().

- In the NFS client locking code, perform vn_rdwr() on the NFS locking
  socket without enforcing MAC, since the write is done on behalf of
  the kernel NFS implementation rather than the user process.

- In UFS, don't enforce MAC on loop-back reads and writes supporting
  directory read operations in lookup(), and symlink write operations
  in symlink().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-12 16:43:04 +00:00
jeff
fcdac052f8 - Add a missing VI_UNLOCK to an error case in nfs_flush. 2002-08-05 08:54:29 +00:00
jeff
02517b6731 - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
alc
2e0c2c9a48 o Lock page queue accesses in nfs_getpages(). 2002-07-21 20:01:32 +00:00
dillon
9d3af4cbd8 Fix a bug nfs_write() related to ^C'ing during a file write on an
interruptable mount.  We were returning from inside the loop without
releasing the rslock.

Submitted by:	Mike Junk <junk@isilon.com>
MFC after:	3 days
2002-07-16 19:43:59 +00:00
jhb
9618cc94df If we get a receive error in nfs_receive() and then get an error trying to
obtain the send lock, we would bogusly try to unlock the send lock before
returning resulting in a panic.  Instead, only unlock the send lock if
nfs_sndlock() succeeds and nfs_reconnect() fails.

MFC after:	3 days
Sponsored by:	The Weather Channel
2002-07-16 15:12:07 +00:00
alfred
df766765ba Add IPv6 support.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
2002-07-15 19:40:23 +00:00
dillon
0b74a2da00 Convert old style (type foo *)0 casts to NULLs
PR:		kern/40360
Requested by:	Hiten PAndya via direct email
2002-07-11 17:54:58 +00:00
dillon
da4e111a55 Replace the global buffer hash table with per-vnode splay trees using a
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on.  Extensive testing has appeared to have shown no
increase in overhead.

Disadvantages
    Dirties more cache lines during lookups.

    Not as fast as a hash table lookup (but still N log N and optimal
    when there is locality of reference).

Advantages
    vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
    syncer operate more efficiently.

    I get to rip out all the old hacks (some of which were mine) that tried
    to keep the v_dirtyblkhd tailq sorted.

    The per-vnode splay tree should be easier to lock / SMPng pushdown on
    vnodes will be easier.

    This commit along with another that Alan is working on for the VM page
    global hash table will allow me to implement ranged fsync(), optimize
    server-side nfs commit rpcs, and implement partial syncs by the
    filesystem syncer (aka filesystem syncer would detect that someone is
    trying to get the vnode lock, remembers its place, and skip to the
    next vnode).

Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.

Suggested by: alc
2002-07-10 17:02:32 +00:00
jhb
8969d48c6a In namei(), we use a NULL thread for uio_td when doing a VOP_READLINK().
nfs_readlink() calls nfs_bioread() which passes in uio_td as the thread
argument to nfs_getcacheblk().  In nfs_getcacheblk() we dereference the
thread pointer to get a process pointer to pass to nfs_sigintr().  This
obviously results in a panic. :)

Rather than change nfs_getcacheblk() to check if the thread pointer is
NULL when calling nfs_sigintr() like other callers do, change
nfs_sigintr() to take a thread as the last argument instead of a
process so none of the callers have to care if the thread is NULL or not.
2002-06-28 21:53:08 +00:00
tanimura
e6fa9b9e92 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
dd
90158e3b68 Don't tsleep() with an sb_mtx held. 2002-05-27 05:20:15 +00:00
peter
6fd2a8cc3f Fix warning; deprecated use of label at end of compound statement 2002-05-24 05:50:28 +00:00
tanimura
92d8381dd5 Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
ambrisko
0f6f0cccbe Add TAG_VENDOR_INDENTIFIER (option 60) to our DHCP request done by the
kernel BOOTP option.  The format will be:
	FreeBSD:<MACHINE>:<osrelease>
this way people can tune their DHCP server to server up root file systems
via the OS, machine type and version.

Obtained from:	NetBSD
MFC after:	3 weeks
2002-05-17 20:18:48 +00:00
trhodes
28d42899b7 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
phk
5549556e4f We don't need the arp kludge any more. 2002-04-28 18:29:44 +00:00
iedowse
bae478cc81 Remove the nfs_{lock,unlock,islocked} functions and the associated
definitions; they have been unused and #if 0'd out since the Lite/2
merge and we are unlikely to want them in the future.
2002-04-27 22:10:16 +00:00
iedowse
64322dabea The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by:	simokawa
MFC after:	1 week
2002-04-17 01:07:29 +00:00
jhb
dc2e474f79 Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
jeff
5cc8ffe0d4 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 10:07:52 +00:00
luigi
d4a3339ee0 Add a readonly sysctl variable of type string, kern.bootp_cookie,
which is initialized with whatever string a dhcp/bootp server passes
as vendor tag 134.
There is no standard tag that I know with this information, and
no vendor-defined tag that applies to FreeBSD that I could find
doing the same thing.

The intended use is to pass information to userland for run-time
configuration of a diskless client without having to run a bootp/dhcp
client for the third time (after the one in pxeboot/etherboot, and
the one in the kernel bootp), also because these clients generally
screwup the interface configuration, which is not exactly what you
want when you have your disks nfs-mounted.

Manpage update to follow soon.

MFC-after: 3 days
2002-03-13 09:23:11 +00:00
phk
0b8d3eb375 vhold() our vnode while checking the remote side.
This is belived to be the only place where a soft reference to a vnode
is held with no sort of hard reference, consequently this change should
allow us to free(9) vnodes from the freelist after properly cleaning
them up.

Reviewed by:	dillon
2002-03-08 13:43:43 +00:00
peter
0535cd31ee Fix warnings.. bootpc_init() and related. 2002-02-28 03:07:35 +00:00
jhb
b8b3ac8816 Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic
and isn't strictly required.  However, it lowers the number of false
positives found when grep'ing the kernel sources for p_ucred to ensure
proper locking.
2002-02-27 19:18:10 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
peter
5830d22764 Fix a long line touched in previous commit (but not caused by previous
commit)
2002-02-07 23:03:41 +00:00
julian
b5eb64d6f0 Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
peter
f71468f39b Revise the nfsiod auto tuning code. Now both the upper and lower limits
are specifyable by sysctl and are respected.

Submitted by:	Maxime Henrion <mux@sneakerz.org>
2002-01-15 20:57:21 +00:00
peter
08d32da0a5 Implement vfs.nfs.iodmin (minimum number of nfsiod's) and
vfs.nfs.iodmaxidle (idle time before nfsiod's exit).  Make it adaptive
so that we create nfsiod's on demand and they go away after not being
used for a while.  The upper limit is NFS_MAXASYNCDAEMON (currently 20).
More will be done here, but this is a useful checkpoint.

Submitted by:	Maxime Henrion <mux@qualys.com>
2002-01-14 02:13:46 +00:00
iedowse
2d507f0adf Terminate requests in nfs_sigintr() if the filesystem is in the
process of being unmounted. This allows forced NFS unmounts to
complete even if there are processes stuck holding the mnt_lock
while the server is down. The mechanism is not ideal in that there
is a small chance we might accidentally cancel requests during a
failed non-forced unmount attempt on that filesystem, but this
is not really a big problem.

Also, move the tsleep() in nfs_nmcancelreqs() so that we do not
sleep in the case where there are no requests to be cancelled.
2002-01-10 02:15:35 +00:00
iedowse
e90d2d4ddf Permit NFS filesystems to be forcibly unmounted when the server is
down, even if there are hung processes and the mount is non-
interruptible.

This works by having nfs_unmount call a new function nfs_nmcancelreqs()
in the FORCECLOSE case. It scans the list of outstanding requests
and marks as interrupted any requests belonging to the specified
mount. Then it waits up to 30 seconds for all requests to terminate.
A few other changes are necessary to support this:
- Unconditionally set a socket timeout so that even hard mounts
  are guaranteed to occasionally check the R_SOFTTERM flag on
  requests. For hard mounts this flag can only be set by
  nfs_nmcancelreqs().
- Reject requests on a mount that is currently being unmounted.
- Never grant the receive lock to a request that has been cancelled.

This should also avoid an old problem where a forced NFS unmount
could cause a crash; it occurred when a VOP on an unlocked vnode
(usually VOP_GETATTR) was in progress at the time of the forced
unmount.
2002-01-02 00:41:26 +00:00
alc
9da90558a5 o Remove an errant ';' introduced in the last revision.
o Remove an unused variable.
2002-01-01 19:44:01 +00:00
rwatson
4f087f57b5 o Remove premature use of nmp->nm_cred, it hasn't been initialized yet. 2002-01-01 16:17:55 +00:00
rwatson
85fc04400d o Pass td into nfs_mountroot() to eliminate an XXX'd curthread use.
Since it's in the parent function anyway, might as well pass it
  another layer down.

Obtained from:	TrustedBSD Project
2001-12-31 21:00:00 +00:00
rwatson
9348f9cada o Remove premature leakage of use of td_ucred from base source tree:
instead, use td->td_proc->p_ucred.
2001-12-31 20:56:59 +00:00
rwatson
70a29b1e5a o Add missing #include's of sys/proc.h, missed in merge, required to
dereference td->td_proc->p_ucred.
2001-12-31 20:05:26 +00:00
rwatson
5eea21ccca o Make the credential used by socreate() an explicit argument to
socreate(), rather than getting it implicitly from the thread
  argument.

o Make NFS cache the credential provided at mount-time, and use
  the cached credential (nfsmount->nm_cred) when making calls to
  socreate() on initially connecting, or reconnecting the socket.

This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well
as bugs involving NFS and mandatory access control implementations.

Reviewed by:	freebsd-arch
2001-12-31 17:45:16 +00:00
iedowse
fb3ea25673 Add a #define for the size of the nfs_backoff[] array, and use this
instead of magic constants in the code.
2001-12-30 18:41:52 +00:00
ambrisko
c79d7cebd4 Increase the buffer size to hold a bootp/DHCP reply from 256 bytes to
1222 bytes (derived as the maximum that isc-dhcpd uses).  This solves
the problem if a bootp/DHCP reply is over 256 bytes in which the
end of the bootp/DHCP reply will not be found and then the reply will
be ignored.  This happens when swap and root paths are longish or many
parameters are set.

Reviewed by: imp
Approved by: imp
2001-12-30 02:35:09 +00:00
dillon
2cc743e124 nfs_nget() does no locking whatsoever when looking up a vnode. If the
vget() sleeps we have to retry the operation to avoid racing against
a deletion.

MFC maybe: submitted to re's
2001-12-27 19:40:34 +00:00
iedowse
6e9f1df98f Avoid passing the variable `tl' to functions that just use it for
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.

The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.

Reviewed by:	peter
2001-12-18 01:22:09 +00:00
dillon
cd4d323ad3 This fixes a large number of bugs in our NFS client side code. A recent
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.

	* An edge case with shared R+W mmap()'s and truncate whereby
	  the system would inappropriately clear the dirty bits on
	  still-dirty data.  (applicable to all filesystems)

	  THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
	  see vm/vm_page.c line 1641

	* The straddle case for VM pages and buffer cache buffers when
	  truncating.  (applicable to NFS client side)

	* Possible SMP database corruption due to vm_pager_unmap_page()
	  not clearing the TLB for the other cpu's.  (applicable to NFS
	  client side but could effect all filesystems).  Note: not
	  considered serious since the corruption occurs beyond the file
	  EOF.

	* When flusing a dirty buffer due to B_CACHE getting cleared,
	  we were accidently setting B_CACHE again (that is, bwrite() sets
	  B_CACHE), when we really want it to stay clear after the write
	  is complete.  This resulted in a corrupt buffer.  (applicable
	  to all filesystems but probably only triggered by NFS)

	* We have to call vtruncbuf() when ftruncate()ing to remove
	  any buffer cache buffers.  This is still tentitive, I may
	  be able to remove it due to the second bug fix.  (applicable
	  to NFS client side)

	* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
	  to set n_size before calling nfs_vinvalbuf or the NFS code
	  may recursively vnode_pager_setsize() to the original value
	  before the truncate.  This is what was causing the user mmap
	  bus faults in the nfs tester program.  (applicable to NFS
	  client side)

	* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
	  by Kirk).

Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
MFC after:	1 week
2001-12-14 01:16:57 +00:00
rwatson
08704afd44 o Modify nfslockdans() to accept a thread reference instead of a proc
reference: with td->td_ucred, it will be desirable to authorize
  based on td->td_ucred, rather than p->p_ucred.
o Since the same variable 'p' was later used with pfind() on the target
  process for the wakeup, introduce a new local variable 'targetp'
  to use instead.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2001-11-14 18:20:45 +00:00
alfred
fa9d19d5b5 Allow users to use the 'nolockd' or -L options with mount_nfs in order
to avoid the need for rpc.lockd to perform client locks.  Using
this option a user can revert back to using local locks for NFS mounts
like we did before we had rpc.lockd.
2001-11-12 02:33:52 +00:00
alfred
015f13094a turn vn_open() into a wrapper around vn_open_cred() which allows
one to perform a vn_open using temporary/other/fake credentials.

Modify the nfs client side locking code to use vn_open_cred() passing
proc0's ucred instead of the old way which was to temporary raise
privs while running vn_open().  This should close the race hopefully.
2001-11-11 22:39:07 +00:00
dillon
1147eaf58a Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking
in wdrain during a write.  This flag needs to be used in devices whos
strategy routines turn-around and issue another high level I/O, such as
when MD turns around and issues a VOP_WRITE to vnode backing store, in order
to avoid deadlocking the dirty buffer draining code.

Remove a vprintf() warning from MD when the backing vnode is found to be
in-use.  The syncer of buf_daemon could be flushing the backing vnode at
the time of an MD operation so the warning is not correct.

MFC after:	1 week
2001-11-05 18:48:54 +00:00
rwatson
1704b54dc9 o Note an additional potential problem here: LOCKD_MSG directly exports
struct ucred to userland.  In 5.0-CURRENT, it is desirable to instead
  export struct xucred, as ucred contains mutexes, pointers, and other
  kernel evil.  I'll add it to my work queue.
2001-10-24 02:48:38 +00:00
rwatson
337c917faf o Add two comments identifying problems with the current nfs_lock.c
implementation, so that the information doesn't get lost.
  (1) /var/run/lock is looked up relative to the current thread's root
      directory, but it's not clear that's desirable.
  (2) A race condition associated with live credential modification on
      a shared credential is present when privilege is granted for
      the purposes of talking to /var/run/lock.
2001-10-23 19:11:31 +00:00
dillon
45a6fabe87 Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after:	3 days
2001-10-23 01:21:29 +00:00
jhb
4806d88677 Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
jhb
daacd5aa55 Use crhold() instead of crdup() since we aren't modifying the cred but
just need to ensure it remains immutable.
2001-10-09 16:48:57 +00:00
peter
fd12502e1d Make this compile after last commit. It should be:
"td ? td->td_proc : NULL", not "td ? td->td_proc, NULL"
2001-10-09 02:40:45 +00:00
julian
5596973a17 Don't dereference td if it's NULL.
Submitted by:	Alexander N. Kabaev <ak03@gte.com>
2001-10-08 23:47:44 +00:00
peter
562ebdfbed Unwind some more macros. NFSMADV() was kinda silly since it was right
next to equivalent m_len adjustments.  Move the nfsm_subs.h macros
into groups depending on which phase they are used in, since that
affects the error recovery requirements.  Collect some of the common error
checking into a single macro as preparation for unwinding some more.
Have nfs_rephead return a value instead of secretly modifying args.
Remove some unused function arguments that were being passed around.
Clarify nfsm_reply()'s error handling (I hope).
2001-09-28 04:37:08 +00:00
peter
2854bb2840 Make nfsm_dissect() have an obvious return value. 2001-09-27 22:40:38 +00:00
peter
bc122022f9 Tidy up nfsm_build usage. This is only partially finished. 2001-09-27 02:33:36 +00:00
iedowse
879c2b08b5 Add a missing dereference level. This caused nfsm_postop_attr_xx()
to try and extract node attributes from an RPC reply even if none
were present.

Reviewed by:	peter
2001-09-25 00:00:33 +00:00
peter
f6cc549f2c Add the magic marker so that loader and kldload(2) can find this in
module form automagically.
2001-09-20 04:57:34 +00:00
peter
afb77dde2c Oops. Fix a missing indirection level. gcc didn't complain about it on
x86, but did complain about it on alpha (since int and pointer are
different sizes)
2001-09-20 03:45:51 +00:00
peter
09d2b9e4f7 Sigh, Last minute pre-merge typo. (missing quotes) 2001-09-18 23:49:33 +00:00
peter
85182a8d78 Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.
2001-09-18 23:32:09 +00:00
imp
26847c44d7 nfs_strategy calls nfs_asyncio with td as NULL. So add a bandaid that
will pass NULL as the struct proc when td is NULL.  This has stopped
crashing on my machine.

Note: The passing of NULL may be bogus, but I'll let others fix that
problem.

Reviewed by: jhb
2001-09-18 18:37:52 +00:00
peter
2392b3448b Sync some differences that were different between the copies of the files
that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
2001-09-15 04:41:56 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
kris
bd6f9cb9b6 Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from:	NetBSD
Reviewed by:	peter
MFC after:	2 weeks
2001-09-10 11:28:07 +00:00
dillon
6b8714e0aa Pushdown Giant for nfs syscalls (nfssvc()) 2001-08-31 22:39:36 +00:00
ache
6e290545e1 Stupid error from my side in prev. commit: || -> && 2001-08-23 18:02:29 +00:00
ache
86b9c46400 Implement l_len<0 per POSIX check.
Check for valid l_whence too.
2001-08-23 16:13:59 +00:00
ache
34f1fd94b4 Even better move: suppose that server is able to handle SEEK_END,
so check arguments for all but not SEEK_END case, leaving SEEK_END
handling for server
2001-08-23 14:21:26 +00:00
ache
aafa17c550 Apparently SEEK_END locking not supported by NFS. Previous variant
returns EINVAL in that case, change it to EOPNOTSUPP.
2001-08-23 14:09:16 +00:00
ache
2879f02ee4 Move <machine/*> after <sys/*>
Pointed by:	bde
2001-08-23 13:27:58 +00:00
ache
e955b0b735 adv. lock:
detect off_t overflow _before_ it occurse and return EOVERFLOW instead of
EINVAL
2001-08-23 08:20:21 +00:00
iedowse
a39dd4a8a2 Fix a client-side memory leak in nfs_flush(). The code allocates
a temporary array to store struct buf pointers if the list doesn't
fit in a local array. Usually it frees the array when finished,
but if it jumps to the 'again' label and the new list does fit in
the local array then it can forget to free a previously malloc'd
M_TEMP memory.

Move the free() up a line so that it frees any previously allocated
memory whether or not it needs to malloc a new array.

Reviewed by:	dillon
2001-08-01 10:25:13 +00:00
peter
4763bc528e Check the filehandle size when mounting.
Obtained from:  Constantine Sapuntzakis <csapuntz@openbsd.org>
2001-07-30 20:01:59 +00:00
jhb
0b11844c1a - Sort includes.
- Update vmmeter statistics for vnode pagein/pageouts in getpages/putpages.
2001-07-04 20:14:59 +00:00
dillon
e028603b7e With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
jhb
ab91beada7 - Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.
2001-06-28 04:10:07 +00:00
jake
d729aaf555 Unlock the process returned from pfind() if it does not return NULL.
This fixes a witness lock violation for nfssvc returning with locks
held.

Submitted by:	Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
PR:		kern/27776
2001-06-01 01:30:51 +00:00
rwatson
f504530d9f o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
jhb
c1ce7745c1 Assert Giant is held by the caller rather than getting it and releasing
it in getpages/putpages.
2001-05-23 22:26:05 +00:00
ru
35437d86aa - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
alfred
a3f0842419 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
iedowse
dafd513732 Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by:	phk, bp
2001-05-16 18:04:37 +00:00
markm
bcca5847d5 Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
phk
608c1caf3b Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.
2001-04-29 11:48:41 +00:00
alfred
6aad15a674 Remove incorrect comment.
Submitted by: quinot@inf.enst.fr <quinot@inf.enst.fr>
PR: kern/26893
2001-04-29 03:10:24 +00:00
grog
4b9d9cbaac Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
grog
1f5de30718 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
alfred
1e50a5f33e vnode_pager_freepage() is really vm_page_free() in disguise,
nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
2001-04-19 06:18:23 +00:00
alfred
f0669d6c9e Implement client side NFS locks.
Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
2001-04-17 20:45:23 +00:00
phk
378e561228 This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine.  For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
2001-04-17 08:56:39 +00:00
peter
8b9d89e1e4 Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to
enable easy access to the hash chain stats.  The raw prefixed versions
dump an integer array to userland with the chain lengths.  This cheats
and calls it an array of 'struct int' rather than 'int' or sysctl -a
faithfully dumps out the 128K array on an average machine.  The non-raw
versions return 4 integers: count, number of chains used, maximum chain
length, and percentage utilization (fixed point, multiplied by 100).
The raw forms are more useful for analyzing the hash distribution, while
the other form can be read easily by humans and stats loggers.
2001-04-11 00:39:20 +00:00
rwatson
c124492b00 o Rather than arbitrarily construct a credential in the nfs_statfs()
VFS operation, make use of the calling process's credential.  This
  solution may not be ideal (there are a number of other possible
  proposals, including making use of the proc0 credential, adding a
  credential argument to the VFSOP, and switching from a hard-coded
  ucred to a hard-coded nfscred), it is simple and appears to
  work.  The arguments against using simply crget() are fairly
  strong: it is the only place in the code (other than a nearly
  identical invocation in ncp) where crget() is invoked, other than
  in the process credential creation code; as ucred becomes extensible,
  this use of crget() without appropriate context results in less and
  less meaningful credential data.  The implementation here will
  probably be tweaked as a result of experimentation and further
  exploration of the requirements.  In the mean-time, it allows
  progress to be made in ucred expansion for new security models without
  causing a crash every time df is used on an NFS mounted file system.

  This code has been interop tested against FreeBSD and Solaris NFS
  servers.  While using the process credentials should not introduce
  interop problems, please let me know if any turn out to exist.

Reviewed by:	freebsd-arch
2001-04-05 06:12:38 +00:00
peter
266cea85ee Use the same API as the example code.
Allow the initial hash value to be passed in, as the examples do.
Incrementally hash in the dvp->v_id (using the official api) rather than
add it.  This seems to help power-of-two predictable filename trees
where the filenames repeat on a power-of-two cycle and the directory trees
have power-of-two components in it.  The simple add then mask was causing
things like 12000+ entry collision chains while most other entries have
between 0 and 3 entries each.  This way seems to improve things.
2001-03-20 02:10:18 +00:00
peter
670e711dd1 Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t.  This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this.  It is
around 45% faster with -march=pentiumpro on a p6 cpu.
2001-03-17 09:31:06 +00:00
peter
e0fe7f7124 Dramatically improve the **lame** nfs_hash(). This is based on the
Fowler / Noll / Vo Hash (http://www.isthe.com/chongo/tech/comp/fnv/).

This improves hash coverage a *massive* amount.  We were seeing one
set of machines that were using 0.84% of their 131072 entry nfsnode
hash buckets with maximum chain lengths of up to ~500 entries.  The
machine was spending nearly 100% of its time in 'system'.
A test with this has pushed the coverage from a few perCent up to 91%
utilization with a max chain length of 11.

Submitted by:  David Filo
2001-03-17 05:43:01 +00:00
jhb
9cd254601b Grab the process lock while calling psignal and before calling psignal. 2001-03-07 03:37:06 +00:00
adrian
4018955334 Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
  kernel variables in vfs_mount(). `data' remains a pointer into
  userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
  aren't too long.

* rework mount() and linux_mount() to take the userland parameters
  (besides data, as mentioned) and pass kernel variables to vfs_mount().
  (linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
  since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
  filesystem.  This variable is generally initialised with `path', and
  each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
2001-03-01 21:00:17 +00:00
dillon
f1c1ec2bab Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be
hit on the client side and prevent the server side from retiring writes.
Pipeline operations turned off for all READs (no big loss since reads are
usually synchronous) and for NFS writes, and left on for the default bwrite().
(MFC expected prior to 4.3 freeze)

Testing by: mjacob, dillon
2001-02-28 04:13:11 +00:00
green
18d474781f Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
tegge
880bcc58c6 Enable use of DHCP extensions.
Reviewed by:	Per Kristian Hove <Per.Hove@math.ntnu.no>
2001-02-02 02:35:40 +00:00
dillon
7fdb864af7 NFS O_EXCL file create semantics temporarily uses file attributes to store
the file verifier.  The NFS client is supposed to do a SETATTR after a
successful O_EXCL open/create to clean up the attributes.  FreeBSD's
client code was generating a SETATTR rpc but was not generating an access
or modification time update within that rpc, leaving the file with a
broken access time that solaris chokes on (and it doesn't look very
nice when you ls -lua under FreeBSD either!).    Fixed.
2001-01-04 22:45:19 +00:00
bmilekic
4b6a7bddad * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
dwmalone
dd75d1d73b Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
phk
7101ba5caa Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
2000-11-26 20:35:21 +00:00
dillon
15a44d16ca This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
mckusick
42ecbdff70 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
2000-11-14 08:00:39 +00:00
eivind
1afa7eea27 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
phk
94a5006c9a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
tegge
251bfbe494 Reduce kernel stack usage by not having large packets on the stack.
Supply correct size parameter to dhcpd.
Replace some magic numbers with macro names.
Handle more than one interface.
2000-10-29 01:19:32 +00:00
tegge
5ee55d1bc1 Eliminate some bitrot (nonexisting member variable names).
Don't use curproc when a proc pointer is available.
2000-10-24 23:33:01 +00:00
tegge
d3a3d81338 Style fixes. 2000-10-24 22:40:18 +00:00
tegge
46334203c1 Make RPC timeout message more readable.
Supply proc pointer to sosend.
2000-10-24 22:37:55 +00:00
dwmalone
23aacadb39 Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

	The problem seems to originate with NFS's postop_attr
	information that is returned with a read or write RPC.
	Within a vm_fault context, the code cannot deal with
	vnode_pager_setsize() shrinking a vnode.

	The workaround in the patch below stops the nfsm_postop_attr()
	macro from ever shrinking a vnode. If the new size in the
	postop_attr information is smaller, then it just sets the
	nfsnode n_attrstamp to 0 to stop the wrong size getting
	used in the future. This change only affects postop_attr
	attributes; the nfsm_loadattr() macro works as normal.

	The change is implemented by adding a new argument to
	nfs_loadattrcache() called 'dontshrink'. When this is
	non-zero, nfs_loadattrcache() will never reduce the
	vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by:	dillon
Tested by:	Vadim Belman <voland@lflat.org>
2000-10-24 10:13:36 +00:00
bp
9c029c0ce5 Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top
of nfs mounts, but there can be side effects because nfs uses shared locks
for vnodes.
2000-10-15 08:06:32 +00:00
bp
86d96862d1 Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts).
Give nfs's node lock its own name.
2000-10-15 08:01:28 +00:00
jasone
4e290e67b7 Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
bp
6110b03d24 Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
msmith
7de3089b6a Don't scan for the "right" network interface by shooting in the dark.
Assume that the nfs_diskless structure is correctly set up; the provider
ought to be getting it right.
2000-09-05 22:29:36 +00:00