freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	1b01fc6ef7	- Remove an incorrect XXX comment. This code does respect the XLOCK since it uses vget() which will fail if the identity changes.	2003-10-05 06:47:56 +00:00
Jeff Roberson	b21ad23fff	- Check the XLOCK before we inspect the vnode.	2003-10-05 06:46:45 +00:00
Jeff Roberson	4b1a52f639	- We don't need to cache_purge() in nfs_reclaim(), vclean() does it for us.	2003-10-05 06:46:02 +00:00
Jeff Roberson	b2b64a90d2	- Consistently set sopt_dir. Pointed out by: pete@isilon.com	2003-10-04 17:41:59 +00:00
Jeff Roberson	bb33b5fabf	- Acquire the vnode interlock prior to dropping the mntvnode_mtx. - Make a note of the lack of XLOCK protection in this code. We would access a vnode while it is changing identities without Giant.	2003-10-04 13:44:51 +00:00
Jeff Roberson	8b5905a47d	- Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a stack trace supplied by phk, I now understand what's going on here. The check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been partially torn down in vclean(). It is not clear that this would cause a problem. Document this in nfs_bio.c, which is where the other two filesystems copied this code from.	2003-10-04 08:51:50 +00:00
Jeff Roberson	ce1fb23146	- Remove interlock protection around VI_XLOCK. The interlock is not sufficient to guarantee that this race is not hit. The XLOCK will likely have to be redesigned due to the way reference counting and mutexes work in FreeBSD. We currently can not be guaranteed that xlock was not set and cleared while we were blocked on the interlock while waiting to check for XLOCK. This would lead us to reference a vnode which was not the vnode we requested. - Add a backtrace() call inside of INVARIANTS in the hopes of finding out if this condition is ever hit. It should not, since we should be retaining a reference to the vnode in these cases. The reference would be sufficient to block recycling.	2003-09-19 23:37:49 +00:00
Poul-Henning Kamp	2a3d23acc6	Name the vnode method vectors consistently with the rest of the filesystems. This improves the output of src/tools/tools/vop_table	2003-09-12 16:44:40 +00:00
Poul-Henning Kamp	0ace036ce5	Remove now unused BOOTP tags related to NFS swap device.	2003-09-05 11:12:55 +00:00
Diomidis Spinellis	cf669e5456	KNF: parentheses around return values. Suggested by: bde Approved by: schweikh (mentor - blanket) MFC after: 6 weeks	2003-09-04 11:27:13 +00:00
Diomidis Spinellis	5a5f2134b8	Fix errno return values to better represent failure reasons for read and open. Approved by: schweikh (mentor) Agreed: bde MFC after: 6 weeks	2003-09-02 16:46:31 +00:00
Poul-Henning Kamp	0bddf4c8e9	Remove the magic way of configuring NFS backed swap. This code dates back to the very first diskless support on FreeBSD, back when swapon(8) couldn't simply be run on an NFS backed file. Suggested replacement command sequence on the client: dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000; swapon /swapfile; rm -f /swapfile (for whatever value of 100000 you want).	2003-08-15 12:04:02 +00:00
Bill Fumerola	2766bd022f	0) preallocate per-interface context structures without the ifnet lock held 1) avoid immediately calling bzero() after malloc() by passing M_ZERO 2) do not initialize individual members of the global context to zero 3) remove an unused assignment of ifctx in bootpc_init() Reviewed by: tegge	2003-08-07 21:27:17 +00:00
Tim J. Robbins	aae962d56e	Fix a problem that occurs when truncating files on NFSv3 mounts: we need to set np->n_size back to the desired size again after calling nfs_meta_setsize(), since it could end up in nfs_loadattrcache() getting called, which would change n_size back to the value it had before the truncate request was issued. The result of this bug is that the size info cached in the nfsnode becomes incorrect, lseek(fd, ofs, SEEK_END) seeks past the end of the file, stat() returns the wrong size, etc. PR: 41792 MFC after: 2 weeks	2003-07-29 00:17:29 +00:00
Poul-Henning Kamp	7c89f162bc	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.	2003-07-27 17:04:56 +00:00
Poul-Henning Kamp	e82b33f337	Change idle sleep identifier to "-" for nfsiod	2003-07-02 08:09:20 +00:00
Alan Cox	aec774abec	Lock the vm object when freeing a page.	2003-06-17 05:17:00 +00:00
Poul-Henning Kamp	cefb5754dd	Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.	2003-06-15 18:53:00 +00:00
Poul-Henning Kamp	7652131bee	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk	2003-06-12 20:48:38 +00:00
Ian Dowse	df12c16630	When removing a sillyrename file, make sure that the directory vnode has not been cleaned in the meantime, since this can happen during a forced unmount. Also add a comment that nfs_removeit() should really be locking the directory vnode before calling nfs_removerpc(). Reported by: mbr Tested by: mbr MFC after: 1 week	2003-06-12 15:41:20 +00:00
David E. O'Brien	ab0de15baf	Use __FBSDID().	2003-06-11 05:37:42 +00:00
Robert Watson	13b7350a5b	Add the comment I meant to add about not passing in PCATCH to the tsleep(). Note the XXX.	2003-06-11 03:32:42 +00:00
Jeffrey Hsu	807c988d7a	On a socket creation error, don't close the socket.	2003-06-09 03:44:34 +00:00
Poul-Henning Kamp	bf975fe45b	Remove unused variables. Add explicit breaks to switch. Found by: FlexeLint	2003-05-31 20:05:25 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Robert Watson	6d7f268ad1	rpc.lockd stability workaround: remove PCATCH from the tsleep() in nfs_lock.c. Right now, if we permit a signal to interrupt the sleep, we will slip the lock and no process on that client, the server, or any other client will be able to acquire the lock. This can happen, for example, if a user hits Ctrl-C or Ctrl-T while a process is waiting for the lock. By removing PCATCH, we prevent that from happening, at the cost of not permitting a user-requested lock abort: also nasty. However, a user interface bug might be preferable to a serious semantic bug, so we go with that for now. We need to teach the rpc.lockd/kernel protocol how to abort lock requests, and rpc.lockd how to handle aborted lock requests; patches for the kernel bit are floating around, but no rpc.lockd bit yet. Approved by: re (scottl)	2003-05-30 17:15:56 +00:00
Peter Wemm	62d8fb93d0	Deal with the possibility of negative available space from the file server to avoid Bad Things(TM) happening (eg: df crashing with a floating point exception). Submitted by: Harold Gutch <logix@foobar.franken.de> Approved by: re (scottl)	2003-05-19 22:35:00 +00:00
Robert Watson	7042ac8cd7	This change grabs the vnode lock for NFS client vnodes when calling VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS will panic, and (b) it may be possible to corrupt entries in the cached vnode attributes in the nfsnode, since nfsnode attribute cache data is also protected by the vnode lock. Approved by: re (jhb) Pointed out by: VFS_DEBUG_LOCKS	2003-05-15 21:12:08 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
Dag-Erling Smørgrav	87ccef7b77	Instead of recording the Unix time in a process when it starts, record the uptime. Where necessary, convert it back to Unix time by adding boottime to it. This fixes a potential problem in the accounting code, which would compute the elapsed time incorrectly if the Unix time was stepped during the lifetime of the process.	2003-05-01 16:59:23 +00:00
Alexander Kabaev	104a9b7e3e	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
Don Lewis	78b0aaefb5	VOP_FSYNC() expects to be called with the vnode locked, so lock fvp in nfs_rename() before calling VOP_FSYNC() and unlock fvp immediately after. Reviewed by: bde	2003-04-24 20:39:40 +00:00
Peter Wemm	d76f16de39	Fix a bug with df on large (>1TB) nfsv3 file servers on 32 bit client machines where the 'long' number of blocks in struct statfs won't fit. Instead of choosing an artificial 512 byte block size, simply scale it up until we avoid an overflow. NFSv3 reports the sizes in bytes, and the blocksize is a figment of nfsclient's imagination.	2003-04-24 20:36:32 +00:00
Don Lewis	8b3182e212	Release the vnode interlock in nfs_flush() before calling nfs_sigintr(), and grab it again later if necessary. This prevents a lock order reversal because nfs_sigintr() calls PROC_LOCK().	2003-04-23 02:58:26 +00:00
Thomas Quinot	da4898b1d7	Revert change 1.201 (removing mapping of VAPPEND to VWRITE). Instead, use the generic vaccess() operation to determine whether an operation is permitted. This avoids embedding knowledge on vnode permission bits such as VAPPEND in the NFS client. PR: kern/46515 vaccess() patch submitted by: "Peter Edwards" <pmedwards@eircom.net> Approved by: tjr, roberto (mentor)	2003-03-31 23:26:10 +00:00
Jeff Roberson	4093529dee	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.	2003-03-31 22:49:17 +00:00
Robert Watson	847b14bd3c	Add O_NONBLOCK to the vn_open_cred() flags for NFS client locking when opening the POSIX fifo; convert ENXIO error returns to EOPNOTSUPP. This improves handling of the case where the /var/run/lock fifo exists but there is no listener: we immediately return EOPNOTSUPP rather than blocking until a listener turns up. This could occur during a diskless boot before rpc.lockd is loaded, or if the lock file persists across a reboot following the disabling of rpc.lockd. This may have suddenly started to occur due to fifo blocking fixes--previously it looks like attempts to read on a fifo with no listener would time out due to insufficient resources. Reviewed by: alfred	2003-03-26 19:21:34 +00:00
Alfred Perlstein	cbee8fbe2e	req can not be NULL or we'd die. Sponsored by: RED	2003-03-26 01:46:11 +00:00
Tim J. Robbins	9ba703c024	Map VAPPEND to VWRITE in nfsspec_access() - VAPPEND is never set in the mode returned by VOP_GETATTR. This fixes incorrect "Permission denied" errors when trying to append to a file on an NFSv2 mount.	2003-03-21 05:13:23 +00:00
Jeff Roberson	8501ead911	- Add a forgotten BUF_LOCK() Most sincere apologies to: jake	2003-03-14 05:13:19 +00:00
Jeff Roberson	619bddc702	- Lock the buf before inspecting its contents.	2003-03-13 07:04:11 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviewed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Nate Lawson	99648386d3	Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead deferring to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.	2003-03-03 19:15:40 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Jeff Roberson	48d4ffc119	- The interlock was not being dropped in nfs_flush() if the first part of an if clause was true. Break the two clauses out into separate statements since they require different actions. Reported/Tested by: jake Spotted by: jhb	2003-02-26 00:24:19 +00:00
Jeff Roberson	869d735043	- Properly handle the vnode interlock in nfs_fsync. Reported by: phk	2003-02-25 08:50:21 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Peter Wemm	169ade77af	Get rid of a silly message I added back in Sept 2001 (1.68).	2003-02-18 23:45:01 +00:00
Tim J. Robbins	1bdd8ae409	Lock proc while accessing p_siglist, p_sigmask and p_sigignore in nfs_sigintr().	2003-02-15 08:25:57 +00:00
Matthew Dillon	146cc84e0d	Provide a sysctl to allow defaulting of the connectionless (-c) feature to mount_nfs. The sysctl defaults to 1 (paranoid mode). Setting it to 0 will allow an NFS client to receive replies on a different IP than they were sent to by default. Submitted by: Sean Eric Fagan <sef@kithrup.com>	2003-01-22 19:57:31 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Poul-Henning Kamp	c6e3ae999b	Since Jeffr made the std* functions the default in rev 1.63 of kern/vfs_defaults.c it is wrong for the individual filesystems to use the std* functions as that prevents override of the default. Found by: src/tools/tools/vop_table	2003-01-04 08:47:19 +00:00
Poul-Henning Kamp	862702306b	Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.	2003-01-03 06:32:15 +00:00
Jens Schweikhardt	d64ada501a	Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/ Add FreeBSD Id tag where missing.	2002-12-30 21:18:15 +00:00
Matthew Dillon	3a3d82ec0a	Abstract-out the constants for the sequential heuristic. No operational changes. MFC after: 1 day	2002-12-28 20:37:50 +00:00
Jeffrey Hsu	956b0b653c	SMP locking for radix nodes.	2002-12-24 03:03:39 +00:00
Alan Cox	903caa598d	Avoid holding the vnode interlock around malloc() or free() to prevent a lock order reversal. Reviewed by: jeff	2002-12-23 06:20:41 +00:00
Jeffrey Hsu	b30a244c34	SMP locking for ifnet list.	2002-12-22 05:35:03 +00:00
Matthew Dillon	a19f30d8c9	do not try to free a mountpoint that we did not allocate. X-MFC after: immediately	2002-12-21 20:55:34 +00:00
Alfred Perlstein	818407fe06	reapply 1.26 through 1.28. Approved by: re	2002-11-20 15:21:06 +00:00
Alfred Perlstein	7affe44ee3	forgot about 5.x freeze, backout 1.26 through 1.28 pending re@ approval.	2002-11-20 10:53:06 +00:00
Alfred Perlstein	0b2724b10f	remove useless casts, unused macros and cleanup a line wrap.	2002-11-20 10:13:04 +00:00
Alfred Perlstein	9822015014	comment and untwist error return logic	2002-11-20 10:06:51 +00:00
Alfred Perlstein	32cb464571	Remove an outdated comment complaining about exporting struct ucred to userspace, I fixed it a while ago.	2002-11-20 10:00:04 +00:00
Poul-Henning Kamp	6999d2ef6d	Don't examine an un-initialized variable. Spotted by: FlexeLint.	2002-10-20 21:52:05 +00:00
Poul-Henning Kamp	0c183c5a56	Remove extern declarations of stuff which is static in nfs_node.c Move related macro to nfs_node.c Spotted by: FlexeLint	2002-10-20 21:40:55 +00:00
Kirk McKusick	a5b65058d5	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.	2002-10-14 03:20:36 +00:00
Mike Barcroft	2b7f24d210	Change iov_base's type from `char *' to the standard `void *'. All uses of iov_base which assume its type is `char *' (in order to do pointer arithmetic) have been updated to cast iov_base to `char *'.	2002-10-11 14:58:34 +00:00
Scott Long	316ec49abd	Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack whose size can be specified when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates. Reviewed by: jake, peter, jhb	2002-10-02 07:44:29 +00:00
Juli Mallett	1d9c56964d	Back out kernel support for reliable signal queues. Requested by: rwatson, phk, and many others	2002-10-01 17:15:53 +00:00
Juli Mallett	f4430f22b8	Lock access to the signal queue, and related structures, with PROC_LOCK. Submitted by: jhb	2002-09-30 21:15:33 +00:00
Juli Mallett	70d4d0c0f5	Convert use of p_siglist and old SIG*() macros to use <sys/ksiginfo.h> prototyped functions to get a sigset_t, and further to check for any queued signals, rather than an empty signal set, to go with the move to signal queues rather than signal sets.	2002-09-30 20:48:29 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Robert Watson	f0fb902771	Remove an errant debugging printf that got left in during my last commit. Pointed out by: guido	2002-09-27 00:25:54 +00:00
Robert Watson	203639c449	Apparently pxeboot passes in a mygateway of non-zero sin length from DHCP in the event that no gateway is returned from DHCP, breaking the assumption that we skip the routing insertion of the gateway if the sin length is zero. Check also for s_addr of 0 to avoid the "Oh no, adding my default route failed" panic, making it possible to pxeboot machines on segments without default routes. Arguably this could be a bug in pxeboot, or in the TUNABLE code, but this makes my boxes boot.	2002-09-26 19:56:43 +00:00
Jeff Roberson	8926aed697	- Lock access to the buf lists. - Use vrefcnt() where appropriate. - Add some locking asserts.	2002-09-25 02:38:43 +00:00
Jake Burkholder	abc370fa85	Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c so that it is MI. Allow nfs_mountroot to return an error if the nfs_diskless struct is not valid, rather than panicing later on. Call nfs_setup_diskless() from nfs_mountroot if NFS_ROOT is defined, like bootpc_init(). Removed legacy root mount support for sparc64, and enabled NFS_ROOT by default.	2002-09-22 00:59:02 +00:00
Poul-Henning Kamp	7ed60de837	Use m_length() instead of home-rolled versions.	2002-09-18 19:44:14 +00:00
Nate Lawson	06be2aaa83	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
Poul-Henning Kamp	7e6fb406ff	Now that we have a cached mount credential in struct mount, use it instead of a private cached copy.	2002-09-08 15:11:18 +00:00
Bruce Evans	6af7f1e511	Use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite.	2002-09-05 14:04:34 +00:00
Maxim Sobolev	62f7648682	Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's. Reviewed by: -hackers, -net	2002-08-18 07:05:00 +00:00
Alfred Perlstein	f898f7c5b2	Remove a case of exposing 'struct ucred' to userspace. Use a struct xucred for LOCKD_MSG instead. Requested by: rwatson	2002-08-15 21:52:22 +00:00
Robert Watson	9ca435893b	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes are largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged - pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 20:55:08 +00:00
Poul-Henning Kamp	9bf1a75697	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.	2002-08-13 10:05:50 +00:00
Robert Watson	c08b677fb5	Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent enforcement of MAC policy on the read or write operations: - In ext2fs, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), directory modifications in rename(), directory write operations in mkdir(), symlink write operations in symlink(). - In the NFS client locking code, perform vn_rdwr() on the NFS locking socket without enforcing MAC, since the write is done on behalf of the kernel NFS implementation rather than the user process. - In UFS, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), and symlink write operations in symlink(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-12 16:43:04 +00:00
Jeff Roberson	be12d7a61d	- Add a missing VI_UNLOCK to an error case in nfs_flush.	2002-08-05 08:54:29 +00:00
Jeff Roberson	e6e370a7fe	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
Alan Cox	e0c9fdb50e	o Lock page queue accesses in nfs_getpages().	2002-07-21 20:01:32 +00:00
Matthew Dillon	8e0619c6b0	Fix a bug in nfs_write() related to ^C'ing during a file write on an interruptible mount. We were returning from inside the loop without releasing the rslock. Submitted by: Mike Junk <junk@isilon.com> MFC after: 3 days	2002-07-16 19:43:59 +00:00
John Baldwin	14d199ad29	If we get a receive error in nfs_receive() and then get an error trying to obtain the send lock, we would bogusly try to unlock the send lock before returning resulting in a panic. Instead, only unlock the send lock if nfs_sndlock() succeeds and nfs_reconnect() fails. MFC after: 3 days Sponsored by: The Weather Channel	2002-07-16 15:12:07 +00:00
Alfred Perlstein	09ce4f7aaf	Add IPv6 support. Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>	2002-07-15 19:40:23 +00:00
Matthew Dillon	3d8f797ac1	Convert old style (type foo *)0 casts to NULLs PR: kern/40360 Requested by: Hiten PAndya via direct email	2002-07-11 17:54:58 +00:00
Matthew Dillon	d331c5d43f	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages: Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages: vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex than other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc	2002-07-10 17:02:32 +00:00
John Baldwin	56e9ce41a5	In namei(), we use a NULL thread for uio_td when doing a VOP_READLINK(). nfs_readlink() calls nfs_bioread() which passes in uio_td as the thread argument to nfs_getcacheblk(). In nfs_getcacheblk() we dereference the thread pointer to get a process pointer to pass to nfs_sigintr(). This obviously results in a panic. :) Rather than change nfs_getcacheblk() to check if the thread pointer is NULL when calling nfs_sigintr() like other callers do, change nfs_sigintr() to take a thread as the last argument instead of a process so none of the callers have to care if the thread is NULL or not.	2002-06-28 21:53:08 +00:00
Seigo Tanimura	4cc20ab1f0	Back out my last commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu	2002-05-31 11:52:35 +00:00
Dima Dorfman	ad308c10c7	Don't tsleep() with an sb_mtx held.	2002-05-27 05:20:15 +00:00
Peter Wemm	e82685e79f	Fix warning; deprecated use of label at end of compound statement	2002-05-24 05:50:28 +00:00
Seigo Tanimura	243917fe3b	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred	2002-05-20 05:41:09 +00:00
Doug Ambrisko	c6b12feb29	Add TAG_VENDOR_INDENTIFIER (option 60) to our DHCP request done by the kernel BOOTP option. The format will be: FreeBSD:<MACHINE>:<osrelease> this way people can tune their DHCP server to serve up root file systems via the OS, machine type and version. Obtained from: NetBSD MFC after: 3 weeks	2002-05-17 20:18:48 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
Poul-Henning Kamp	2f35ea476d	We don't need the arp kludge any more.	2002-04-28 18:29:44 +00:00
Ian Dowse	10632e44cc	Remove the nfs_{lock,unlock,islocked} functions and the associated definitions; they have been unused and #if 0'd out since the Lite/2 merge and we are unlikely to want them in the future.	2002-04-27 22:10:16 +00:00
Ian Dowse	df99ca52f1	The recent NFS forced unmount improvements introduced a side-effect where some client operations might be unexpectedly cancelled during an unsuccessful non-forced unmount attempt. This causes problems for amd(8), because it periodically attempts a non-forced unmount to check if the filesystem is still in use. Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set only during the operation of a forced unmount. Use this instead of MNTK_UNMOUNT to trigger the cancellation of hung NFS operations. Also correct a problem where dounmount() might inadvertently clear the MNTK_UNMOUNT flag. Reported by: simokawa MFC after: 1 week	2002-04-17 01:07:29 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Jeff Roberson	ab426dc822	Remove references to vm_zone.h and switch over to the new uma API.	2002-03-20 10:07:52 +00:00
Luigi Rizzo	49b144f286	Add a readonly sysctl variable of type string, kern.bootp_cookie, which is initialized with whatever string a dhcp/bootp server passes as vendor tag 134. There is no standard tag that I know with this information, and no vendor-defined tag that applies to FreeBSD that I could find doing the same thing. The intended use is to pass information to userland for run-time configuration of a diskless client without having to run a bootp/dhcp client for the third time (after the one in pxeboot/etherboot, and the one in the kernel bootp), also because these clients generally screwup the interface configuration, which is not exactly what you want when you have your disks nfs-mounted. Manpage update to follow soon. MFC-after: 3 days	2002-03-13 09:23:11 +00:00
Poul-Henning Kamp	f58932f237	vhold() our vnode while checking the remote side. This is believed to be the only place where a soft reference to a vnode is held with no sort of hard reference, consequently this change should allow us to free(9) vnodes from the freelist after properly cleaning them up. Reviewed by: dillon	2002-03-08 13:43:43 +00:00
Peter Wemm	85a745c15e	Fix warnings.. bootpc_init() and related.	2002-02-28 03:07:35 +00:00
John Baldwin	fdcc1cc09f	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.	2002-02-27 19:18:10 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Peter Wemm	c2e42439ad	Fix a long line touched in previous commit (but not caused by previous commit)	2002-02-07 23:03:41 +00:00
Julian Elischer	079b7badea	Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there but remove all the code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice	2002-02-07 20:58:47 +00:00
Peter Wemm	1bde568682	Revise the nfsiod auto tuning code. Now both the upper and lower limits are specifiable by sysctl and are respected. Submitted by: Maxime Henrion <mux@sneakerz.org>	2002-01-15 20:57:21 +00:00
Peter Wemm	117f61374c	Implement vfs.nfs.iodmin (minimum number of nfsiod's) and vfs.nfs.iodmaxidle (idle time before nfsiod's exit). Make it adaptive so that we create nfsiod's on demand and they go away after not being used for a while. The upper limit is NFS_MAXASYNCDAEMON (currently 20). More will be done here, but this is a useful checkpoint. Submitted by: Maxime Henrion <mux@qualys.com>	2002-01-14 02:13:46 +00:00
Ian Dowse	a7f6ff2e8c	Terminate requests in nfs_sigintr() if the filesystem is in the process of being unmounted. This allows forced NFS unmounts to complete even if there are processes stuck holding the mnt_lock while the server is down. The mechanism is not ideal in that there is a small chance we might accidentally cancel requests during a failed non-forced unmount attempt on that filesystem, but this is not really a big problem. Also, move the tsleep() in nfs_nmcancelreqs() so that we do not sleep in the case where there are no requests to be cancelled.	2002-01-10 02:15:35 +00:00
Ian Dowse	1278d57acd	Permit NFS filesystems to be forcibly unmounted when the server is down, even if there are hung processes and the mount is non-interruptible. This works by having nfs_unmount call a new function nfs_nmcancelreqs() in the FORCECLOSE case. It scans the list of outstanding requests and marks as interrupted any requests belonging to the specified mount. Then it waits up to 30 seconds for all requests to terminate. A few other changes are necessary to support this: - Unconditionally set a socket timeout so that even hard mounts are guaranteed to occasionally check the R_SOFTTERM flag on requests. For hard mounts this flag can only be set by nfs_nmcancelreqs(). - Reject requests on a mount that is currently being unmounted. - Never grant the receive lock to a request that has been cancelled. This should also avoid an old problem where a forced NFS unmount could cause a crash; it occurred when a VOP on an unlocked vnode (usually VOP_GETATTR) was in progress at the time of the forced unmount.	2002-01-02 00:41:26 +00:00
Alan Cox	62d69898f9	o Remove an errant ';' introduced in the last revision. o Remove an unused variable.	2002-01-01 19:44:01 +00:00
Robert Watson	0e97c01d6d	o Remove premature use of nmp->nm_cred, it hasn't been initialized yet.	2002-01-01 16:17:55 +00:00
Robert Watson	147839396c	o Pass td into nfs_mountroot() to eliminate an XXX'd curthread use. Since it's in the parent function anyway, might as well pass it another layer down. Obtained from: TrustedBSD Project	2001-12-31 21:00:00 +00:00
Robert Watson	1b17a3c9ca	o Remove premature leakage of use of td_ucred from base source tree: instead, use td->td_proc->p_ucred.	2001-12-31 20:56:59 +00:00
Robert Watson	474c19561b	o Add missing #include's of sys/proc.h, missed in merge, required to dereference td->td_proc->p_ucred.	2001-12-31 20:05:26 +00:00
Robert Watson	9c4d63da6d	o Make the credential used by socreate() an explicit argument to socreate(), rather than getting it implicitly from the thread argument. o Make NFS cache the credential provided at mount-time, and use the cached credential (nfsmount->nm_cred) when making calls to socreate() on initially connecting, or reconnecting the socket. This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well as bugs involving NFS and mandatory access control implementations. Reviewed by: freebsd-arch	2001-12-31 17:45:16 +00:00
Ian Dowse	a8206e3559	Add a #define for the size of the nfs_backoff[] array, and use this instead of magic constants in the code.	2001-12-30 18:41:52 +00:00
Doug Ambrisko	236f9adc78	Increase the buffer size to hold a bootp/DHCP reply from 256 bytes to 1222 bytes (derived as the maximum that isc-dhcpd uses). This solves the problem if a bootp/DHCP reply is over 256 bytes in which the end of the bootp/DHCP reply will not be found and then the reply will be ignored. This happens when swap and root paths are longish or many parameters are set. Reviewed by: imp Approved by: imp	2001-12-30 02:35:09 +00:00
Matthew Dillon	885d36ce36	nfs_nget() does no locking whatsoever when looking up a vnode. If the vget() sleeps we have to retry the operation to avoid racing against a deletion. MFC maybe: submitted to re's	2001-12-27 19:40:34 +00:00
Ian Dowse	9669bb479a	Avoid passing the variable `tl' to functions that just use it for temporary storage. In the old NFS code it wasn't at all clear if the value of `tl' was used across or after macro calls, but I'm fairly confident that the convention was to keep its use local. Each ex-macro function now uses a local version of this variable, so all of the double-indirection goes away. The only exception to the `local use' rule for `tl' is nfsm_clget(), which is left unchanged by this commit. Reviewed by: peter	2001-12-18 01:22:09 +00:00
Matthew Dillon	3ebeaf5984	This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS. * An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems) THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641 * The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side) * Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could affect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF. * When flushing a dirty buffer due to B_CACHE getting cleared, we were accidentally setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS) * We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentative, I may be able to remove it due to the second bug fix. (applicable to NFS client side) * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side) * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk). Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week	2001-12-14 01:16:57 +00:00
Robert Watson	69aaef0122	o Modify nfslockdans() to accept a thread reference instead of a proc reference: with td->td_ucred, it will be desirable to authorize based on td->td_ucred, rather than p->p_ucred. o Since the same variable 'p' was later used with pfind() on the target process for the wakeup, introduce a new local variable 'targetp' to use instead. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2001-11-14 18:20:45 +00:00
Alfred Perlstein	13190d8754	Allow users to use the 'nolockd' or -L options with mount_nfs in order to avoid the need for rpc.lockd to perform client locks. Using this option a user can revert back to using local locks for NFS mounts like we did before we had rpc.lockd.	2001-11-12 02:33:52 +00:00
Alfred Perlstein	f03e89de68	Turn vn_open() into a wrapper around vn_open_cred() which allows one to perform a vn_open using temporary/other/fake credentials. Modify the nfs client side locking code to use vn_open_cred() passing proc0's ucred instead of the old way which was to temporarily raise privs while running vn_open(). This should hopefully close the race.	2001-11-11 22:39:07 +00:00
Matthew Dillon	7e76bb562e	Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whose strategy routines turn-around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer or buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week	2001-11-05 18:48:54 +00:00
Robert Watson	c1787d3b75	o Note an additional potential problem here: LOCKD_MSG directly exports struct ucred to userland. In 5.0-CURRENT, it is desirable to instead export struct xucred, as ucred contains mutexes, pointers, and other kernel evil. I'll add it to my work queue.	2001-10-24 02:48:38 +00:00
Robert Watson	b5c05ddcb8	o Add two comments identifying problems with the current nfs_lock.c implementation, so that the information doesn't get lost. (1) /var/run/lock is looked up relative to the current thread's root directory, but it's not clear that's desirable. (2) A race condition associated with live credential modification on a shared credential is present when privilege is granted for the purposes of talking to /var/run/lock.	2001-10-23 19:11:31 +00:00
Matthew Dillon	c72ccd014d	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days	2001-10-23 01:21:29 +00:00
John Baldwin	bd78cece5d	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.	2001-10-11 23:38:17 +00:00
John Baldwin	5162c5cc1e	Use crhold() instead of crdup() since we aren't modifying the cred but just need to ensure it remains immutable.	2001-10-09 16:48:57 +00:00
Peter Wemm	caf4b18ba9	Make this compile after last commit. It should be: "td ? td->td_proc : NULL", not "td ? td->td_proc, NULL"	2001-10-09 02:40:45 +00:00
Julian Elischer	7e49874f08	Don't dereference td if it's NULL. Submitted by: Alexander N. Kabaev <ak03@gte.com>	2001-10-08 23:47:44 +00:00
Peter Wemm	b9b0e19206	Unwind some more macros. NFSMADV() was kinda silly since it was right next to equivalent m_len adjustments. Move the nfsm_subs.h macros into groups depending on which phase they are used in, since that affects the error recovery requirements. Collect some of the common error checking into a single macro as preparation for unwinding some more. Have nfs_rephead return a value instead of secretly modifying args. Remove some unused function arguments that were being passed around. Clarify nfsm_reply()'s error handling (I hope).	2001-09-28 04:37:08 +00:00
Peter Wemm	1290984b33	Make nfsm_dissect() have an obvious return value.	2001-09-27 22:40:38 +00:00
Peter Wemm	ea7fe289fe	Tidy up nfsm_build usage. This is only partially finished.	2001-09-27 02:33:36 +00:00
Ian Dowse	1782e17d6f	Add a missing dereference level. This caused nfsm_postop_attr_xx() to try and extract node attributes from an RPC reply even if none were present. Reviewed by: peter	2001-09-25 00:00:33 +00:00
Peter Wemm	d55d47aded	Add the magic marker so that loader and kldload(2) can find this in module form automagically.	2001-09-20 04:57:34 +00:00
Peter Wemm	247c65c27f	Oops. Fix a missing indirection level. gcc didn't complain about it on x86, but did complain about it on alpha (since int and pointer are different sizes)	2001-09-20 03:45:51 +00:00
Peter Wemm	891a092764	Sigh, Last minute pre-merge typo. (missing quotes)	2001-09-18 23:49:33 +00:00
Peter Wemm	eb25edbda3	Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.	2001-09-18 23:32:09 +00:00
Warner Losh	976a26437e	nfs_strategy calls nfs_asyncio with td as NULL. So add a bandaid that will pass NULL as the struct proc when td is NULL. This has stopped crashing on my machine. Note: The passing of NULL may be bogus, but I'll let others fix that problem. Reviewed by: jhb	2001-09-18 18:37:52 +00:00
Peter Wemm	38f48395d6	Sync some differences that were different between the copies of the files that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.	2001-09-15 04:41:56 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Kris Kennaway	bf61e26696	Fix some signed/unsigned integer confusion, and add bounds checking of arguments to some functions. Obtained from: NetBSD Reviewed by: peter MFC after: 2 weeks	2001-09-10 11:28:07 +00:00
Matthew Dillon	4e174404a3	Pushdown Giant for nfs syscalls (nfssvc())	2001-08-31 22:39:36 +00:00
Andrey A. Chernov	f6bf1abc1b	Stupid error from my side in prev. commit: || -> &&	2001-08-23 18:02:29 +00:00
Andrey A. Chernov	e02faad5ca	Implement l_len<0 per POSIX check. Check for valid l_whence too.	2001-08-23 16:13:59 +00:00
Andrey A. Chernov	6c3f4fef64	Even better move: suppose that server is able to handle SEEK_END, so check arguments for all but not SEEK_END case, leaving SEEK_END handling for server	2001-08-23 14:21:26 +00:00
Andrey A. Chernov	e018907ed4	Apparently SEEK_END locking not supported by NFS. Previous variant returns EINVAL in that case, change it to EOPNOTSUPP.	2001-08-23 14:09:16 +00:00
Andrey A. Chernov	fb2f187058	Move <machine/> after <sys/> Pointed by: bde	2001-08-23 13:27:58 +00:00
Andrey A. Chernov	e9d095afdc	adv. lock: detect off_t overflow _before_ it occurs and return EOVERFLOW instead of EINVAL	2001-08-23 08:20:21 +00:00
Ian Dowse	02b31a0ee9	Fix a client-side memory leak in nfs_flush(). The code allocates a temporary array to store struct buf pointers if the list doesn't fit in a local array. Usually it frees the array when finished, but if it jumps to the 'again' label and the new list does fit in the local array then it can forget to free a previously malloc'd M_TEMP memory. Move the free() up a line so that it frees any previously allocated memory whether or not it needs to malloc a new array. Reviewed by: dillon	2001-08-01 10:25:13 +00:00
Peter Wemm	7b141d5db3	Check the filehandle size when mounting. Obtained from: Constantine Sapuntzakis <csapuntz@openbsd.org>	2001-07-30 20:01:59 +00:00
John Baldwin	617e358cdf	- Sort includes. - Update vmmeter statistics for vnode pagein/pageouts in getpages/putpages.	2001-07-04 20:14:59 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
John Baldwin	bc2327c310	- Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.	2001-06-28 04:10:07 +00:00
Jake Burkholder	d389ead74f	Unlock the process returned from pfind() if it does not return NULL. This fixes a witness lock violation for nfssvc returning with locks held. Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr> PR: kern/27776	2001-06-01 01:30:51 +00:00
Robert Watson	b1fc0ec1a7	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restructure many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparison. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done.
o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit	2001-05-25 16:59:11 +00:00
John Baldwin	ce70e0a964	Assert Giant is held by the caller rather than getting it and releasing it in getpages/putpages.	2001-05-23 22:26:05 +00:00
Ruslan Ermilov	99d300a1ec	- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs. - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs. - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS. - Install header files for the above file systems. - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.	2001-05-23 09:42:29 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
Ian Dowse	0864ef1e8a	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp	2001-05-16 18:04:37 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
Poul-Henning Kamp	b7ebffbc08	Add a vop_stdbmap(), and make it part of the default vop vector. Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.	2001-04-29 11:48:41 +00:00
Alfred Perlstein	f411fba5d3	Remove incorrect comment. Submitted by: quinot@inf.enst.fr <quinot@inf.enst.fr> PR: kern/26893	2001-04-29 03:10:24 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Alfred Perlstein	d8d5fa8805	vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()	2001-04-19 06:18:23 +00:00
Alfred Perlstein	603c86672c	Implement client side NFS locks. Obtained from: BSD/os Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org	2001-04-17 20:45:23 +00:00
Poul-Henning Kamp	f84e29a06c	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.	2001-04-17 08:56:39 +00:00
Peter Wemm	9d10eb0c0c	Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to enable easy access to the hash chain stats. The raw prefixed versions dump an integer array to userland with the chain lengths. This cheats and calls it an array of 'struct int' rather than 'int' or sysctl -a faithfully dumps out the 128K array on an average machine. The non-raw versions return 4 integers: count, number of chains used, maximum chain length, and percentage utilization (fixed point, multiplied by 100). The raw forms are more useful for analyzing the hash distribution, while the other form can be read easily by humans and stats loggers.	2001-04-11 00:39:20 +00:00
Robert Watson	2955f0b360	o Rather than arbitrarily construct a credential in the nfs_statfs() VFS operation, make use of the calling process's credential. This solution may not be ideal (there are a number of other possible proposals, including making use of the proc0 credential, adding a credential argument to the VFSOP, and switching from a hard-coded ucred to a hard-coded nfscred), it is simple and appears to work. The arguments against using simply crget() are fairly strong: it is the only place in the code (other than a nearly identical invocation in ncp) where crget() is invoked, other than in the process credential creation code; as ucred becomes extensible, this use of crget() without appropriate context results in less and less meaningful credential data. The implementation here will probably be tweaked as a result of experimentation and further exploration of the requirements. In the mean-time, it allows progress to be made in ucred expansion for new security models without causing a crash every time df is used on an NFS mounted file system. This code has been interop tested against FreeBSD and Solaris NFS servers. While using the process credentials should not introduce interop problems, please let me know if any turn out to exist. Reviewed by: freebsd-arch	2001-04-05 06:12:38 +00:00
Peter Wemm	439fea92c2	Use the same API as the example code. Allow the initial hash value to be passed in, as the examples do. Incrementally hash in the dvp->v_id (using the official api) rather than add it. This seems to help power-of-two predictable filename trees where the filenames repeat on a power-of-two cycle and the directory trees have power-of-two components in it. The simple add then mask was causing things like 12000+ entry collision chains while most other entries have between 0 and 3 entries each. This way seems to improve things.	2001-03-20 02:10:18 +00:00
Peter Wemm	6eb39ac8fc	Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it. As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32). The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha. Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.	2001-03-17 09:31:06 +00:00
Peter Wemm	be1d4058eb	Dramatically improve the lame nfs_hash(). This is based on the Fowler / Noll / Vo Hash (http://www.isthe.com/chongo/tech/comp/fnv/). This improves hash coverage a massive amount. We were seeing one set of machines that were using 0.84% of their 131072 entry nfsnode hash buckets with maximum chain lengths of up to ~500 entries. The machine was spending nearly 100% of its time in 'system'. A test with this has pushed the coverage from a few percent up to 91% utilization with a max chain length of 11. Submitted by: David Filo	2001-03-17 05:43:01 +00:00
John Baldwin	19eb87d22a	Grab the process lock while calling psignal and before calling psignal.	2001-03-07 03:37:06 +00:00
Adrian Chadd	f3a90da995	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since it's a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is initialised with "/" in the case of a root mount.	2001-03-01 21:00:17 +00:00
Matthew Dillon	63692125a9	Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be hit on the client side and prevent the server side from retiring writes. Pipeline operations turned off for all READs (no big loss since reads are usually synchronous) and for NFS writes, and left on for the default bwrite(). (MFC expected prior to 4.3 freeze) Testing by: mjacob, dillon	2001-02-28 04:13:11 +00:00
Brian Feldman	c0511d3b58	Switch to using a struct xucred instead of a struct ucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL). This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout. Reviewed by: bde	2001-02-18 13:30:20 +00:00
Tor Egge	7d1af7b215	Enable use of DHCP extensions. Reviewed by: Per Kristian Hove <Per.Hove@math.ntnu.no>	2001-02-02 02:35:40 +00:00
Matthew Dillon	d2d00d11be	NFS O_EXCL file create semantics temporarily uses file attributes to store the file verifier. The NFS client is supposed to do a SETATTR after a successful O_EXCL open/create to clean up the attributes. FreeBSD's client code was generating a SETATTR rpc but was not generating an access or modification time update within that rpc, leaving the file with a broken access time that solaris chokes on (and it doesn't look very nice when you ls -lua under FreeBSD either!). Fixed.	2001-01-04 22:45:19 +00:00
Bosko Milekic	2a0c503e7a	* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have become a big problem.	2000-12-21 21:44:31 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Poul-Henning Kamp	a52585d77e	Simplify the tprintf() API. Lose the special <sys/tprintf.h> #include file.	2000-11-26 20:35:21 +00:00
Matthew Dillon	279d722604	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>	2000-11-18 21:01:04 +00:00
Kirk McKusick	d6514f21d7	In preparation for deprecating CIRCLEQ macros in favor of TAILQ macros which provide the same functionality and are a bit more efficient, convert use of CIRCLEQ's in NFS to TAILQ's.	2000-11-14 08:00:39 +00:00
Eivind Eklund	e3c4036b18	Give vop_mmap an untimely death. The opportunity to give it a timely death timed out in 1996.	2000-11-01 17:57:24 +00:00
Poul-Henning Kamp	53ce36d17a	Remove unneeded #include <sys/proc.h> lines.	2000-10-29 13:57:19 +00:00
Tor Egge	e4e7a9a4e9	Reduce kernel stack usage by not having large packets on the stack. Supply correct size parameter to dhcpd. Replace some magic numbers with macro names. Handle more than one interface.	2000-10-29 01:19:32 +00:00
Tor Egge	5b93d1da3f	Eliminate some bitrot (nonexisting member variable names). Don't use curproc when a proc pointer is available.	2000-10-24 23:33:01 +00:00
Tor Egge	6d7518c134	Style fixes.	2000-10-24 22:40:18 +00:00
Tor Egge	f6ee793a3c	Make RPC timeout message more readable. Supply proc pointer to sosend.	2000-10-24 22:37:55 +00:00
David Malone	dc6dd1259f	Problem to avoid processes getting stuck in "vmopar". From Ian's mail: The problem seems to originate with NFS's postop_attr information that is returned with a read or write RPC. Within a vm_fault context, the code cannot deal with vnode_pager_setsize() shrinking a vnode. The workaround in the patch below stops the nfsm_postop_attr() macro from ever shrinking a vnode. If the new size in the postop_attr information is smaller, then it just sets the nfsnode n_attrstamp to 0 to stop the wrong size getting used in the future. This change only affects postop_attr attributes; the nfsm_loadattr() macro works as normal. The change is implemented by adding a new argument to nfs_loadattrcache() called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never reduce the vnode/nfsnode size; instead it zeros n_attrstamp. There remain other ways processes can get stuck in vmopar. Submitted by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: dillon Tested by: Vadim Belman <voland@lflat.org>	2000-10-24 10:13:36 +00:00
Boris Popov	c523a62949	Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top of nfs mounts, but there can be side effects because nfs uses shared locks for vnodes.	2000-10-15 08:06:32 +00:00
Boris Popov	823548e131	Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts). Give nfs's node lock its own name.	2000-10-15 08:01:28 +00:00
Jason Evans	a18b1f1d4d	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.	2000-10-04 01:29:17 +00:00
Boris Popov	67e871664b	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or kept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only interlock lock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick	2000-09-25 15:24:04 +00:00
Mike Smith	a77773909d	Don't scan for the "right" network interface by shooting in the dark. Assume that the nfs_diskless structure is correctly set up; the provider ought to be getting it right.	2000-09-05 22:29:36 +00:00
Kirk McKusick	9b97113391	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occasionally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.	2000-07-24 05:28:33 +00:00
Paul Saab	fb27899f3b	Correctly set the Maximum DHCP Message Size. bootpd now works again as well as ISC dhcpd.	2000-06-13 09:32:09 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Poul-Henning Kamp	831e32f863	Include a RFC 1533 "Maximum DHCP Message Size" option in our request. ISC DHCP will limit the reply length to 64 bytes for bootp replies unless we explicitly tell it we can do more. We tell it that we can do 1200 bytes.	2000-05-07 14:29:19 +00:00
Poul-Henning Kamp	9626b608de	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bde's teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
Poul-Henning Kamp	2c9b67a8df	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude	2000-04-30 18:52:11 +00:00
Poul-Henning Kamp	87150cb06d	s/biowait/bufwait/g Prodded by: several.	2000-04-29 16:25:22 +00:00
Poul-Henning Kamp	3389ae9350	Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>	2000-04-19 14:58:28 +00:00
Poul-Henning Kamp	8177437d85	Complete the bio/buf divorce for all code below devfs::strategy Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS	2000-04-15 05:54:02 +00:00
Poul-Henning Kamp	c244d2de43	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.	2000-04-02 15:24:56 +00:00
Matthew Dillon	8d1b3828fa	Add a sysctl to specify the amount of UDP receive space NFS should reserve, in maximal NFS packets. Originally only 2 packets worth of space was reserved. The default is now 4, which appears to greatly improve performance for slow to mid-speed machines on gigabit networks. Add documentation and correct some prior documentation. Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu> Approved by: jkh	2000-03-27 21:38:35 +00:00
Poul-Henning Kamp	b99c307a21	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.	2000-03-20 11:29:10 +00:00
Poul-Henning Kamp	21144e3bf1	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch was machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.	2000-03-20 10:44:49 +00:00
Peter Wemm	242c5536ea	Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world. Approved by: jkh	2000-02-13 03:32:07 +00:00
Matthew Dillon	34ddf54812	The alpha build causes the 'nfsuid bloated' warning to occur. Well, there is nothing we can do about it. In fact, after further review there simply are not very many instances of the two structures NFS checks for 'bloat' so I've decided to simply rip the checks out entirely. Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>	2000-01-13 20:18:25 +00:00
Yoshinobu Inoue	fb59c426ff	tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project	2000-01-09 19:17:30 +00:00
Matthew Dillon	c37c9620cd	Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better than 'reverse'. The original algorithm did things this way too but at a huge time cost. Enhance the append interlock for NFS writes to handle intr/soft mounts better. Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches. Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather than the kernel default for that option. Reviewed by: Alfred Perlstein <bright@wintelcom.net>	2000-01-05 05:11:37 +00:00
Matthew Dillon	54986abd15	Fix at least one source of the continued 'NFS append race'. close() was calling nfs_flush() and then clearing the NMODIFIED bit. This is not legal since there might still be dirty buffers after the nfs_flush (for example, pending commits). The clearing of this bit in turn prevented a necessary vinvalbuf() from occurring, leaving leftover dirty buffers even after truncating the file in a new operation. The fix is to simply not clear NMODIFIED. Also added a sysctl vfs.nfs.nfsv3_commit_on_close which, if set to 1, will cause close() to do a stage 1 write AND a stage 2 commit synchronously. By default only the stage 1 write is done synchronously. Reviewed by: Alfred Perlstein <bright@wintelcom.net>	2000-01-05 00:32:18 +00:00
Peter Wemm	c447342094	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSD's who made this change quite some time ago. More commits to come.	1999-12-29 05:07:58 +00:00
Alfred Perlstein	20883b0f10	make getfh a standard syscall instead of dependent on having NFSSERVER defined, useful for userland fileservers that want to use a filehandle type interface to the filesystem. Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452	1999-12-21 20:21:12 +00:00
Robert Watson	91f37dcba1	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind	1999-12-19 06:08:07 +00:00
Brian Feldman	d25f3712b7	M_PREPEND-related cleanups (unregisterifying struct mbuf *s).	1999-12-19 01:55:37 +00:00
Eivind Eklund	762e6b856c	Introduce NDFREE (and remove VOP_ABORTOP)	1999-12-15 23:02:35 +00:00
Matthew Dillon	b7303db36e	Fix two problems: First, fix the append seek position race that can occur due to np->n_size potentially changing if nfs_getcacheblk() blocks in nfs_write(). Second, under -current we must supply the proper bufsize when obtaining buffers that straddle the EOF, but due to the fact that np->n_size can change out from under us it is possible that we may specify the wrong buffer size and wind up truncating dirty data written by another process. Both problems are solved by implementing nfs_rslock(), which allows us to lock around sensitive buffer cache operations such as those that occur when appending to a file. It is believed that this race is responsible for causing dirtyoff/dirtyend and (in stable) validoff/validend to exceed the buffer size. Therefore we have now added a warning printf for the dirtyoff/end case in current. However, we have introduced a new problem which we need to fix at some point, and that is that soft or intr NFS mounts may become uninterruptable from the point of view of process A which is stuck waiting on rslock while process B is stuck doing the rpc. To unstick process A, process B would have to be interrupted first. Reviewed by: Alfred Perlstein <bright@wintelcom.net>	1999-12-14 19:07:54 +00:00
Matthew Dillon	4682c8eac9	Fix a timeout deadlock that can occur when the process holding the receive lock hasn't yet managed to send its own request. PR: kern/15055 Submitted by: Ian Dowse iedowse@maths.tcd.ie	1999-12-13 04:24:55 +00:00
Matthew Dillon	5f3bfd608d	Fix a number of server-side issues related to aborting badly formed NFS packets, mainly initializing structure pointers to NULL which are conditionally freed prior to return. PR: kern/15249 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	1999-12-12 07:06:39 +00:00
Matthew Dillon	ea94c7b968	Synopsis of problem being fixed: Dan Nelson originally reported that blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below. bug #1: B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data. Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable. bug #2: B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported. bug #3: B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has NOT yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server). The patch also contains some cleanup, which only applies to the commit into -current. Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>	1999-12-12 06:09:57 +00:00
Eivind Eklund	6bdfe06ad9	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter	1999-12-11 16:13:02 +00:00
Matthew Dillon	98733bd871	The symlink implementation could improperly return a NULL vp along with a 0 error code. The problem occurred with NFSv2 mounts and also with any NFSv3 mount returning an EEXIST error (which is translated to 0 prior to return). The reply to the rpc only contains the file handle for the no-error case under NFSv3. The error case under NFSv3 and all cases under NFSv2 do not return the file handle. The fix is to do a secondary lookup to obtain the file handle and thus be able to generate a return vnode for the situations where the rpc reply does not contain the required information. The bug was originally introduced when VOP_SYMLINK semantics were changed for -CURRENT. The NFS symlink implementation was not properly modified to go along with the change despite the fact that three people reviewed the code. It took four attempts to get the current fix correct with five people. Is NFS obfuscated? Ha! Reviewed by: Alfred Perlstein <bright@wintelcom.net> Testing and Discussion: "Viren R.Shah" <viren@rstcorp.com>, Eivind Eklund <eivind@FreeBSD.ORG>, Ian Dowse <iedowse@maths.tcd.ie>	1999-11-30 06:56:15 +00:00
Eivind Eklund	679106b15a	Remap the error EEXIST => 0 before using error to determine if we should return a vp.	1999-11-27 18:14:41 +00:00
Matthew Dillon	b314ed9662	nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization of element [4] in both, which goes beyond the end of the array, leaving [0], [1], [2], and [3]. This bug did not cause any problems since the overrun fields are initialized after the bogus array init but needs to be fixed anyway. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	1999-11-22 04:50:09 +00:00
Eivind Eklund	b6335212d6	Fix VOP_MKNOD for loss of WILLRELE. I don't know how I could have missed this in the first place :-( Noticed by: bde	1999-11-20 16:09:10 +00:00
Eivind Eklund	dd8c04f4c7	Remove WILLRELE from VOP_SYMLINK Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.	1999-11-13 20:58:17 +00:00
Matthew Dillon	a6aa6d9137	Remove special case socket sharing code in order to allow nfsd to bind IP addresses to udp/cltp sockets separately. PR: kern/13049 Reviewed by: David Malone <dwmalone@maths.tcd.ie>, freebsd-current	1999-11-11 17:24:02 +00:00
Matthew Dillon	6b21e94604	Fix nfssvc_addsock() to not attempt to free a NULL socket structure when returning an error. Bug fix was extracted from the PR. The PR is not yet entirely resolved by this commit. PR: kern/13049 Reviewed by: Matt Dillon <dillon@freebsd.org> Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	1999-11-08 19:10:16 +00:00
Mike Smith	b7017a8210	Call bootpc_init before we try to mount an NFS root, if we're configured to use BOOTP for NFS root discovery. The entire interface setup inside nfs_mountroot is evil, and should die.	1999-11-01 23:55:38 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Matthew Dillon	a5d3fe3f85	Move NFS access cache hits/misses into nfsstats structure so /usr/bin/nfsstat can get to it easily.	1999-10-25 19:22:33 +00:00
Poul-Henning Kamp	3b6fb88590	Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.	1999-10-03 12:18:29 +00:00
Marcel Moolenaar	16df98ecc6	Careless use of struct proc *p caused major problems. 'p' is allowed to be NULL in this function (nfs_sigintr). Reorder the statements and guard them all with a single if (p != NULL). reported, reviewed and tested by: jdp	1999-09-29 20:12:39 +00:00
Marcel Moolenaar	2c42a14602	sigset_t change (part 2 of 5) ----------------------------- The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done through function sigprop(). The latter because the table doesn't hold properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace pollution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.	1999-09-29 15:03:48 +00:00
Matthew Dillon	8fdd2461b3	Add comment to clarify a commit rpc optimization already being performed.	1999-09-20 19:10:28 +00:00
Matthew Dillon	b5acbc8b9c	Asynchronized client-side nfs_commit. NFS commit operations were previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it. Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection than lastr ever did. Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc. Note that we do not commit the meta-data yet. This work still needs to be done. Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>	1999-09-17 05:57:57 +00:00

799 Commits