freebsd-skq

Author	SHA1	Message	Date
julian	1dfc5c98a4	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco	2008-05-09 23:03:00 +00:00
phk	8d647da1ed	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of kernel code does however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h> <sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.	2008-04-22 19:38:30 +00:00
kib	52243403eb	Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks	2008-04-16 11:33:32 +00:00
dfr	79d2dfdaa6	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
ru	3b1bf8c2e9	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
jeff	a9d123c3ab	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)	2008-03-22 09:15:16 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
rodrigc	3c423712fa	Expand the nfs_opts array to include all possible string mount options that mount_nfs could pass down, if it passed down string mount options. Right now, mount_nfs jut passes down a single mount option named "nfs_args" with a fully initialized 'struct nfs_args'. In future commits, we will add code to the kernel for parsing stringified NFS mount options, so that we can convert mount_nfs to pass string options from userspace to kernel, instead of an initialized struct nfs_args.	2008-03-05 10:09:29 +00:00
rodrigc	b3c805a899	In nfs_mount(), default initialize struct nfs_args the same way that it is default initialized in revision 1.77 of mount_nfs.c. Right now, this is a no-op, because currently we initialize struct nfs_args in mount_nfs in userspace, and pass it down into the kernel via nmount(), so we overwrite whatever we initialize here with the value passed in from userspace. However, this lays the groundwork for moving away from passing struct nfs_args from userspace to kernel via nmount(), so that we can instead pass string mount options via nmount() which can be parsed in the kernel. This will make it easier to add new NFS mount options.	2008-03-05 09:41:22 +00:00
attilio	4014b55830	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
attilio	0d54671a48	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.	2008-02-24 16:38:58 +00:00
yar	fe586f02db	Prevent the NFS client from losing MNT_ROOTFS on the root file system. In particular, stop overwriting mount point flags in nfs_mountdiskless() because now they are set elsewhere. (They were _initialized_ by that function in the 4.4BSD days, when mount structures were not allocated in a centralized manner -- see rev. 1.1 of this file.) Fix nfs_mount(), which happened to depend on the loss of MNT_ROOTFS when it came to update handling. Also note that mountnfs() no longer handles updates. Now they shouldn't reach this function, so printf a diagnostic message if that happens due to a coding error.	2008-02-17 22:32:08 +00:00
attilio	456bfb1f0f	- Add real assertions to lockmgr locking primitives. A couple of notes for this: * WITNESS support, when enabled, is only used for shared locks in order to avoid problems with the "disowned" locks * KA_HELD and KA_UNHELD only exists in the lockmgr namespace in order to assert for a generic thread (not curthread) owning or not the lock. Really, this kind of check is bogus but it seems very widespread in the consumers code. So, for the moment, we cater this untrusted behaviour, until the consumers are not fixed and the options could be removed (hopefully during 8.0-CURRENT lifecycle) * Implementing KA_HELD and KA_UNHELD (not surported natively by WITNESS) made necessary the introduction of LA_MASKASSERT which specifies the range for default lock assertion flags * About other aspects, lockmgr_assert() follows exactly what other locking primitives offer about this operation. - Build real assertions for buffer cache locks on the top of lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp) paradigm. - Add checks at lock destruction time and use a cookie for verifying lock integrity at any operation. - Redefine BUF_LOCKFREE() in order to not use a direct assert but let it rely on the aforementioned destruction time check. KPI results evidently broken, so __FreeBSD_version bumping and manpage update result necessary and will be committed soon. Side note: lockmgr_assert() will be used soon in order to implement real assertions in the vnode namespace replacing the legacy and still bogus "VOP_ISLOCKED()" way. Tested by: kris (earlier version) Reviewed by: jhb	2008-02-13 20:44:19 +00:00
jhb	e83bd487c4	Consolidate the code to generate a new XID for a NFS request into a nfs_xid_gen() function instead of duplicating the logic in both nfsm_rpchead() and the NFS3ERR_JUKEBOX handling in nfs_request(). MFC after: 1 week Submitted by: mohans (a long while ago)	2008-02-13 00:04:58 +00:00
kris	989a96d5cb	Switch the default NFS mount mode from UDP to TCP. UDP mounts are a historical relic, and are no longer appropriate for either LAN or WAN mounting. At modern (gigabit and 10 gigabit) LAN speeds packet loss from socket buffer fill events is common, and sequence numbers wrap quickly enough that data corruption is possible. TCP solves both of these problems without imposing significant overhead. MFC after: 1 month	2008-02-11 23:23:21 +00:00
attilio	4274e0aa54	namei() can call underlying nfs_readlink() passing a struct uio pointer owned by a NULL owner. This will lead consequent VOP_ISLOCKED() present into nfs_upgrade_vnlock() to panic as it only acquire curthread now. Fix nfs_upgrade_vnlock() and nfs_downgrade_vnlock() in order to not use more the struct thread pointer passed as argument (as it is really nomore required there as vn_lock() and VOP_UNLOCK doesn't get the lock more). Using curthread, in place, doesn't get ambiguity as LK_EXCLOTHER should be handled as a "not locked" request by both functions. Reported by: kris Tested by: kris Reviewed by: ups	2008-02-09 20:13:19 +00:00
attilio	e1db4e70b3	Conver all explicit instances to VOP_ISLOCKED(arg, NULL) into VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should only acquire curthread as argument; this will lead in axing the additional argument from both functions, making the code cleaner. Reviewed by: jeff, kib	2008-02-08 21:45:47 +00:00
attilio	7213f4c32b	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo	2008-01-24 12:34:30 +00:00
attilio	caa2ca048b	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)	2008-01-19 17:36:23 +00:00
attilio	71b7824213	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
jhb	41560caefd	The previous revision broke the case of reconnecting to a TCP NFS server via a new socket during an NFS operation as that reconnect takes place in the context of an arbitrary thread with an arbitrary credential. Ideally we would like to use the mount point's credential for the entire process of setting up the socket to connect to the NFS server. Since some of the APIs (sobind(), etc.) only take a thread pointer and infer the credential from that instead of a direct credential, work around the problem by temporarily changing the current thread's credential to that of the mount point while connecting the socket and then reverting back to the original credential when we are done. Reviewed by: rwatson Tested on: UDP, TCP, TCP with forced reconnect	2008-01-11 23:57:39 +00:00
jhb	3bcad4568e	Pass curthread to various socket routines (socreate(), sobind(), and soconnect()) instead of &thread0 when establishing a connection to the NFS server. Otherwise inconsistent credentials may be used when setting up the NFS socket. MFC after: 1 week Reviewed by: rwatson	2008-01-10 23:36:00 +00:00
attilio	18d0a0dd51	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
rwatson	742eb19799	Remove hacks from the NFSv2/3 client intended to handle a lack of a server-side RPC retranmission cache for non-idempotent operations: these hacks substituted 0 (success) for the expected EEXIST in the event that a target name already existed for LINK, SYMLINK, and MKDIR operations, under the assumption that EEXIST represented a second application of the original RPC rather than a true failure. Background: certain NFS operations (in this case, LINK, SYMLINK, and MKDIR) are not idempotent, as they leave behind persisting state on the server that prevents them from being replayed without an error;if an UDP RPC reply is lost leading to a retransmission by theclient, the second reply will return EEXIST rather than success, asthe new object has already been created. The NFS client previouslysilently mapped the EEXIST return into success to paper over thisproblem. However, in all modern NFS server implementations, a reply cache is kept in order to retransmit the original reply to a retransmitted request, rather than performing the operation a second time, allowing this hack to be avoided. This allows link()-based filelocking over NFS to operate correctly, as an application requestingthe creation of a new link for a file to tell if it succeededatomically or not. Other NFS clients, including Solaris and Linux, generally follow this behavior for the same reasons. Most clients also now default to TCP, which also helps avoid the issue of retransmitted but non-idempotent requests in most cases. Reported by: Adam McDougall <mcdouga9 at egr dot msu dot edu>, Timo Sirainen <tss at iki dot fi> Reviewed by: mohans MFC after: 1 week	2007-11-19 16:03:21 +00:00
rodrigc	aa4d1c093f	Add the following mount options to the nfs_opts array: noatime, noexec, suiddir, nosuid, nosymfollow, union, noclusterr, noclusterw, multilabel, acls, force, update, async. These options correspond to MOPT_STDOPTS, MOPT_FORCE, MOPT_UPDATE, and MOPT_ASYNC. Currently, mount_nfs converts these "-o" options from strings to MNT_ flags via getmntopts(), and passes the flags from userspace to the kernel. This change will allow us in future to pass these mount options as strings directly to the kernel via nmount() when doing NFS mounts.	2007-10-27 16:28:05 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
jhb	1f6b3a5f2c	Add a -z flag to nfsstat which zeros the NFS statistics after displaying them. MFC after: 1 week Requested by: ps Submitted by: ps (6 years ago)	2007-10-18 16:38:07 +00:00
alfred	3a60df401c	Get rid of qaddr_t. Requested by: bde	2007-10-16 10:54:55 +00:00
mohans	77c0dc0000	NFS MP scaling changes. - Eliminate the hideous nfs_sndlock that serialized NFS/TCP request senders thru the sndlock. - Institute a new nfs_connectlock that serializes NFS/TCP reconnects. Add logic to wait for pending request senders to finish sending before reconnecting. Dial down the sb_timeo for NFS/TCP sockets to 1 sec. - Break out the nfs xid manipulation under a new nfs xid lock, rather than over loading the nfs request lock for this purpose. - Fix some of the locking in nfs_request. Many thanks to Kris Kennaway for his help with this and for initiating the MP scaling analysis and work. Kris also tested this patch thorougly. Approved by: re@ (Ken Smith)	2007-10-12 19:12:21 +00:00
mohans	422753af04	Fix for a very rare race, caused by the nfsiod wakeup and nfsiod idle timeout occurring at exactly the same time. If this happens, the nfsiod exits although there may be a queued async IO request for it. Found by : Kris Kennaway Approved by: re	2007-09-25 21:08:49 +00:00
rwatson	23574c8673	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
jhb	7c4c058305	Fix for a race where out of order loading of NFS attrs into the nfsnode could lead to attrs being stale. One example (that we ran into) was a READDIR+, WRITE. The responses came back in order, but the attrs from the WRITE were loaded before the attrs from the READDIR+, leading to the wrong size from being read on the next stat() call. MFC after: 1 week Submitted by: mohans Approved by: re (kensmith)	2007-07-03 18:31:47 +00:00
jhb	788b235895	Fix up NFS client write error handling. Errors are split into recoverable and unrecoverable. For the former, we redirty the buffer and hang onto it for future retries. For the latter (eg. ESTALE), we discard the buffer and return the error back to the user on the next syscall. This fixes a number of vfs panics and fixes having a large number of dirty buffers (that cannot be written out and reclaimed) from hanging around. Thanks to ups@ for discussions on this issue. Reported by: kris, Kai, others Approved by: re (kensmith)	2007-07-03 18:30:55 +00:00
attilio	9bd4fdf7ce	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
jeff	a7a8bac81f	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
attilio	7dd8ed88a9	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
rwatson	3b7f8bde24	In nfs_down(), if rep can be NULL, which we test for, then we should lock and unlock conditionally, not just set the flag on it conditionally. In practice, this bug couldn't manifest, as in the current revision of the code, no callers pass a NULL rep. CID: 1416 Found with: Coverity Prevent(tm)	2007-05-18 19:34:54 +00:00
jeff	e1996cb960	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
jhb	961de35e83	Various fixes to the NFS Directio support. - Fix for a bug where a close would not wait for all (directio) dirty buffers to drain. The nfsnode was not marked NMODIFIED when there were directio dirtied buffers pending, causing this. - No reason to vhold/vrele the vp when enqueueing DirectIO requests for the nfsiods. The vnode can't really go way since the close has to wait for these requests to drain. MFC after: 1 week Submitted by: mohans	2007-04-25 20:34:55 +00:00
rwatson	32f12b60cc	Attempt to rationalize NFS privileges: - Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD. - Use PRIV_NFS_DAEMON in the NFS server. - In the NFS client, move the privilege check from nfslockdans(), which occurs every time a write is performed on /dev/nfslock, and instead do it in nfslock_open() just once. This allows us to avoid checking the saved uid for root, and just use the effective on open. Use PRIV_NFS_LOCKD.	2007-04-21 18:11:19 +00:00
delphij	e4435eebba	Don't destroy a mutex just before we use it, instead, destroy it after we have used it.	2007-03-23 08:52:36 +00:00
tegge	214bc5723c	Make insmntque() externally visibile and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib	2007-03-13 01:50:27 +00:00
mohans	f2213cb325	Back out a chance to nfs_timer() that inadvertantly crept in the last checkin :(	2007-03-09 04:07:54 +00:00
mohans	a332cb00d5	Over NFS, an open() call could result in multiple over-the-wire GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.	2007-03-09 04:02:38 +00:00
jhb	9081d44243	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
mohans	fd33eed884	Backing out an earlier change. It seems harmless for NFS to miss the "force unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and nfs_sigintr() unnecessary. Pointed out by tegge@.	2007-02-16 03:46:55 +00:00
mohans	6d93e14637	Add missing MNT_ILOCK around some mnt_kern_flag accesses.	2007-02-11 04:01:10 +00:00
mohans	5f0bd46234	Fix for a vnode lock leak in nfs_create() in the event of an error. Spotted by ups@.	2007-01-31 23:10:27 +00:00
kris	365d5c4ba7	Instead of always hard-coding the socket type for the nfs root mount as SOCK_DGRAM (i.e. UDP), respect the value configured earlier. This allows TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable. In this case some of the connection parameters like the retry timer were previously set appropriately for TCP but inappropriately for the UDP socket that was actually used, leading to e.g. extremely long recovery times (O(hours)) after a nfs server reboot. Reviewed by: mohans MFC After: 2 weeks	2007-01-30 00:26:04 +00:00
bde	5a952c0766	Unstaticize nfs_iosize() in nfsclient and use it in nfs4client instead of duplicating it except for larger style bugs in the copy. Fix some nearby style bugs (including a harmless type mismatch) in and near the remaining copy. This is part of fixing collisions of the 2 nfs*client's names. Even static names should have a unique prefixes so that they can be debugged easily.	2007-01-25 13:07:25 +00:00
kib	fdd50404d1	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquision. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order then the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)	2007-01-23 10:01:19 +00:00
mohans	a3a728eabb	NetApp filers return corrupt post op attrs in the wcc on NFS error responses. This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt for other NFS error responses. For now, disabling wcc pre-op attr checks and post-op attr loads on NFS errors (sysctl'ed). Reported by: Kris Kennaway	2006-12-11 19:54:25 +00:00
sam	17d1a5f84e	consolidate parsing of nfs root mount options in one place and handle all options (some may require fixes elsewhere) Reviewed by: jhb, mohans MFC after: 1 month	2006-12-06 02:15:25 +00:00
mohans	9a2d94c135	In nfs_nget(), we must initialize the fh in the nfsnode before inserting the vnode into the vfs hash. Otherwise, another thread walking the hash can trip on an nfsnode with an uninitialized or partially initialized fh. Thanks to ups@ for spotting this race.	2006-11-29 02:21:40 +00:00
mohans	ef00496038	bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don't have to explicitly acquire Giant (although they need to be aware of this and not hold any locks at that point). Remove the acquisitions of Giant in the NFS client wrapping tprintf().	2006-11-27 23:26:06 +00:00
mohans	9ca7ca48b8	Fix for a bug caused by a race when 2 threads lookup the same file. Leave the loser's lock(s) initialized, so the reclaim logic can unconditionally destroy them when that race occurs (or if the vfs hash insert happened to fail for some other reason). Thanks to ups@ for a careful review of the code. Reported by : Kris Kennaway	2006-11-27 19:06:43 +00:00
mohans	4800f71e23	1) Fix up locking in nfs_up() and nfs_down. 2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly. - We don't need to acquire Giant before tsleeping on lbolt anymore, since jhb specialcased lbolt handling in msleep. - nfs_up() needs to acquire Giant only if printing the "server up" message. - nfs_timer() held Giant for the duration of the NFS timer processing, just because the printing of the message in nfs_down() needed it (and we acquire other locks in nfs_timer()). The acquisition of Giant is moved down into nfs_down() now, reducing the time Giant is held in that path. Reported by: Kris Kennaway	2006-11-20 04:14:23 +00:00
mohans	755a61c0b7	vfs_hash_insert() vputs() the losing vnode before returning, in the event of a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't be releasing the vnode in that case.	2006-11-16 23:03:46 +00:00
mohans	3ef54eed4c	Fix to readdir+ reply handling. When inserting an entry into the namecache, initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the just entered namecache entry.	2006-11-16 23:02:37 +00:00
sam	2619dbffd2	honor nolockd flag in root mount options MFC after: 2 weeks	2006-11-07 18:02:45 +00:00
mohans	6afec858bc	Make EWOULDBLOCK a recoverable error so that the request is retransmitted. This bug results in data corruption with NFS/TCP. Writes are silently dropped on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires). Reviewed by: ups@	2006-10-31 20:25:37 +00:00
bde	fea5a64567	Fixed some style bugs (especially ones involving long lines and use of __P(())). There are many more.	2006-10-17 22:07:07 +00:00
bde	d0962ab7a5	Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the VA_MARK_ATIME feature to fix POSIX conformance fore execve() and mmap(), we thought that it was optimized well enough for the one file system that supports it (ffs) and harmless for other file systems (except layered ones which already get the layering for VOP_SETATTR() wrong). However, nfs_setattr() doesn't do much parameter checking, so when it gets a combination of parameters that it doesn't understand, it always does a Setattr RPC. This RPC can't do anything good, and for VA_MARK_ATIME it is null except for wasting a lot of time. This is the smallest and easiest to fix of several bugs that have increased the number of RPCs for kernel builds on nfs by more than 100% since 2004-11-05. The real-time increase depends on network latency and parallelization and can also be very large (approaching the same percentage for unparallelized operations like "make depend" on systems with fast CPUs and high-latency networks).	2006-10-14 07:25:11 +00:00
phk	50c81b8a9a	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & spamce-efficient BCD routines.	2006-10-02 12:59:59 +00:00
tegge	f42473d76b	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
tegge	83154f853d	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
mohans	21daa650a9	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@	2006-09-13 18:39:09 +00:00
mohans	0b0d785f6a	Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never retransmit (or return). Thanks to John Baldwin for helping nail this one. Found by : Kris Kennaway	2006-08-29 22:00:12 +00:00
thomas	2f330cd31b	Fix typos in comment.	2006-08-16 23:53:05 +00:00
alc	b98eae58a6	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
brooks	f9c7fb948d	Add a new kernel environment variable "boot.netif.mtu" which is used to set the MTU prior to mounting root via NFS. This is required if the server supports a higher than default MTU because the client will not see the responses otherwise. MFC after: 3 weeks	2006-08-09 01:56:17 +00:00
rwatson	40868fda8a	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
kib	574ae24779	Signals may be delivered to process as well as to the thread. Check the thread-delivered signals in addition to the process one. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)	2006-07-08 15:39:11 +00:00
kib	54b7822070	Always supply curthread as argument to nfs_asyncio and nfs_doio in nfs_strategy. Otherwise, for some buffers, signals would be ignored at the intr mounts. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)	2006-07-08 15:36:51 +00:00
yar	ba19b1ecd4	There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready to the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru	2006-06-29 19:22:05 +00:00
yar	af09d16f73	Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop.	2006-06-29 15:37:39 +00:00
mohans	b2a8ee65bc	Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was not being set, which means Giant would be acquired for these mounts.	2006-05-30 20:32:44 +00:00
mohans	ac7b19c8b2	Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported by Witness (which forces the mbuf allocation flag to M_NOWAIT). Reported by: "sekes".	2006-05-26 18:45:55 +00:00
ups	d193a40233	Call vm_object_page_clean() with the object lock held. Submitted by: kensmith@ Reviewed by: mohans@ MFC after: 6 days	2006-05-25 17:16:11 +00:00
ups	4eb5a7d9ee	Do not set B_NOCACHE on buffers when releasing them in flushbuflist(). If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flush dirty page in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(,V_SAVE,,*)) for data written using an mmaped file. Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days	2006-05-25 01:00:35 +00:00
mohans	365e894b0f	Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts before doing the read/write. Reported by: Chuck Lever.	2006-05-24 23:06:50 +00:00
rwatson	726780d475	Adjust minimum iod threads from 4 to 0 -- since we compile the NFS client into the kernel by default, and many users won't use NFS, don't start an extra 4 kernel threads that are unused. Once NFS becomes active, it will start nfsiod's as it needs them. We might consider mandating a minimum iod's equal to the number of active NFS mounts (truncated to some value), which would force some to remain available without having to create a new one if the file system is mostly inactive. PR: 70880 MFC after: 2 weeks Prodded by: cel Head nod: peter Pointed out by: Joe <fbsd_user at a1poweruser dot com>	2006-05-24 21:04:46 +00:00
cel	8b94e52728	NFS over TCP retransmit behavior should default to a 60 second time out, mimicing the NFS reference implementation. NFS over TCP does not need fast retransmit timeouts, since network loss and congestion are managed by the transport (TCP), unlike with NFS over UDP. A long timeout prevents the unnecessary retransmission of non- idempotent NFS requests. Reviewed by: mohans, silby, rees? Sponsored by: Network Appliance, Incorporated	2006-05-23 18:48:07 +00:00
cel	ec80996e6b	Refactor the NFS over UDP retransmit timeout estimation logic to allow the estimator to be more easily tuned and maintained. There should be no functional change except there is now a lower limit on the retransmit timeout to prevent the client from retransmitting faster than the server's disks can fill requests, and an upper limit to prevent the estimator from taking to long to retransmit during a server outage. Reviewed by: mohan, kris, silby Sponsored by: Network Appliance, Incorporated	2006-05-23 18:33:58 +00:00
mohans	f29eefcdec	Vnode locks are recursive and the NFS client support shared vnode locks. Found by: Kris Kennaway.	2006-05-23 16:07:23 +00:00
mohans	60ef615733	Changes to make the NFS client MP safe. Thanks to Kris Kennaway for testing and sending lots of bugs my way.	2006-05-19 00:04:24 +00:00
mohans	cbd19813ff	Fix a snafu caused while patching the previous fix from another branch.	2006-05-05 18:12:13 +00:00
mohans	a10657a423	Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to get out of sync under heavy loads, forcing frequent reconnets, causing EBADRPC errors etc.	2006-05-05 18:04:53 +00:00
mohans	8a6c02da7d	Keep track of the number of in-progress async direct IO writes in the nfsnode. Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and nfs_putpage().	2006-04-06 01:20:30 +00:00
jeff	9fb762d231	- Busy the filesystem in nfs_statfs to prevent us from creating a new vnode after vflush() has succeeded. This would cause a dangling vnode panic at unmount time otherwise. Other filesystems may have this problem via their VFS_VGET() routines. Found by: kris Sponsored by: Isilon Systems, Inc.	2006-04-01 01:15:23 +00:00
kris	79229ffb86	Fix a bug in the NFS/TCP retransmission path. The bug was that earlier, if a request was retransmitted, we would do subsequent retransmits every 10 msecs. This can cause data corruption under moderate loads by reordering operations as seen by the client NFS attribute cache, and on the server side when the retransmission occurs after the original request has left the duplicate cache, since the operation will be committed for a second time. Further work on retransmission handling is needed (e.g. they are still being done sent too often since they are scaled by HZ, and the size of the dup cache is too small and easily overwhelmed on busy servers). Submitted by: mohans	2006-03-23 22:58:42 +00:00
pjd	e93fc5d214	Actually I wanted 'nolockd' here instead of 'lockd'. MFC after: 2 days	2006-03-19 13:27:37 +00:00
cel	6441a8bff9	If an NFS server returns more than a few EJUKEBOX errors for a given RPC request, the FreeBSD NFS client will quickly back off to a excessively long wait (days, then weeks) before retrying the request. Change the behavior of the FreeBSD NFS client to match the behavior of the reference NFS client implementation (Solaris). This provides a fixed delay of 10 seconds between each retry by default. A sysctl, called nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris, the sysctl value on FreeBSD is in seconds, rather than in HZ. Sponsored by: Network Appliance, Incorporated Reviewed by: rick Approved by: silby MFC after: 3 days	2006-03-17 22:14:23 +00:00
cel	85c8b06cbe	Fix a bug in NFSv3 READDIRPLUS reply processing The client's READDIRPLUS logic skips the attributes and filehandle of the ".." entry. If the server doesn't send attributes but does send a filehandle for "..", the client's logic doesn't account for the extra "value follows" field that indicates whether the filehandle is present, causing the remaining entries in the reply to be ignored. Sponsored by: Network Appliance, Inc. Reviewed by: rick, mohans Approved by: silby MFC after: 2 weeks	2006-03-08 01:43:01 +00:00
rees	fc1f7144ff	Don't log an error on tcp connection reset, even if we don't get ECONNRESET. Submitted by: cel@citi.umich.edu	2006-01-20 15:07:18 +00:00
alfred	a0282ebc04	I ran into an nfs client panic a couple of times in a row over the last few days. I tracked it down to the fact that nfs_reclaim() is setting vp->v_data to NULL _before_ calling vnode_destroy_object(). After silence from the mailing list I checked further and discovered that ufs_reclaim() is unique among FreeBSD filesystems for calling vnode_destroy_object() early, long before tossing v_data or much of anything else, for that matter. The rest, including NFS, appear to be identical, as if they were just clones of one original routine. The enclosed patch fixes all file systems in essentially the same way, by moving the call to vnode_destroy_object() to early in the routine (before the call to vfs_hash_remove(), if any). I have only tested NFS, but I've now run for over eighteen hours with the patch where I wouldn't get past four or five without it. Submitted by: Frank Mayhar Requested by: Mohan Srinivasan MFC After: 1 week	2006-01-17 17:29:03 +00:00
rwatson	a062d55bf0	In nfs_dolock(), GC now under-used ioflg, rendered obsolete when we moved from using a fifo to talk to rpc.lockd to using a special device node. Noticed by: Coverity Prevent analysis tool MFC after: 3 days	2006-01-13 23:16:29 +00:00
tegge	d344c11861	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
delphij	fc7124b631	Correct a typo	2005-12-28 10:03:48 +00:00
ps	3b0631d180	Improve upon rev 1.133 where NFS/TCP would not reconnect. Submitted by: Mohan Srinivasan	2005-12-12 23:18:05 +00:00
ru	573fa22624	Unexpand LLADDR().	2005-11-29 09:51:47 +00:00
ps	3278e302f0	Fix for a bug where NFS/TCP would not reconnect (in the case where the server FIN'ed). Seen with Solaris NFS servers. Reported by: TOMITA Yoshinori <yoshint@flab.fujitsu.co.jp> Submitted by: Mohan Strinivasan	2005-11-21 19:25:24 +00:00
ps	6364b280f8	- Always return success from NFS strategy. nfs_doio(), in the event of an error, does the right thing, in terms of setting the error flags in the buf header. That fixes a crash from bstrategy(). - Treat ETIMEDOUT as a "recoverable" error, causing the buffer to be re-dirtied. ETIMEDOUT can occur on soft mounts, when the number of retries are exceeded, and we don't want data loss in that case. Submitted by: Mohan Srinivasan	2005-11-21 19:23:46 +00:00
rees	1a3808ebdf	fix a problem with XID re-use when a server returns NFSERR_JUKEBOX. Submitted by: cel@citi.umich.edu Fixed by: rick@snowhite.cis.uoguelph.ca Approved by: alfred MFC after: 3 weeks	2005-11-21 18:39:18 +00:00
jon	9b47705fc0	fix a crash when an nfsv2 mount fails MFC after: 1 week	2005-11-10 23:25:16 +00:00
ps	e5615c0136	Fix for a crash (from nfs_lookup() in an error case). Submitted by: Mohan Srinivasan	2005-11-03 19:24:54 +00:00
ps	7b692a9f88	In nfs_flush(), clear the NMODIFIED bit only if there are no dirty buffers and there are no buffers queued up for writing. The bug was that NMODIFIED was being cleared even while there were buffers scheduled to be written out, which leads to all sorts of interesting bugs - one where the file could shrink (because of a post-op getattr load, say) causing data in buffer(s) queued for write to be tossed, resulting in data corruption. Submitted by: Mohan Srinivasan	2005-11-03 07:42:15 +00:00
ps	65fc4c8f22	Fix for a race between the thread transmitting the request and the thread processing the reply. Submitted by: Mohan Srinivasan	2005-11-03 07:31:06 +00:00
rwatson	be4f357149	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
glebius	d9ad5313fd	- Fix leak of struct nlminfo on process exit. - Fix malloc type collision, that made the above problem difficult to understand. Reported by: Vladimir Sharun <sharun ukr.net>	2005-10-26 07:18:37 +00:00
pjd	79642efc0e	- Use strsep() instead of strtok(). - strdup() uses M_WAITOK, so we don't need to check it's return value against NULL. MFC after: 2 weeks	2005-10-06 19:04:08 +00:00
pjd	a73b4cad05	Add boot.nfsroot.options loader tunable. It allows to specify options for NFS root file system. Currently supported options are: soft, intr, conn, lockd. I'm adding this functionality mostly for 'lockd' option, which is only honored when performing the initial mount and will be silently ignored if used while updating the mount options. This will allow to use flock(2) without the need of using varmfs or rpc.lockd and friends. Example of use: boot.nfsroot.options="intr,lockd" MFC after: 2 weeks	2005-10-06 11:18:34 +00:00
rwatson	c479a90eb8	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
ps	142d92ef25	FIx for a bug in the change that made nfs_timer() MPSAFE. We need to grab Giant before calling pru_send() (if running with mpsafenet = 0). Found by: Jeremie Le Hen. Fixed by: Maxime Henrion	2005-07-27 15:06:26 +00:00
ps	c30cfa7b9b	In nfs_nget() if two threads race on the same filehandle, the loser should cause the nfsnode to get freed. This fixes a potential vnode (and nfsnode) leak in that path. Submitted by: Mohan Srinivasan Reviewed by: phk	2005-07-27 15:05:31 +00:00
ps	3c7af06c7a	Remove the NFS client rslock. The rslock was used to serialize writers that want to extend the file. It was also used to serialize readers that might want to read the last block of the file (with a writer extending the file). Now that we support vnode locking for NFS, the rslock is unnecessary. Writers grab the exclusive vnode lock before writing and readers grab the shared (or in some cases the exclusive) lock. Submitted by: Mohan Srinivasan	2005-07-21 22:46:56 +00:00
ps	46ea7f6a70	Make nfs_timer() MPSAFE. With this change, the bottom half of the NFS client (the interface with the protocol stack and callouts) is Giant-free. Submitted by: Mohan Srinivasan.	2005-07-19 21:27:25 +00:00
ps	f2e72f270b	Fix for a NFS soft mounts bug where if the number of retries exceeds the max rexmits, the request was not being bounced back with a ETIMEDOUT error. Reported by: Oliver Lehmann Submitted by: Mohan Srinivasan	2005-07-18 02:12:17 +00:00
ps	3494397059	Fixes for NFS crashes on architectures that require strict alignment. - Fix nfsm_disct() so that after pulling up data, the remaining data is aligned if necessary. - Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the mbuf (instead of casting m_data to a uint32). Submitted by: Pyun YongHyeon Reviewed by: Mohan Srinivasan	2005-07-14 20:08:27 +00:00
green	b4b5044eed	Ifdef out the incomplete non-blocking IO implementation for NFS pending discussion of how implementation would proceed. Applications like -lc_r expect select(3) to match the EAGAIN-status of IO functions. Approved by: re	2005-06-16 15:43:17 +00:00
green	ff904ffb64	Fix a serious deadlock with the NFS client. Given a large enough atomic write request, it can fill the buffer cache with the entirety of that write in order to handle retries. However, it never drops the vnode lock, or else it wouldn't be atomic, so it ends up waiting indefinitely for more buf memory that cannot be gotten as it has it all, and it waits in an uncancellable state. To fix this, hibufspace is exported and scaled to a reasonable fraction. This is used as the limit of how much of an atomic write request by the NFS client will be handled asynchronously. If the request is larger than this, it will be turned into a synchronous request which won't deadlock the system. It's possible this value is far off from what is required by some, so it shall be tunable as soon as mount_nfs(8) learns of the new field. The slowdown between an asynchronous and a synchronous write on NFS appears to be on the order of 2x-4x. General nod by: gad MFC after: 2 weeks More testing: wes PR: kern/79208	2005-06-10 23:50:41 +00:00
des	0bbbcadeb1	Ugh. Previous commit got the logic exactly backward. Submitted by: bland Pointy hat to: des	2005-05-17 18:23:03 +00:00
des	d3a9750001	Revision 1.173 broke updating a mount from ro to rw. Fix that by clearing the MNT_RDONLY flag if MNT_UPDATE is set and "ro" was not specified. Suggested by: cognet	2005-05-17 12:00:43 +00:00
rees	59c5573379	set R_MUSTRESEND flag in mark_for_reconnect so re-connected requests get re-sent instead of timing out. don't log an error message on reconnection, which is not an error. remove unused nfs_mrep_before_tsleep. Reviewed by: Mohan Srinivasan Approved by: alfred	2005-05-10 14:25:14 +00:00
ps	40a0d434da	Fix a bug in NFS/TCP where retransmissions would not reliably happen if the server rebooted or tore down the connection for any reason. Found by: Jonathan Noack. Submitted by: Mohan Srinivasan.	2005-05-04 16:37:31 +00:00
iedowse	2593dad93c	Don't copy the NFSMNT_* flags into struct statfs's f_flags field, as they have no connection with the expected MNT_* flags. This bug was exposed 18 months ago when the assignments to f_flags in vfs_syscalls.c were moved to before the VFS_STATFS() call. It was fixed in the CSRG source 10 years ago, but we never picked up that change. PR: kern/80390 MFC after: 1 week	2005-05-02 15:57:10 +00:00
des	5e15cfc3fa	When NFS was converted to the new mount syscall, code was written that sets the MNT_RDONLY flag if the "ro" option was passed in from userland, and clears it otherwise. In the diskless case, the MNT_RDONLY flag is already set when this code is reached, but there are no mount options, so it was incorrectly cleared. Change the logic so the MNT_RDONLY flag is set if the "ro" option was specified, and left alone otherwise. Note that the NFS code will still happily let you mount a filesystem RW even if the server exports it RO. I'm not sure how to fix that.	2005-04-27 14:46:02 +00:00
des	de2d951ab7	While I'm here, list the new kenv (boot.netif.name) along with the others.	2005-04-26 20:47:59 +00:00
des	37881dde0f	When netbooting, as soon as we've figured out which interface we booted from, store its name in a kenv variable.	2005-04-26 20:45:29 +00:00
rees	3e9035accc	TCP reconnect is not an error. Change the message from LOG_ERR to LOG_INFO. Approved by: alfred	2005-04-18 13:42:13 +00:00
jeff	e4eab9fb69	- cache_lookup() relocks the parent in the DOTDOT case for us. Spotted by: phk Sponsored by: Isilon Systems, Inc.	2005-04-14 07:08:34 +00:00
jeff	afab3762a0	- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details. Sponsored by: Isilon Systems, Inc.	2005-04-13 10:59:09 +00:00
jeff	97c40ebd49	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
jeff	ca1e4c2fe0	- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c prevents any callers from doing a modifying op without LOCKPARENT or WANTPARENT.	2005-03-29 13:09:42 +00:00
jeff	141aba2c7b	- cache_lookup() now locks the new vnode for us to prevent some races. Remove redundant code. Sponsored by: Isilon Systems, Inc.	2005-03-29 13:00:37 +00:00
jeff	5f8bc80203	- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us. - Network filesystems are written with a special idiom that checks the cache first, and may even unlock dvp before discovering that a network round-trip is required to resolve the name. I believe dvp is prevented from being recycled even in the forced unmount case by the shared lock on the mount point. If not, this code should grow checks for VI_DOOMED after it relocks dvp or it will access NULL v_data fields. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:29:58 +00:00
jeff	56f1fc7189	- Update vfs_root implementations to match the new prototype. None of these filesystems will support shared locks until they are explicitly modified to do so. Careful review must be done to ensure that this is safe for each individual filesystem. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:39:03 +00:00
ps	114057c633	- The NFS client was incorrectly masking SIGSTOP (which is non-maskable). - The NFS client needs to guard against spurious wakeups while waiting for the response. ltrace causes the process under question to wakeup (possibly from ptrace()), which causes NFS to wakeup from tsleep without the response being delivered. Submitted by: Mohan Srinivasan	2005-03-23 22:10:10 +00:00
das	89bc04ad2d	Don't brelse(bp) if bp is null. Also, eliminate some redundancy and dead code. Found by: Coverity Prevent analysis tool	2005-03-18 21:23:32 +00:00
phk	172eba2632	Use vfs_hash.	2005-03-16 11:28:19 +00:00
jmg	64c69bfb4e	MFp4: use the function to fix the packet header length instead of rolling our own...	2005-03-16 08:13:08 +00:00
jeff	29a4f75b9b	- VOP_INACTIVE should no longer drop the vnode lock. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:15:36 +00:00
jeff	5bd51ec6e6	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:14:56 +00:00
jeff	5f59e0cd19	- It is no longer necessary to lock and unlock the vnode in nfs_close() as the top level does this for us now. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:11:23 +00:00
ps	d4a5a3bc89	Minor cleanup in nfs_request() and removal of a comment that doesn't reflect reality. Submitted by: Mohan Srinivasan	2005-02-26 18:55:36 +00:00
phk	33d6741fda	vp->v_id is a private field for the vfs namecache and it is a big mistake that NFS ever started using it. Long time ago I added the necessary vhold()/vdrop() calls to replace it, but forgot to remove the v_id code. Do it now.	2005-02-22 14:52:00 +00:00
phk	66dfd63961	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
ps	f6b334da2c	Fix for a potential NFS client race where shared data is updated from base context as well as the socket callback. Submitted by: Mohan Srinivasan	2005-02-18 23:41:39 +00:00
jhb	685dd13b54	Drop Giant before calling kthread_exit().	2005-02-07 18:21:50 +00:00
rwatson	39c4afac56	Style cleanup for O_DIRECT sysctl comment introduced in nfs_vnops.c:1.242.	2005-01-29 23:19:08 +00:00
phk	1b21636022	Make filesystems get rid of their own vnodes vnode_pager object in VOP_RECLAIM().	2005-01-28 14:42:17 +00:00
phk	d0599e9c31	Create a vnode_pager object when a file is opened.	2005-01-24 23:03:29 +00:00
phk	8dba90be16	Remove unused cred arg from nfs_vinvalbuf() and many bogus arguments passed for it.	2005-01-24 12:31:06 +00:00
peter	e4129c1fb1	Mostly back out rev 1.33 from quite some time ago, and the followup fixes and tweaks. The code was actually quite broken because it discarded the upper bits of the 64 bit division. We only had a 50% chance of scaling up the blocksize for large NFS client mounts when it was needed. For 5.x and beyond, this was harmless because we could represent the result in either case. For 4.x this was a big problem though. (4.x also has a df(1) bug to compound the problem)	2005-01-18 21:59:44 +00:00
phk	cc0cbc6b34	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
brian	2b05c4cf78	Include opt_bootp.h for BOOTP_NFSROOT PR: 73183 Submitted by: Darrin Smith sdar at salseast dot org MFC after: 7 days	2005-01-12 12:42:46 +00:00
phk	5a497775d6	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
phk	da2718f1af	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
imp	a50ffc2912	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
ps	86f9a5d44a	If the NFS/TCP stream is out of sync between the client and server, and if the client (erroneously) reads the RPC length as 0 bytes, the client can loop around in the socket callback. Explicitly check for the length being 0 case and teardown/re-connect. Submitted by: Mohan Srinivasan	2005-01-05 23:21:13 +00:00
ps	ad001884ff	Turn NFS directio off until the stability issues are resolved.	2004-12-23 21:30:30 +00:00
ps	0a2e8227c4	Change the NFS sillyrename convention so that we won't run out of sillyrenames (which were limited to 58 per pid per directory, for no good reason). The new format of sillyrenames looks like .nfs.0000b31a.00d24.4 ^^^^^^^^ ^^^^^ ticks pid Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-16 19:28:37 +00:00
ps	7c0944d56c	First cut of NFS direct IO support. - NFS direct IO completely bypasses the buffer and page caches. If a file is open for direct IO all caching is disabled. - Direct IO for Directories will be addressed later. - 2 new NFS directio related sysctls are added. One is a knob to disable NFS direct IO completely (direct IO is enabled by default). The other is to disallow mmaped IO on a file that has at least one O_DIRECT open (see the comment in nfs_vnops.c for more details). The default is to allow mmaps on a file that has O_DIRECT opens. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-15 22:20:22 +00:00
marcel	4b90107750	Revert rev 1.233. The null-pointer function call (a dereference on ia64) was not the result of a change in the vector operations. It was caused by the NFS locking code using a FIFO and those bypassing the vnode. This indirectly caused the panic. The NFS locking code has been changed. Requested by: phk	2004-12-11 21:36:29 +00:00
ps	b4a200824a	In nfs_rename(), skip the otw rename operation if the fsync (to either src or dst) fails. This closes a potential data loss case (where the fsync failed with ENOSPC, for example). Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-10 03:29:02 +00:00
ps	f46c52047f	Store a hint in the nfsnode to detect sequential access of the file. Kick off a readahead only when sequential access is detected. This eliminates wasteful readaheads in random file access. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-10 03:27:12 +00:00
ps	81f484b21d	Fix for a Lock Order Reversal in the nfs_flush() path, between the vnode interlock and the proc lock. Reported by: marcel Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-07 21:16:32 +00:00
phk	849729c60b	Don't clobber mnt_stat.f_mntonname	2004-12-07 14:26:39 +00:00
phk	4a639d6164	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
ps	1d9d717d90	Always issue wakeups() to the NFS requestors under the mutex to close all potential cases of missed wakeups. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-07 03:39:52 +00:00
ps	eeccf3813d	Rewrite of the NFS client's reply handling. We now have NFS socket upcalls which do RPC header parsing and match up the reply with the request. NFS calls now sleep on the nfsreq structure. This enables us to eliminate the NFS recvlock. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 21:11:15 +00:00
ps	8eaa4f53e4	2 fixes that improve on the consistency of the NFS client cache. - Change the cached mtime to a 'struct timespec' from a time_t. Improving the precision of the cached mtime tightens up NFS' "close-to-open" consistency considerably. - Always force an over-the-wire consistency check from nfs_open() (unless the file is marked modified). This further improves NFS' "close-to-open" consistency. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 19:18:00 +00:00
ps	aa4aa62af0	Serialize NFS vinvalbuf operations by acquiring/upgrading to the vnode EXCLUSIVE lock. This prevents threads from adding pages to the vnode while an invalidation is in progress, closing potential races. In the bioread() path, callers acquire the SHARED vnode lock - so while an invalidate was in progress, it was possible to fault in new pages onto the vnode causing the invalidation to take a while or fail. We saw these races at Yahoo! with very large files+heavy concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing the invalidation closes all these races. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 18:52:28 +00:00
ps	5feadd3eba	Add non-blocking versions of nfsm_dissect() and friends, for use from socket callbacks or similar callers, from both the NFS client and the server. Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 17:33:52 +00:00
ps	ebd6438ae1	- If all data has been committed to stable storage on the server, it is safe to turn off the nfsnode's NMODIFIED flag. - Move the check for signals to the top of the loop where we loop around the dirty buffers on the vnode, scheduling writes. This ensures that we'll break ouf of the flush operation on reception of a signal. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 16:35:58 +00:00
rwatson	a98750ac1b	Correct a typo in a comment.	2004-12-06 16:11:25 +00:00
phk	753d615ec0	For reasons unknown, the nfs locking code used a fifo to send requests to userland and a dedicated system call to get replies. The vnode-bypass of fifos broke this into a panic. Ditch all the magic and create a device /dev/nfslock instead, and use that for both directions apart from the shorter path, this is also faster because the device driver runs Giant free using the vnode bypass. Noticed by: marcel	2004-12-06 08:31:32 +00:00
rwatson	6b017b90b9	Convert GIANT_REQUIRED; in nfs_mountroot() to NET_ASSERT_GIANT(), and annotate that nfs_mountroot assumes it is OK to step on the values in the global NFSv3 diskless structure as the mountroot function is called during a serialized part of the boot, before any other NFS client activity occurs. MFC after: 2 weeks	2004-12-05 22:53:17 +00:00
rwatson	22be685755	Convert a GIANT_REQUIRED; into a NET_ASSERT_GIANT();, as sockets are now only conditionally protected by Giant based on debug.mpsafenet.	2004-12-05 22:50:09 +00:00
phk	6c14f71ef7	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases doesn't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.	2004-12-05 22:41:02 +00:00
marcel	8b42e21d12	Fix null-pointer indirect function calls introduced in the previous commit. In the new world order, the transitive closure on the vector operations is not precomputed. As such, it's unsafe to actually use any of the function pointers in an indirect function call. They can be null, and we need to use the default vector in that case. This is mostly a quick fix for the four function pointers that are ed explicitly. A more generic or scalable solution is likely to see the light of day. No pathos on: current@	2004-12-05 22:30:28 +00:00
phk	59f305606c	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
phk	cb64ed501e	Remove redundant functions (repo-copied from nfsclient) for dealing with fifos.	2004-12-01 20:18:56 +00:00
phk	ab549174e2	Scripted modification of vop_* prototypes to use typedefs.	2004-12-01 19:08:40 +00:00
phk	4eaab0b383	Add missing #include	2004-12-01 07:34:08 +00:00
ps	3601987765	Fix for a race between lookup and readdirplus, that causes a deadlock (with NFS exclusive vnode locks enabled). Lookup grabs the parent's lock and wants to lock child. Readdirplus locks the child and wants to lock parent (for loading the attrs for ".."). The fix is to not load the attrs for ".." in readdirplus. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson	2004-12-01 06:51:07 +00:00
ps	531cb416ae	Clean all dirty pages (dirtied by mmap'ed writes) in nfs_close(). This closes a major hole in close-to-open consistency support. Added a new sysctl so that this can be disabled for single NFS client applications with very large amounts of mmap'ed IO (for performance). Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson	2004-12-01 06:48:54 +00:00
ps	69d7e65011	Fix for a (blocks) underrun bug where negative values were being returned back to df from a statfs call. Causing df to print negative values. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson	2004-12-01 06:42:21 +00:00
ps	2b85447398	Fix for a bug in nfs_mkdir() that called vrele() instead of vput() in the error cases, causing panics. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson	2004-11-29 23:05:30 +00:00
jeff	9caab2e843	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
phk	5eae02ee76	Detect root mount attempts on the flag, not on the NULL path.	2004-11-09 22:21:52 +00:00
phk	e5715b2cc1	Retire b_magic now, we have the bufobj containing the same hint.	2004-11-04 09:48:18 +00:00
phk	1b25a59886	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
phk	52a089c526	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
phk	3833976d12	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
pjd	5515efcf61	Add a missing newline character.	2004-10-14 19:00:44 +00:00
das	c32ecae436	nfsclient/nfs_bio.c has a PHOLD() without a PRELE(). Neither should be necessary here. Also, use killproc() instead of psignal().	2004-10-01 05:01:41 +00:00
phk	d3ceec948f	Remove support for using NFS device nodes.	2004-09-28 08:50:01 +00:00
phk	5c67a82c63	Remove NFS4 vop method vector for devices: we are desupporing device nodes on anything but DEVFS and in this case it was not even used (see below). Put the NFS4 vop method for fifo's behind "#if 0" because it is unused. Add a XXX comment to say that I think the unusedness is a bug.	2004-09-27 20:02:50 +00:00
phk	46bdd46105	style consistency.	2004-09-27 19:44:39 +00:00
phk	02df7323ee	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
phk	9f1a2f23b2	Explicitly pass vnode to nfs_doio() and mountpoint to nfs_asyncio().	2004-09-07 08:56:43 +00:00
rwatson	68779f8b5e	In nfs_timer(), pass curthread rather than &thread0 into the protocol send routine. In IPv6 UDP, the thread will be passed to suser(), which asserts that if a thread is used for a super user check, it be curthread. Many of these protocol entry points probably need to accept credentials instead of threads. MT5 candidate. Noticed/tested by: kuriyama	2004-08-25 01:23:38 +00:00
phk	2d868d02cf	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activites on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the namiedata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
phk	98d8f3741c	Move a relic to its correct location(s): Put nfs diskless initialization calls with the code they call. (Yet another example of mindless copy&paste).	2004-07-28 21:54:57 +00:00
phk	075684f5fd	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
phk	5297516e02	Eliminate unused second argument to reassignbuf() and simplify it accordingly.	2004-07-25 21:24:23 +00:00
alfred	b4f778e20c	Turn off SO_REUSEADDR and SO_REUSEPORT, they were causing EADDRINUSE to be returned from the protocol stack. Pointy hat to me for not groking what those options _really_ mean.	2004-07-13 05:42:59 +00:00
dwmalone	6ff1185c1d	Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a a better name. I have a kern_[sg]etsockopt which I plan to commit shortly, but the arguments to these function will be quite different from so_setsockopt. Approved by: alfred	2004-07-12 21:42:33 +00:00
alfred	8a1713aada	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
alfred	031e087d2c	Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts. Tune the timeout from 5 seconds to 12 seconds. Provide a sysctl to show how many reconnects the NFS client has done. Seems to fix IPv6 from: kuriyama	2004-07-12 06:22:42 +00:00
brian	aae31dbf32	Change the following environment variables to kernel options: bootp -> BOOTP bootp.nfsroot -> BOOTP_NFSROOT bootp.nfsv3 -> BOOTP_NFSV3 bootp.compat -> BOOTP_COMPAT bootp.wired_to -> BOOTP_WIRED_TO - i.e. back out the previous commit. It's already possible to pxeboot(8) with a GENERIC kernel. Pointed out by: dwmalone	2004-07-08 22:35:36 +00:00
brian	2821a50eaa	Change the following kernel options to environment variables: BOOTP -> bootp BOOTP_NFSROOT -> bootp.nfsroot BOOTP_NFSV3 -> bootp.nfsv3 BOOTP_COMPAT -> bootp.compat BOOTP_WIRED_TO -> bootp.wired_to This lets you PXE boot with a GENERIC kernel by putting this sort of thing in loader.conf: bootp="YES" bootp.nfsroot="YES" bootp.nfsv3="YES" bootp.wired_to="bge1" or even setting the variables manually from the OK prompt.	2004-07-08 13:40:33 +00:00
rwatson	7e85c099fc	Acquire socket lock in nfs_connect() connection/sleep loop to protect socket state and avoid missed wakeups.	2004-07-06 16:55:41 +00:00
alfred	864fa13b59	use vfs_suser() to restrict access to the nfs mount's timeout.	2004-07-06 09:40:44 +00:00
alfred	8fd8b8c57f	NFS mobility Phase VI: Export NFS mount state via sysctl. Export timeout via sysctl.	2004-07-06 09:23:17 +00:00
alfred	97a6f04270	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
phk	070a613a48	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
rwatson	6b9af88e9d	When updating sb_flags, acquire the socket buffer lock to prevent races.	2004-06-24 03:12:13 +00:00
phk	40dd98a3bd	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
rwatson	10cdb7ab20	Remove bad cookie vp kernel printf; while it does notify about an interesting event, there's little or nothing the user can do about it.	2004-06-17 00:15:37 +00:00
rwatson	65f0bd9a10	Convert GIANT_REQUIRED to NET_ASSERT_GIANT where Giant is used to protect socket operations. Leave one "as-is" as it also frobs rootvp.	2004-06-16 03:12:50 +00:00
alc	b57e5e03fd	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
peadar	5617193e91	Let the NFS client notice a file's size changing as a modification. This avoids presenting invalid data to the client's applications when the file is modified, and then extended within the window of the resolution of the modifcation timestamp. Reviewed By: iedowse PR: kern/64091	2004-04-14 23:23:55 +00:00
marcel	6dbee1d482	Unbreak build: s/TAILQ_ISEMPTY/TAILQ_EMPTY/g	2004-04-11 17:15:36 +00:00
peadar	9bb40b73ee	Clean up properly when unloading NFS client module. This includes a modified form of some code from Thomas Moestl (tmm@) to properly clean up the UMA zone and the "nfsnodehashtbl" hash table. Reviewed By: iedowse PR: 16299	2004-04-11 13:30:20 +00:00
imp	ebf059d1df	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 05:00:01 +00:00
rwatson	18b25cc43d	Spell 2 as SHUT_RDWR when used as an argument to soshutdown().	2004-04-04 19:24:08 +00:00
peadar	461541ae31	Flush cached access mode after modifying a files attributes for NFSv3. It's likely that modifying the attributes will affect the file's accessibility. This version of the patch is one suggested by Ian Dowse after reviewing my original attempt in the PR Reviewed By: iedowse PR: kern/44336 MFC after: 3 days	2004-04-03 17:23:46 +00:00
kan	97b7fb767e	Reset callout if in nfs_timeout and rpcclnt_timeout functions. Timer are supposed to continue firing as long as there is work to do, not stop after the first invocation. This is damage control after a patch that has been committed prematurely. Tested by: kris	2004-03-28 05:55:27 +00:00
rees	4bf96c35a5	only do nfs rpc callouts if there is work to do. Submitted by: kan Approved by: alfred	2004-03-25 21:48:09 +00:00
pjd	257769d6cc	Add a comment with an explanation why we don't report EPIPE errors on nfs sockets. Requested by: ru	2004-03-17 21:10:20 +00:00
pjd	f8bf3c9231	Don't report EPIPE errors on nfs sockets. These can be due to idle tcp mounts which will be closed by netapp, solaris, etc. if left idle too long. Obtained from: NetBSD	2004-03-17 18:10:38 +00:00
peter	36be86fb0a	Calculate NFS timeouts in units of 10ms, not 5ms. This matches the default clock precision on i386. This is a NOP change on i386. But this stops the mount_nfs units from suddenly changing to units of 1/20 of a second (vs the normal 1/10 of a second) if HZ is increased.	2004-03-14 06:21:56 +00:00
brooks	7e688e2cec	Allow kernel with the BOOTP option to boot when DHCP/BOOTP sets the root path to an absolute path without a host name. Previously, there was a nasty POLA violation where a system would PXE boot until you added the BOOTP option and then it would panic instead. Reviewed by: tegge, Dirk-Willem van Gulik <dirkx at webweaving.org> (a previous version) Submitted by: tegge (getip function)	2004-03-12 20:37:40 +00:00
phk	2a5e157787	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
phk	eeb7579130	Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.	2004-03-11 16:33:11 +00:00
rwatson	b0b5f961bd	Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c. Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0". Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO. Submitted by: sam	2004-03-01 03:14:23 +00:00
rees	108fca056b	NFSv4 fixes from Connectathon 2004: remove unused pid field of file context struct map nfs4 error codes to errnos eliminate redundant code from nfs4_request use zero stateid on setattr that doesn't set file size use same clientid on all mounts until reboot invalidate dirty bufs in nfs4_close, to play it safe open file for writing if truncating and it's not already open Approved by: alfred	2004-02-27 19:37:43 +00:00
cperciva	9576df9a82	If mountnfs returns an error, it will have already freed nam; no need to free it again. Reported by: "Ted Unangst" <tedu@coverity.com> Approved by: rwatson (mentor)	2004-02-22 01:17:47 +00:00
jhb	279b2b8278	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
obrien	ac460e8a52	Bump the NFCv3/TCP defaults for rsize and wsize from 8K to 32K to match Solaris and HP-UX. This increases read performance for large files across NFS. PR: 62024 & 26324 Submitted by: Bjoern Groenvall <bg@sics.se>	2004-01-31 10:40:15 +00:00
alfred	a5dc4dbeb8	Use function pointers to remove the depenancy cross dependancy on nfs4 and the nfs3 client. Also fix some bugs that happen to be causing crashes in both v3 and v4 introduced by the v4 import. Submitted by: Jim Rees <rees@umich.edu> Approved by: re	2003-11-22 02:21:49 +00:00
alfred	490e2fe2e2	Move the declaration for "struct nfs4_fctx" out from under #ifdef KERNEL for fstat(1).	2003-11-15 05:03:15 +00:00
alfred	302841f20d	unbreak LINT.	2003-11-15 00:26:42 +00:00
alfred	5b076fe9da	University of Michigan's Citi NFSv4 kernel client code. Submitted by: Jim Rees <rees@umich.edu>	2003-11-14 20:54:10 +00:00
kan	9352a05d40	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in coplexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson	2003-11-12 02:54:47 +00:00
alfred	b1d1754bf2	Stop using shared locks for nfs vop locks. The reason this was done was to avoid a race to the root when an NFS server went down. However a semi-recent change to the way that the kernel's lookup() routine traverses mount points prevents this. Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we aquire a shared lock on the mount point being accessed and then drop the directory vnode lock before requesting the target lock. With that in place we no longer need shared locks for NFS to prevent race to the root lockups.	2003-11-11 00:32:46 +00:00
sam	3eac15aaa3	Assert GIANT_REQUIRED where sockets are manipulated. This is preparatory for MPSAFE network commits and ongoing socket locking work. Supported by: FreeBSD Foundation	2003-11-07 22:57:09 +00:00
kan	36d60f3bb7	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00

... 3 4 5 6 7 ...

1055 Commits