Commit Graph

814 Commits

Author SHA1 Message Date
Tor Egge
61b9d89ff0 Make insmntque() externally visible and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling a vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnodes belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently set up.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
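
A hedged sketch of the pattern this commit asks per-filesystem vget/nget paths to follow. The myfs_* names and the node structure are invented for illustration; getnewvnode(), vn_lock(), insmntque1() and the destructor shape follow the description above but are approximations of the era's interfaces, not code lifted from the tree:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>

    struct myfs_node { ino_t ino; };              /* illustrative per-fs node */
    static struct vop_vector myfs_vnodeops;       /* illustrative, normally filled with VOPs */

    /* File system specific cleanup run by insmntque1() when the insert
     * fails and the half-constructed vnode has to be recycled. */
    static void
    myfs_insmntque_dtr(struct vnode *vp, void *arg)
    {
        vp->v_data = NULL;
        free(arg, M_TEMP);
        vgone(vp);
        vput(vp);
    }

    static int
    myfs_vget(struct mount *mp, ino_t ino, struct vnode **vpp)
    {
        struct myfs_node *np;
        struct vnode *vp;
        int error;

        error = getnewvnode("myfs", mp, &myfs_vnodeops, &vp);
        if (error != 0)
            return (error);
        np = malloc(sizeof(*np), M_TEMP, M_WAITOK | M_ZERO);
        np->ino = ino;
        vp->v_data = np;

        /* Lock the vnode and only then put it on the mount's vnode list;
         * on failure the destructor above has already cleaned up, and the
         * error is propagated to the caller. */
        vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
        error = insmntque1(vp, mp, myfs_insmntque_dtr, np);
        if (error != 0) {
            *vpp = NULL;
            return (error);
        }
        *vpp = vp;
        return (0);
    }
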
Mohan Srinivasan
d9915117c9 Back out a change to nfs_timer() that inadvertently crept into the last checkin :( 2007-03-09 04:07:54 +00:00
Mohan Srinivasan
f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
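
A small user-space model of the timestamp idea (names are illustrative, not the kernel's): each attribute load records <pid, tid, per-thread syscall count>, and nfs_open() skips its GETATTR when the cached attributes carry the stamp of the current syscall.

    #include <stdbool.h>
    #include <stdint.h>

    struct attrstamp {
        int32_t  pid;
        int32_t  tid;
        uint64_t syscalls;      /* per-thread syscall counter */
    };

    /* True when the cached attributes were loaded by this very syscall,
     * e.g. by the over-the-wire GETATTR issued from the lookup()/namei()
     * path, so nfs_open() need not issue another one. */
    static bool
    attrs_loaded_this_syscall(const struct attrstamp *cached,
        const struct attrstamp *now)
    {
        return (cached->pid == now->pid &&
            cached->tid == now->tid &&
            cached->syscalls == now->syscalls);
    }
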
John Baldwin
4d70511ac3 Use pause() rather than tsleep() on stack variables and function pointers. 2007-02-27 17:23:29 +00:00
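
The difference, roughly (a fragment, not the committed diff; pause(9) and tsleep(9) have these shapes, but the surrounding code is invented):

    int dummy, error;

    /* Before: a pure timed delay implemented by sleeping on a stack
     * address; an unrelated wakeup() hitting that address could end the
     * sleep early. */
    error = tsleep(&dummy, PPAUSE, "delay", hz / 10);

    /* After: pause(9) takes no caller-visible wait channel, so the timed
     * sleep cannot be cut short by a stray wakeup(). */
    error = pause("delay", hz / 10);
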
Mohan Srinivasan
0973754e14 Backing out an earlier change. It seems harmless for NFS to miss the "force
unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and
nfs_sigintr() unnecessary. Pointed out by tegge@.
2007-02-16 03:46:55 +00:00
Mohan Srinivasan
024465d002 Add missing MNT_ILOCK around some mnt_kern_flag accesses. 2007-02-11 04:01:10 +00:00
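
The pattern being enforced is simply that reads and writes of mnt_kern_flag happen under the mount interlock; a minimal sketch (MNT_ILOCK/MNT_IUNLOCK are the real macros, and the MNTK_UNMOUNTF test is just an example of such an access):

    MNT_ILOCK(mp);
    if ((mp->mnt_kern_flag & MNTK_UNMOUNTF) != 0) {
        MNT_IUNLOCK(mp);
        return (EINTR);         /* forced unmount in progress */
    }
    MNT_IUNLOCK(mp);
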
Mohan Srinivasan
4e99994cc9 Fix for a vnode lock leak in nfs_create() in the event of an error.
Spotted by ups@.
2007-01-31 23:10:27 +00:00
Kris Kennaway
410355bf69 Instead of always hard-coding the socket type for the nfs root mount as
SOCK_DGRAM (i.e. UDP), respect the value configured earlier.  This allows
TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable.

In this case some of the connection parameters, like the retry timer, were
previously set appropriately for TCP but inappropriately for the UDP
socket that was actually used, leading to e.g. extremely long recovery
times (O(hours)) after an nfs server reboot.

Reviewed by:    mohans
MFC After:      2 weeks
2007-01-30 00:26:04 +00:00
Bruce Evans
e43982a801 Unstaticize nfs_iosize() in nfsclient and use it in nfs4client instead
of duplicating it except for larger style bugs in the copy.

Fix some nearby style bugs (including a harmless type mismatch)
in and near the remaining copy.

This is part of fixing collisions of the 2 nfs*client's names.  Even
static names should have unique prefixes so that they can be debugged
easily.
2007-01-25 13:07:25 +00:00
Konstantin Belousov
2cc7d26f7f Cylinder group bitmaps and blocks containing the inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in the global lock order. By itself, this could cause a deadlock
when bdwrite() tries to flush dirty buffers on a snapshotted ffs. If,
during the flush, COW activity for the snapshot needs to allocate a block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then the kernel would panic due to recursive buffer lock
acquisition.

Avoid dealing with buffers in bdwrite() that are on the other side of
the snaplock divide in the lock order from the buffer being written. Add a
new BOP, bop_bdwrite(), to do dirty buffer flushing for the same vnode in
bdwrite(). The default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, a specialized implementation is
used.

Reviewed by:	tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by:	Peter Holm
X-MFC after:	3 weeks (if ever: it changes ABI)
2007-01-23 10:01:19 +00:00
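
For orientation, the new hook hangs off the buffer-ops table, with bufbdflush() as the refactored default; the field list below is abbreviated and the member types are an approximation, not a copy of sys/buf.h:

    struct buf_ops {
        char    *bop_name;
        int     (*bop_write)(struct buf *);
        void    (*bop_strategy)(struct bufobj *, struct buf *);
        int     (*bop_sync)(struct bufobj *, int waitfor);
        void    (*bop_bdwrite)(struct bufobj *, struct buf *);  /* new: dirty-buffer flush hook */
    };
    /* Generic filesystems point bop_bdwrite at the bufbdflush() default;
     * ffs device buffers get a snapshot-aware implementation instead. */
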
Mohan Srinivasan
7f3a6e42c9 NetApp filers return corrupt post op attrs in the wcc on NFS error responses.
This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt
for other NFS error responses. For now, disabling wcc pre-op attr checks and
post-op attr loads on NFS errors (sysctl'ed).
Reported by: Kris Kennaway
2006-12-11 19:54:25 +00:00
Sam Leffler
49d5157434 consolidate parsing of nfs root mount options in one place
and handle all options (some may require fixes elsewhere)

Reviewed by:	jhb, mohans
MFC after:	1 month
2006-12-06 02:15:25 +00:00
Mohan Srinivasan
594ece53bc In nfs_nget(), we must initialize the fh in the nfsnode before inserting the
vnode into the vfs hash. Otherwise, another thread walking the hash can trip
on an nfsnode with an uninitialized or partially initialized fh.
Thanks to ups@ for spotting this race.
2006-11-29 02:21:40 +00:00
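
The fix is purely an ordering rule: make the nfsnode's file handle valid before the vnode becomes discoverable through the vfs hash. Schematically (field names follow the nfsnode conventions; the vfs_hash_insert() call itself is only named, not spelled out):

    bcopy(fhp, np->n_fhp, fhsize);   /* 1. initialize the file handle first */
    np->n_fhsize = fhsize;
    /* 2. only now insert into the vfs hash (vfs_hash_insert()), which is
     *    the point at which other threads can find this vnode. */
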
Mohan Srinivasan
d4875805d7 bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don't
have to explicitly acquire Giant (although they need to be aware of this and
not hold any locks at that point). Remove the acquisitions of Giant in the
NFS client wrapping tprintf().
2006-11-27 23:26:06 +00:00
Mohan Srinivasan
88d5725c38 Fix for a bug caused by a race when 2 threads look up the same
file. Leave the loser's lock(s) initialized, so the reclaim logic can
unconditionally destroy them when that race occurs (or if the vfs hash
insert happened to fail for some other reason). Thanks to ups@ for a
careful review of the code.
Reported by : Kris Kennaway
2006-11-27 19:06:43 +00:00
Mohan Srinivasan
a18c4dc336 1) Fix up locking in nfs_up() and nfs_down().
2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly.
- We don't need to acquire Giant before tsleeping on lbolt anymore,
  since jhb special-cased lbolt handling in msleep().
- nfs_up() needs to acquire Giant only if printing the "server up"
  message.
- nfs_timer() held Giant for the duration of the NFS timer processing,
  just because the printing of the message in nfs_down() needed it
  (and we acquire other locks in nfs_timer()). The acquisition of
  Giant is moved down into nfs_down() now, reducing the time Giant is
  held in that path.

Reported by: Kris Kennaway
2006-11-20 04:14:23 +00:00
Mohan Srinivasan
3c2fcc3c92 vfs_hash_insert() vput()s the losing vnode before returning, in the event of
a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't
be releasing the vnode in that case.
2006-11-16 23:03:46 +00:00
Mohan Srinivasan
87c125cecc Fix to readdir+ reply handling. When inserting an entry into the namecache,
initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the
just entered namecache entry.
2006-11-16 23:02:37 +00:00
Sam Leffler
83cc6b9ad2 honor nolockd flag in root mount options
MFC after:	2 weeks
2006-11-07 18:02:45 +00:00
Mohan Srinivasan
88b94fba38 Make EWOULDBLOCK a recoverable error so that the request is retransmitted.
This bug results in data corruption with NFS/TCP. Writes are silently dropped
on EWOULDBLOCK (because the socket send buffer is full and the sockbuf timer fires).

Reviewed by: ups@
2006-10-31 20:25:37 +00:00
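
The essence of the change, paraphrased from the socket send error handling in nfs_socket.c (not a verbatim hunk):

    if (error == EWOULDBLOCK) {
        /* The socket send buffer was full and the sockbuf timer fired.
         * Treat this as a soft, recoverable error: keep the request on
         * the queue so the retransmit path resends it, rather than
         * silently dropping the write. */
        error = 0;
    }
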
Bruce Evans
35259c2c89 Fixed some style bugs (especially ones involving long lines and use
of __P(())).  There are many more.
2006-10-17 22:07:07 +00:00
Bruce Evans
6a72ff6b09 Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the
VA_MARK_ATIME feature to fix POSIX conformance for execve() and mmap(),
we thought that it was optimized well enough for the one file system
that supports it (ffs) and harmless for other file systems (except
layered ones, which already get the layering for VOP_SETATTR() wrong).
However, nfs_setattr() doesn't do much parameter checking, so when
it gets a combination of parameters that it doesn't understand, it
always does a Setattr RPC.  This RPC can't do anything good, and for
VA_MARK_ATIME it is null except for wasting a lot of time.

This is the smallest and easiest to fix of several bugs that have
increased the number of RPCs for kernel builds on nfs by more than
100% since 2004-11-05.  The real-time increase depends on network
latency and parallelization and can also be very large (approaching
the same percentage for unparallelized operations like "make depend"
on systems with fast CPUs and high-latency networks).
2006-10-14 07:25:11 +00:00
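
Conceptually the fix is a short-circuit in nfs_setattr(): when the request carries only the VA_MARK_ATIME hint and every settable attribute is VNOVAL, return success without going over the wire. A hedged sketch; the placement of the flag in va_vaflags and the exact set of fields checked are assumptions, not the committed test:

    struct vattr *vap = ap->a_vap;

    /* Nothing but the "mark atime" hint?  A Setattr RPC would be a no-op
     * on the wire, so skip it entirely. */
    if ((vap->va_vaflags & VA_MARK_ATIME) != 0 &&
        vap->va_mode == (mode_t)VNOVAL &&
        vap->va_uid == (uid_t)VNOVAL && vap->va_gid == (gid_t)VNOVAL &&
        vap->va_size == (u_quad_t)VNOVAL &&
        vap->va_atime.tv_sec == VNOVAL && vap->va_mtime.tv_sec == VNOVAL)
        return (0);
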
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
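
The invariant the new flag encodes: MNTK_ASYNC is set exactly when MNT_ASYNC is set and nothing has temporarily suppressed async writes (mnt_noasync == 0). A sketch of how the flag would be recomputed under the mount interlock; the helper name is invented:

    static void
    mnt_update_async_flag(struct mount *mp)
    {
        MNT_ILOCK(mp);
        if ((mp->mnt_flag & MNT_ASYNC) != 0 && mp->mnt_noasync == 0)
            mp->mnt_kern_flag |= MNTK_ASYNC;
        else
            mp->mnt_kern_flag &= ~MNTK_ASYNC;
        MNT_IUNLOCK(mp);
    }
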
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Mohan Srinivasan
7d7d9e2242 Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.

- amd on 6.x is a non-starter (without this change). Using amd under
  heavy load results in a deadlock (with cascading vnode locks all the
  way to the root) very quickly.
- This change should also fix the more general problem of cascading
  vnode deadlocks when an NFS server goes down.

Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example, isn't
ready for shared vnode lock lookups, crashing pretty quickly.

This change is the result of discussions with Stephan Uphoff (ups@).

Reviewed by:	ups@
2006-09-13 18:39:09 +00:00
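
The shape of the vfs_lookup.c change: request a shared vnode lock for the lookup only when the filesystem has opted in via the new flag. In this sketch the flag name and its home (mnt_kern_flag versus the FS-type flags) are placeholders, and the cn_lkflags usage is approximated:

    /* Placeholder flag name; the commit adds a real per-FS-type flag. */
    if ((dvp->v_mount->mnt_kern_flag & MNTK_LOOKUP_SHARED) != 0)
        cnp->cn_lkflags = LK_SHARED;      /* fs supports shared-lock lookups */
    else
        cnp->cn_lkflags = LK_EXCLUSIVE;   /* fall back to exclusive locking */
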
Mohan Srinivasan
6cd7078919 Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never
retransmit (or return). Thanks to John Baldwin for helping nail this one.

Found by : Kris Kennaway
2006-08-29 22:00:12 +00:00
Thomas Quinot
3401780fa0 Fix typos in comment. 2006-08-16 23:53:05 +00:00
Alan Cox
5786be7cc7 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
Brooks Davis
a36aa44a85 Add a new kernel environment variable "boot.netif.mtu" which is used to
set the MTU prior to mounting root via NFS.  This is required if the
server supports a higher than default MTU because the client will not
see the responses otherwise.

MFC after:	3 weeks
2006-08-09 01:56:17 +00:00
Robert Watson
b0668f7151 soreceive_generic(), and sopoll_generic(). Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used universally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
2006-07-24 15:20:08 +00:00
Konstantin Belousov
c915bcbad2 Signals may be delivered to the process as well as to the thread. Check the
thread-delivered signals in addition to the process-delivered ones.

Reviewed by:	mohan
MFC after:	1 month
Approved by:	kan (mentor)
2006-07-08 15:39:11 +00:00
Konstantin Belousov
201599c3af Always supply curthread as the argument to nfs_asyncio and nfs_doio
in nfs_strategy. Otherwise, for some buffers, signals would be ignored
on intr mounts.

Reviewed by:	mohan
MFC after:	1 month
Approved by:	kan (mentor)
2006-07-08 15:36:51 +00:00
Yaroslav Tykhiy
4b97d7affd There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures.  Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and let the system
crash honestly instead of masking possible bugs.

Suggested by:	glebius, jhb, ru
2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy
576cdf4352 Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop. 2006-06-29 15:37:39 +00:00
Mohan Srinivasan
64c3892747 Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was
not being set, which means Giant would be acquired for these mounts.
2006-05-30 20:32:44 +00:00
Mohan Srinivasan
1af6f471ca Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported
by Witness (which forces the mbuf allocation flag to M_NOWAIT).

Reported by: "sekes".
2006-05-26 18:45:55 +00:00
Stephan Uphoff
6c1b7d16c2 Call vm_object_page_clean() with the object lock held.
Submitted by:	kensmith@
Reviewed by:	mohans@
MFC after:	6 days
2006-05-25 17:16:11 +00:00
Stephan Uphoff
dcf67e65d2 Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set, the pages of vm backed buffers will be invalidated.
However, clean buffers can be backed by dirty VM pages, so invalidating them
can lead to data loss.
Add support for flushing dirty pages in the data invalidation function
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmapped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
Mohan Srinivasan
5bbfbd1422 Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts
before doing the read/write.

Reported by:	Chuck Lever.
2006-05-24 23:06:50 +00:00
Robert Watson
33c6a485bd Adjust minimum iod threads from 4 to 0 -- since we compile the NFS
client into the kernel by default, and many users won't use NFS,
don't start an extra 4 kernel threads that are unused.  Once NFS
becomes active, it will start nfsiod's as it needs them.

We might consider mandating a minimum number of iods equal to the number of
active NFS mounts (truncated to some value), which would force some
to remain available without having to create a new one if the file
system is mostly inactive.

PR:		70880
MFC after:	2 weeks
Prodded by:	cel
Head nod:	peter
Pointed out by:	Joe <fbsd_user at a1poweruser dot com>
2006-05-24 21:04:46 +00:00
Chuck Lever
6d0699a5ba NFS over TCP retransmit behavior should default to a 60 second timeout,
mimicking the NFS reference implementation.

NFS over TCP does not need fast retransmit timeouts, since network loss
and congestion are managed by the transport (TCP), unlike with NFS over
UDP.  A long timeout prevents the unnecessary retransmission of non-
idempotent NFS requests.

Reviewed by:	mohans, silby, rees?
Sponsored by:	Network Appliance, Incorporated
2006-05-23 18:48:07 +00:00
Chuck Lever
94163ea283 Refactor the NFS over UDP retransmit timeout estimation logic to allow
the estimator to be more easily tuned and maintained.

There should be no functional change, except there is now a lower limit
on the retransmit timeout to prevent the client from retransmitting
faster than the server's disks can fill requests, and an upper limit
to prevent the estimator from taking too long to retransmit during a
server outage.

Reviewed by:	mohan, kris, silby
Sponsored by:	Network Appliance, Incorporated
2006-05-23 18:33:58 +00:00
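
A self-contained model of the bounded estimator; the constants and helper name are made up, and only the clamp-to-[floor, ceiling] structure reflects the description above:

    #include <stdint.h>

    #define NFS_MIN_RTO_TICKS   2   /* floor: don't retransmit faster than the server can fill requests */
    #define NFS_MAX_RTO_TICKS 640   /* ceiling: bound recovery time during a server outage */

    static uint32_t
    nfs_rto_clamped(uint32_t srtt_ticks, uint32_t rttvar_ticks)
    {
        /* Classic smoothed-RTT-plus-variance estimate, then clamp it. */
        uint32_t rto = srtt_ticks + 4 * rttvar_ticks;

        if (rto < NFS_MIN_RTO_TICKS)
            rto = NFS_MIN_RTO_TICKS;
        if (rto > NFS_MAX_RTO_TICKS)
            rto = NFS_MAX_RTO_TICKS;
        return (rto);
    }
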
Mohan Srinivasan
f2c48228fe Vnode locks are recursive and the NFS client supports shared vnode locks.
Found by: Kris Kennaway.
2006-05-23 16:07:23 +00:00
Mohan Srinivasan
f1cdf89911 Changes to make the NFS client MP safe.
Thanks to Kris Kennaway for testing and sending lots of bugs my way.
2006-05-19 00:04:24 +00:00
Mohan Srinivasan
671d06fb2e Fix a snafu caused while patching the previous fix from another branch. 2006-05-05 18:12:13 +00:00
Mohan Srinivasan
9f5b7dea42 Fix for an NFS/TCP client bug which would cause the NFS/TCP stream to get
out of sync under heavy loads, forcing frequent reconnects, causing EBADRPC
errors etc.
2006-05-05 18:04:53 +00:00
Mohan Srinivasan
5ef7d50da5 Keep track of the number of in-progress async direct IO writes in the nfsnode.
Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and
nfs_putpage().
2006-04-06 01:20:30 +00:00
Jeff Roberson
b2282f9a3f - Busy the filesystem in nfs_statfs to prevent us from creating a new
vnode after vflush() has succeeded.  This would cause a dangling vnode
   panic at unmount time otherwise.  Other filesystems may have this problem
   via their VFS_VGET() routines.

Found by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-04-01 01:15:23 +00:00
Kris Kennaway
78e31796c9 Fix a bug in the NFS/TCP retransmission path.
The bug was that earlier, if a request was retransmitted,
we would do subsequent retransmits every 10 msecs.

This can cause data corruption under moderate loads by reordering
operations as seen by the client NFS attribute cache, and on the
server side when the retransmission occurs after the original request
has left the duplicate cache, since the operation will be committed
for a second time.

Further work on retransmission handling is needed (e.g. retransmits are still
being sent too often since they are scaled by HZ, and the size of
the dup cache is too small and easily overwhelmed on busy servers).

Submitted by:	mohans
2006-03-23 22:58:42 +00:00