freebsd-skq

Author	SHA1	Message	Date
mm	1626913ed1	Add support for mounting devfs inside jails. A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for mounting devfs inside jails. A value of -1 disables mounting devfs in jails, a value of zero means no restrictions. Nested jails can only have mounting devfs disabled or inherit parent's enforcement as jails are not allowed to view or manipulate devfs(8) rules. Utilizes new functions introduced in r231265. Reviewed by: jamie MFC after: 1 month	2012-02-09 10:22:08 +00:00
mm	27a58a0dde	Introduce the "ruleset=number" option for devfs(5) mounts. Add support for updating the devfs mount (currently only changing the ruleset number is supported). Check mnt_optnew with vfs_filteropt(9). This new option sets the specified ruleset number as the active ruleset of the new devfs mount and applies all its rules at mount time. If the specified ruleset doesn't exist, a new empty ruleset is created. MFC after: 1 month	2012-02-09 10:09:12 +00:00
pfg	1fecd2eae2	Update the data structures with some fields reserved for ext4 but that can be used in ext3 mode. Also adjust the internal inode to carry the birthtime, like in UFS, which is starting to get some use when big inodes are available. Right now these are just placeholders for features to come. Approved by: jhb (mentor) MFC after: 2 weeks	2012-02-07 22:31:28 +00:00
rmacklem	c9e28aac3f	r228827 fixed a problem where copying of NFSv4 open credentials into a credential structure would corrupt it. This happened when the p argument was != NULL. However, I now realize that the copying of open credentials should only happen for p == NULL, since that indicates that it is a read-ahead or write-behind. This patch fixes this. After this commit, r228827 could be reverted, but I think the code is clearer and safer with the patch, so I am going to leave it in. Without this patch, it was possible that a NFSv4 VOP_SETATTR() could have changed the credentials of the caller. This would have happened if the process doing the VOP_SETATTR() did not have the file open, but some other process running as a different uid had the file open for writing at the same time. MFC after: 5 days	2012-02-07 16:32:43 +00:00
jhb	1a6892cbe7	Rename cache_lookup_times() to cache_lookup() and retire the old API and ABI stub for cache_lookup().	2012-02-06 17:00:28 +00:00
kib	52c17430bc	Current implementations of sync(2) and syncer vnode fsync() VOP uses mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks	2012-02-06 11:04:36 +00:00
rmacklem	f324662380	When a "mount -u" switches an NFS mount point from TCP to UDP, any thread doing an I/O RPC with a transfer size greater than NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC. After a discussion on freebsd-fs@, I decided to add a warning message for this case, as suggested by Jeremy Chadwick. Suggested by: freebsd at jdc.parodius.com (Jeremy Chadwick) MFC after: 2 weeks	2012-01-31 03:58:26 +00:00
rmacklem	67ad565252	A problem with respect to data read through the buffer cache for both NFS clients was reported to freebsd-fs@ under the subject "NFS corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when a TCP mounted root fs was changed to using UDP. I believe that this problem was caused by the change in mnt_stat.f_iosize that occurred because rsize was decreased to the maximum supported by UDP. This patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize, since the latter is set to f_iosize when the vnode is allocated, but does not change for a given vnode when f_iosize changes. Reported by: pjd Reviewed by: kib MFC after: 2 weeks	2012-01-27 02:46:12 +00:00
rmacklem	429284bf62	Revert r230516, since it doesn't really fix the problem.	2012-01-26 00:07:34 +00:00
kib	6f4618881e	Fix remaining calls to cache_enter() in both NFS clients to provide appropriate timestamps. Restore the assertions which verify that NCF_TS is set when timestamp is asked for. Reviewed by: jhb (previous version) MFC after: 2 weeks	2012-01-25 20:48:20 +00:00
jhb	768a468c15	Add a timeout on positive name cache entries in the NFS client. That is, we will only trust a positive name cache entry for a specified amount of time before falling back to a LOOKUP RPC, even if the ctime for the file handle matches the cached copy in the name cache entry. The timeout is configured via a new 'nametimeo' mount option and defaults to 60 seconds. It may be set to zero to disable positive name caching entirely. Reviewed by: rmacklem MFC after: 1 week	2012-01-25 20:05:58 +00:00
rmacklem	911de30560	If a mount -u is done to either NFS client that switches it from TCP to UDP and the rsize/wsize/readdirsize is greater than NFS_MAXDGRAMDATA, it is possible for a thread doing an I/O RPC to get stuck repeatedly doing retries. This happens because the RPC will use a resize/wsize/readdirsize that won't work for UDP and, as such, it will keep failing indefinitely. This patch returns an error for this case, to avoid the problem. A discussion on freebsd-fs@ seemed to indicate that returning an error was preferable to silently ignoring the "udp"/"mntudp" option. This problem was discovered while investigating a problem reported by pjd@ via email. MFC after: 2 weeks	2012-01-25 00:22:53 +00:00
jhb	f75e35e4d7	Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks	2012-01-20 20:02:01 +00:00
rmacklem	ab80b8350a	Martin Cracauer reported a problem to freebsd-current@ under the subject "Data corruption over NFS in -current". During investigation of this, I came across an ugly bogusity in the new NFS client where it replaced the cr_uid with the one used for the mount. This was done so that "system operations" like the NFSv4 Renew would be performed as the user that did the mount. However, if any other thread shares the credential with the one doing this operation, it could do an RPC (or just about anything else) as the wrong cr_uid. This patch fixes the above, by using the mount credentials instead of the one provided as an argument for this case. It appears to have fixed Martin's problem. This patch is needed for NFSv4 mounts and NFSv3 mounts against some non-FreeBSD servers that do not put post operation attributes in the NFSv3 Statfs RPC reply. Tested by: Martin Cracauer (cracauer at cons.org) Reviewed by: jhb MFC after: 2 weeks	2012-01-20 00:58:51 +00:00
rea	558e00899f	Subject: NULLFS: properly destroy node hash Use hashdestroy() instead of naive free(). Approved by: kib MFC after: 2 weeks	2012-01-18 11:23:46 +00:00
kevlo	1a0c72b3fe	Return EOPNOTSUPP since we only support update mounts for NFS export. Spotted by: trociny	2012-01-17 01:25:53 +00:00
mckusick	af2e331939	Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks	2012-01-17 01:08:01 +00:00
kevlo	ba45f0caad	Add nfs export support to tmpfs(5) Reviewed by: kib	2012-01-16 10:25:22 +00:00
alc	213db2103e	When tmpfs_write() resets an extended file to its original size after an error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally update the file's size. Reviewed by: kib MFC after: 3 weeks	2012-01-16 00:26:49 +00:00
trociny	d4e71152bd	Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want to read strings completely to know the actual size. As a side effect it fixes the issue with kern.proc.args and kern.proc.env sysctls, which didn't return the size of available data when calling sysctl(3) with the NULL argument for oldp. Note, in get_ps_strings(), which does actual work for proc_getargv() and proc_getenvv(), we still have a safety limit on the size of data read in case of a corrupted procces stack. Suggested by: kib MFC after: 3 days	2012-01-15 18:47:24 +00:00
uqs	d61d88a310	Convert files to UTF-8	2012-01-15 13:23:18 +00:00
alc	413b89a7e9	Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call vm_object_pip_{add,subtract}() on the swap object because the swap object can't be destroyed while the vnode is exclusively locked. Moreover, even if the swap object could have been destroyed during tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken because vm_object_pip_subtract() does not wake up the sleeping thread that is trying to destroy the swap object. Free invalid pages after an I/O error. There is no virtue in keeping them around in the swap object creating more work for the page daemon. (I believe that any non-busy page in the swap object will now always be valid.) vm_pager_get_pages() does not return a standard errno, so its return value should not be returned by tmpfs without translation to an errno value. There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to occur with the swap object locked. Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite(). (The swap pager already spam your console if data corruption is imminent.) Reviewed by: kib MFC after: 3 weeks	2012-01-14 23:04:27 +00:00
rmacklem	77b7fa0515	Tai Horgan reported via email that there were two places in the new NFSv4 server where the code follows the wrong list. Fortunately, for these fairly rare cases, the lc_stateid[] lists are normally empty. This patch fixes the code to follow the correct list. Reported by: tai.horgan at isilon.com Discussed with: zack MFC after: 2 weeks	2012-01-14 04:04:58 +00:00
rmacklem	1a4786f9eb	jwd@ reported via email that the "CacheSize" field reported by "nfsstat -e -s" would go negative after using the "-z" option to zero out the stats. This patch fixes that by not zeroing out the srvcache_size field for "-z", since it is the size of the cache and not a counter. MFC after: 2 weeks	2012-01-11 02:46:42 +00:00
alc	d27a56b062	Correct an error of omission in the implementation of the truncation operation on POSIX shared memory objects and tmpfs. Previously, neither of these modules correctly handled the case in which the new size of the object or file was not a multiple of the page size. Specifically, they did not handle partial page truncation of data stored on swap. As a result, stale data might later be returned to an application. Interestingly, a data inconsistency was less likely to occur under tmpfs than POSIX shared memory objects. The reason being that a different mistake by the tmpfs truncation operation helped avoid a data inconsistency. If the data was still resident in memory in a PG_CACHED page, then the tmpfs truncation operation would reactivate that page, zero the truncated portion, and leave the page pinned in memory. More precisely, the benevolent error was that the truncation operation didn't add the reactivated page to any of the paging queues, effectively pinning the page. This page would remain pinned until the file was destroyed or the page was read or written. With this change, the page is now added to the inactive queue. Discussed with: jhb Reviewed by: kib (an earlier version) MFC after: 3 weeks	2012-01-08 20:09:26 +00:00
rmacklem	c8986499b7	opt_inet6.h was missing from some files in the new NFS subsystem. The effect of this was, for clients mounted via inet6 addresses, that the DRC cache would never have a hit in the server. It also broke NFSv4 callbacks when an inet6 address was the only one available in the client. This patch fixes the above, plus deletes opt_inet6.h from a couple of files it is not needed for. MFC after: 2 weeks	2012-01-08 01:54:46 +00:00
jh	6417e1b242	r222004 changed sbuf_finish() to not clear the buffer error status. As a consequence sbuf_len() will return -1 for buffers which had the error status set prior to sbuf_finish() call. This causes a problem in pfs_read() which purposely uses a fixed size sbuf to discard bytes which are not needed to fulfill the read request. Work around the problem by using the full buffer length when sbuf_finish() indicates an overflow. An overflowed sbuf with fixed size is always full. PR: kern/163076 Approved by: des MFC after: 2 weeks	2012-01-06 10:12:59 +00:00
jh	3ff26fa74b	Check the return value of sbuf_finish() in pfs_readlink() and return ENAMETOOLONG if the buffer overflowed. Approved by: des MFC after: 2 weeks	2012-01-06 09:17:34 +00:00
dim	7958662e66	In sys/fs/nullfs/null_subr.c, in a KASSERT, output the correct vnode pointer 'lowervp' instead of 'vp', which is uninitialized at that point. Reviewed by: kib MFC after: 1 week	2012-01-05 17:06:04 +00:00
kib	e4b773d887	Do the vput() for the lowervp in the null_nodeget() for error case too. Several callers of null_nodeget() did the cleanup itself, but several missed it, most prominent being null_bypass(). Remove the cleanup from the callers, now null_nodeget() handles lowervp free itself. Reported and tested by: pho MFC after: 1 week	2012-01-03 21:09:07 +00:00
kib	0c5a3f31ef	Document the state of the lowervp vnode for null_nodeget(). Tested by: pho MFC after: 1 week	2012-01-03 21:03:20 +00:00
pfg	02096cd0ef	Minor cleanups to ntfs code bzero -> memset rename variables to avoid shadowing. PR: 142401 Obtained from: NetBSD Approved by jhb (mentor)	2012-01-03 19:09:01 +00:00
alc	9976fae042	Don't pass VM_ALLOC_ZERO to vm_page_grab() in tmpfs_mappedwrite() and tmpfs_nocacheread(). It is both unnecessary and a pessimization. It results in either the page being zeroed twice or zeroed first and then overwritten by an I/O operation. MFC after: 3 weeks	2012-01-03 03:29:01 +00:00
ed	ab210c8f2f	Use strchr() and strrchr(). It seems strchr() and strrchr() are used more often than index() and rindex(). Therefore, simply migrate all kernel code to use it. For the XFS code, remove an empty line to make the code identical to the code in the Linux kernel.	2012-01-02 12:12:10 +00:00
ed	2dd2060131	Migrate ufs and ext2fs from skpc() to memcchr(). While there, remove a useless check from the code. memcchr() always returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff can never be equal to zero. Also, the fact that memcchr() returns a pointer instead of the number of bytes until the end, makes conversion to an offset far more easy.	2012-01-01 20:47:33 +00:00
kevlo	cda3cff67f	Discard local array based on return values. Pointed out by: uqs Found with: Coverity Prevent(tm) CID: 10089	2011-12-24 15:49:52 +00:00
rmacklem	ff91c62e28	During investigation of an NFSv4 client crash reported by glebius@, jhb@ spotted that nfscl_getstateid() might modify credentials when called from nfsrpc_read() for the case where p != NULL, whereas nfsrpc_read() only did a crdup() to get new credentials for p == NULL. This bug was introduced by r195510, since pre-r195510 nfscl_getstateid() only modified credentials for the p == NULL case. This patch modifies nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case. It is conceivable that this bug caused the crash reported by glebius@, but that will not be determined for some time, since the crash occurred after about 1month of operation. Tested by: glebius Reviewed by: jhb MFC after: 2 weeks	2011-12-23 02:04:35 +00:00
kevlo	e043036a25	Discarding local array based on return values	2011-12-22 06:31:29 +00:00
rmacklem	9a222cbb4c	jwd@ reported a problem via email where the old NFS client would get a reply of EEXIST from an NFS server when a Mkdir RPC was retried, for an NFS over UDP mount. Upon investigation, it was found that the client was retransmitting the Mkdir RPC request over UDP, but with a different xid. As such, the retransmitted message would miss the Duplicate Request Cache in the server, causing it to reply EEXIST. The kernel client side UDP rpc code has two timers. The first one causes a retransmit using the same xid and socket and was set to a fixed value of 3seconds. (The default can be overridden via CLSET_RETRY_TIMEOUT.) The second one creates a new socket and xid and should be larger than the first. However, both NFS clients were setting the second timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to 1second, so the first timer would never time out. This patch fixes both NFS clients so that they set the first timer using nm_timeo and makes the second timer larger than the first one. Reported by: jwd Tested by: jwd Reviewed by: jhb MFC after: 2 weeks	2011-12-21 02:45:51 +00:00
pfg	3ac7a716c7	Style cleanups by jh@. Fix a comment from the previous commit. Use M_ZERO instead of bzero() in ext2_vfsops.c Add include guards from PR. PR: 162564 Approved by: jhb (mentor) MFC after: 2 weeks	2011-12-16 15:47:43 +00:00
rmacklem	8b1b8ed3cb	Patch the new NFS server in a manner analagous to r228520 for the old NFS server, so that it correctly handles a count == 0 argument for Commit. PR: kern/118126 MFC after: 2 weeks	2011-12-16 00:58:41 +00:00
pfg	37ef732e37	Bring in reallocblk to ext2fs. The feature has been standard for a while in UFS as a means to reduce fragmentation, therefore maintaining consistent performance with filesystem aging. This is also very similar to what ext4 calls "delayed allocation". In his 2010 GSoC, Zheng Liu ported and benchmarked the missing FANCY_REALLOC code to find more consistent performance improvements than with the preallocation approach. PR: 159233 Author: Zheng Liu <gnehzuil AT SPAMFREE gmail DOT com> Sponsored by: Google Inc. Approved by: jhb (mentor) MFC after: 2 weeks	2011-12-15 20:31:18 +00:00
pfg	ffba019399	Merge ext2_readwrite.c into ext2_vnops.c as done in UFS in r101729. This removes the obfuscations mentioned in ext2_readwrite and places the clustering funtion in a location similar to other UFS-based implementations. No performance or functional changeses are expected from this move. PR: kern/159232 Suggested by: bde Approved by: jhb (mentor) MFC after: 2 weeks	2011-12-14 22:04:14 +00:00
jhb	b00203c514	Explicitly use curthread while manipulating td_fpop during last close of a devfs file descriptor in devfs_close_f(). The passed in td argument may be NULL if the close was invoked by garbage collection of open file descriptors in pending control messages in the socket buffer of a UNIX domain socket after it was closed. PR: kern/151758 Submitted by: Andrey Shidakov andrey shidakov ru Submitted by: Ruben van Staveren ruben verweg com Reviewed by: kib MFC after: 2 weeks	2011-12-09 17:49:34 +00:00
kib	5fb00deced	Initialize fifoinfo fi_wgen field on open. The only important is the difference between fi_wgen and f_seqcount, so the change is purely cosmetic, but it makes the code easier to understand. Submitted by: gianni MFC after: 2 weeks	2011-12-04 19:25:49 +00:00
rmacklem	e7537e2944	This patch adds a sysctl to the NFSv4 server which optionally disables the check for a UTF-8 compliant file name. Enabling this sysctl results in an NFSv4 server that is non-RFC3530 compliant, therefore it is not enabled by default. However, enabling this sysctl results in NFSv3 compatible behaviour and fixes the problem reported by "dan at sunsaturn.com" to freebsd-current@ on Nov. 14, 2011 under the subject "NFSV4 readlink_stat". Tested by: dan at sunsaturn.com Reviewed by: zack MFC after: 2 weeks	2011-12-04 16:33:04 +00:00
rmacklem	ec04bcd39d	Post r223774, the NFSv4 client no longer has multiple instances of the same lock_owner4 string. As such, the handling of cleanup of lock_owners could be simplified. This simplification permitted the client to do a ReleaseLockOwner operation when the process that the lock_owner4 string represents, has exited. This permits the server to release any storage related to the lock_owner4 string before the associated open is closed. Without this change, it is possible to exhaust a server's storage when a long running process opens a file and then many child processes do locking on the file, because the open doesn't get closed. A similar patch was applied to the Linux NFSv4 client recently so that it wouldn't exhaust a server's storage. Reviewed by: zack MFC after: 2 weeks	2011-12-03 02:27:26 +00:00
jhb	6dededc9d9	Enhance the sequential access heuristic used to perform readahead in the NFS server and reuse it for writes as well to allow writes to the backing store to be clustered. - Use a prime number for the size of the heuristic table (1017 is not prime). - Move the logic to locate a heuristic entry from the table and compute the sequential count out of VOP_READ() and into a separate routine. - Use the logic from sequential_heuristic() in vfs_vnops.c to update the seqcount when a sequential access is performed rather than just increasing seqcount by 1. This lets the clustering count ramp up faster. - Allow for some reordering of RPCs and if it is detected leave the current seqcount as-is rather than dropping back to a seqcount of 1. Also, when out of order access is encountered, cut seqcount in half rather than dropping it all the way back to 1 to further aid with reordering. - Fix the new NFS server to properly update the next offset after a successful VOP_READ() so that the readahead actually works. Some of these changes came from an earlier patch by Bjorn Gronwall that was forwarded to me by bde@. Discussed with: bde, rmacklem, fs@ Submitted by: Bjorn Gronwall (1, 4) MFC after: 2 weeks	2011-12-01 18:46:28 +00:00
kib	d326d5565d	Rename vm_page_set_valid() to vm_page_set_valid_range(). The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc	2011-11-30 17:39:00 +00:00
kevlo	d4bd483bec	Add unicode support to ntfs Obtained from: imura	2011-11-27 15:43:49 +00:00

1 2 3 4 5 ...

2764 Commits