freebsd-skq

Author	SHA1	Message	Date
bdrewery	6fcf6199a4	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
pfg	b7c5f322bc	Revert r263449; ext2fs: minor update to the dirpref policy. The change in UFS r254996, reverted the change as the older code seems to work better. This was not visible in local testing but we can trust UFS is vastly more exercised in diferent environments.	2014-03-21 04:33:38 +00:00
pfg	81396b10ba	ext2fs: minor update to the dirpref policy. Bring in a minor change to the dirpref policy based on r248623. This is pretty minimal change to keep the implementation in sync with UFS but other parts from the original change are not directly applicable so don't expect improvements in fsck times. MFC after: 2 weeks	2014-03-20 21:19:13 +00:00
pfg	0b3f2d5e48	msdosfs: minor format fix - spaces vs tab MFC after: 3 days	2014-03-20 20:14:04 +00:00
rwatson	33fdc14c0c	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
bdrewery	996c2ffe42	Add missing FALLTHROUGH comment in tmpfs_dir_getdents for looking up '.' and '..'. Reviewed by: Russell Cattelan Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-03-14 13:58:02 +00:00
bdrewery	104b00d969	Rename cnt to maxcookies and change its use as the condition for when to lookup cookies to be less obscure. No functional change. Since r245115, cnt has not really been needed in tmpfs_dir_getdents(). Keep it for the MPASS() for now though. Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-03-14 13:55:48 +00:00
bdrewery	3287a8bdc3	Cleanup redundant logic and add some comments to help explain how it works in lieu of potentially less clear code. Sponsored by: EMC / Isilon Storage Division Discussed with: Russell Cattelan	2014-03-14 02:10:30 +00:00
bdrewery	f6fb7805fc	Fix -o size less than PAGE_SIZE resulting in SIZE_MAX being used. Discussed with: kib MFC after: 2 weeks	2014-03-14 01:43:55 +00:00
pfg	fd04a1ff12	ext2fs: Fix a bug when sorting htree entries. This a typo introduced when bringing the original code from NetBSD. Reported by: Mike Ma MFC after: 3 days	2014-03-06 21:02:16 +00:00
pfg	5b13eca664	ext2fs: small formatting fixes. Remove some redundant spaces. No functional change. MFC after: 3 days	2014-03-01 21:22:20 +00:00
pfg	0df94118ae	ext2fs: use of tab vs spaces. Consistently use a single tab after a #define as mentioned in style(9). Use tabs instead of space for indenting. Fix a typo: "hash_vesion". No functional change. MFC after: 3 days	2014-02-28 21:25:32 +00:00
pfg	9e48499196	ext2fs: fully enable ext4 read-only support. The ext4 developers tend to tag Ext4-specific flags as "incompatible" even when such features are not relevant for read-only support. This is a consequence of the process though which this filesystem is implemented without design and the fact that some new features are not extensible to ext2/3. Organize the features according to what we support and sort them so that we can now read-only mount filesystems with some features that may be found in newly formatted ext4 fs. Submitted by: Zheng Liu Reviewed by: pfg MFC after: 5 days	2014-02-22 22:07:16 +00:00
dim	57082f0537	In sys/fs/nandfs/nandfs_vfsops.c, #if 0 an unused static function. MFC after: 3 days	2014-02-15 11:42:56 +00:00
pfg	034bc4e978	ext2fs: Use i_flag instead of i_flags for Ext4 inode flags. The ext4 inode flags do not have equivalents for chflags (1) and hold information that is private to the implementation. The i_flag field in the inode is a better place to hold the Ext4 inode flags as it saves us from masking flags while setting or getting attributes. It should also make things cleaner if we implement write support for Ext4. Suggested by: bde Tested by: Mike Ma MFC after: 3 days	2014-01-28 14:39:05 +00:00
pfg	145b8688cb	ext2fs: Re-enable reallocblk. The major corruption issues affecting this code have been fixed a while ago. MFC after: 1 week	2014-01-24 20:26:00 +00:00
pfg	737dfd6dbe	ext2fs: fix a bug in dirindex and re-enable. The IN_* flags should be set in i_flag instead of corrupting i_flags [1]. Re-enable HTree dirindex as the last series of bug fixes seems to have fixed the issues. Reported by: bde [1] Tested by: kevlo MFC after: 1 week	2014-01-24 13:51:38 +00:00
pfg	cc5aa413b4	ext2fs: fix logic error in the previous change. Use the bitwise negation instead of bogus boolean negation and move the flag manipulation with the assignment. Fix some grammatical errors introduced in the same change. Reported by: bde MFC after: 3 days	2014-01-22 19:09:41 +00:00
pfg	1317ca3f44	ext2fs: Translate the EXT4_EXTENTS and EXT4_INDEX to the inode flags. r260545 cleared the inode flags to fix corruption problems but we still need to pass some EXT4 flags for the ext4 read-only mode. None of these attributes has an equivalent in FreeBSD and are uninteresting for the system utilities so they should be innaccessible in ext2_getattrib(). Note: we also use EXT4_HUGE_FILE but we use it directly from the dinode structure so it is not necessary to translate it, Suggested by: bde MFC after: 3 days	2014-01-21 19:06:29 +00:00
mav	f63cb2f402	Fix lock leak in purely hypothetical case of TCP connection without SVC_ACK method. This change should be NOP now, but it is better to be future safe. Reported by: rmacklem	2014-01-14 20:18:38 +00:00
pfg	e3a716491d	ext2fs: fix inode flag conversion. After r252890 we are naively attempting to pass through the inode flags. This is technically incorrect as the ext2 inode flags don't match the UFS/system values used in FreeBSD and a clean conversion is needed. Some filtering was left in place so the change didn't cause significant changes in FreeBSD but some of the garbage passed is likely to be the cause for warning messages in linux. Fix the issue by resetting the flags before conversion as was done previously. This also means we will not pass the EXT4_* inode flags into FreeBSD's inode. PR: kern/185448 MFC after: 3 days	2014-01-11 15:19:04 +00:00
mav	c5a69c8307	Fix off-by-one error in r260229. Coverity CID: 1148955	2014-01-07 11:43:51 +00:00
mav	d75fe782e9	Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping though all the cache, and as result allows to do it every time. - Indroduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would broke some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.	2014-01-03 15:09:59 +00:00
mav	723d6ab041	Slightly simplify expiration logic introduced in r254337. - Do not update the histogram for items we are any way deleting from cache. - Do not update the histogram if nfsrc_tcphighwater is not set. - Remove some extra math operations.	2013-12-25 16:58:42 +00:00
rmacklem	12c8434bbd	The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch. MFC after: 1 week	2013-12-25 01:03:14 +00:00
rmacklem	1889cef2a5	An intermittent problem with NFSv4 exporting of ZFS snapshots was reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot file entry instead of VOP_LOOKUP(). This would not occur for NFSv3, since it will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning of the directory, whereas NFSv4 does not check "." or "..". This patch adds a call to VFS_VGET() for the directory being read to check for ENOTSUPP. I also observed that the mount_on_fileid and fsid attributes were not correct at the snapshot's auto mountpoints when looking at packet traces for the Readdir. This patch fixes the attributes by doing a check for different v_mount structure, even if the vnode v_mountedhere is not set. Reported by: jas@cse.yorku.ca Tested by: jas@cse.yorku.ca Reviewed by: asomers MFC after: 1 week	2013-12-24 22:24:17 +00:00
rmacklem	4c7a9b47dc	The NFSv4 client was passing both the p and cred arguments to nfsv4_fillattr() as NULLs for the Getattr callback. This caused nfsv4_fillattr() to not fill in the Change attribute for the reply. I believe this was a violation of the RFC, but had little effect on server behaviour. This patch passes a non-NULL p argument to fix this. MFC after: 1 week	2013-12-24 00:48:39 +00:00
pfg	a92784dd70	ext2fs: make the hashing algorithm match the linux code. There appears to be a hash function compatibility issue. The code is currently disabled but fix it nevertheless. PR: kern/183230 MFC after: 3 days	2013-12-23 19:47:34 +00:00
rmacklem	0c86e62082	The NFSv4.1 client didn't return NFSv4.1 specific error codes for the Getattr and Recall callbacks. This patch fixes it. Since the NFSv4.1 specific error codes would only happen for abnormal circumstances, this patch has little effect, in practice. MFC after: 1 week	2013-12-23 15:16:53 +00:00
mav	c84160c1b2	Fix RPC server threads file handle affinity to work better with ZFS. Instead of taking 8 specific bytes of file handle to identify file during RPC thread affitinity handling, use trivial hash of the full file handle. ZFS's struct zfid_short does not have padding field after the length field, as result, originally picked 8 bytes are loosing lower 16 bits of object ID, causing many false matches and unneeded requests affinity to same thread. This fix substantially improves NFS server latency and scalability in SPEC NFS benchmark by more flexible use of multiple NFS threads. Sponsored by: iXsystems, Inc.	2013-12-23 08:43:16 +00:00
kib	6ef68a4208	Do not allow O_EXEC opens for fifo, return EINVAL. Besides not making sense, open(O_EXEC) for fifo creates fifoinfo with zero readers and writers counts, which causes premature free of pipes. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-17 17:28:02 +00:00
mav	794856e581	Fix long known bug with handling device aliases residing not in devfs root. Historically creation of device aliases created symbolic links using only name of target device as a link target, not considering current directory. Fix that by adding number of "../" chunks to the terget device name, required to get out of the current directory to devfs root first. MFC after: 1 month	2013-12-12 11:05:48 +00:00
rmacklem	f24ecdf104	For software builds, the NFS client does many small synchronous (with FILE_SYNC) writes because non-contiguous byte ranges in the same buffer cache block are being written. This patch adds a new mount option "noncontigwr" which allows the non-contiguous byte ranges to be combined, with the dirty byte range becoming the superset of the bytes that are dirty, if the file has not been file locked. This reduces the number of writes significantly for software builds. The only case where this change might break existing applications is where an application is writing non-overlapping byte ranges within the same buffer cache block of a file from multiple clients concurrently. Since such an application would normally do file locking on the file, avoiding the byte range merge for files that have been file locked should be sufficient for most (maybe all?) cases. Submitted by: jhb (earlier version) Reviewed by: kib MFC after: 3 weeks	2013-12-07 23:05:59 +00:00
pfg	df919efc58	ext2fs: add two new reserved inodes. According to online documentation [1], Ext4 has two new "special" inodes so add the new exclude and replica inodes. Reference: [1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout Reported by: Mike Ma MFC after: 3 weeks	2013-12-04 02:27:52 +00:00
pluknet	d8bfc5ed87	- Nuke a second copy of nfscl_attrcache extern declarations from under ifdef KDTRACE_HOOKS. This fixes kernel build with options KDTRACE_HOOKS. - Fix style inconsistencies.	2013-11-26 22:41:40 +00:00
glebius	c0a9aba061	Fix build, attempt two.	2013-11-26 20:27:57 +00:00
glebius	f033525405	Fix build.	2013-11-26 10:34:34 +00:00
attilio	7ee4e910ce	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
kib	e4deb800b0	Redo r258088 to avoid relying on signed arithmetic overflow, since compiler interprets this as an undefined behaviour. Instead, ensure that the sum of uio_offset and uio_resid is below OFF_MAX using the operation which cannot overflow. Reported and tested by: pho Discussed with: bde Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-20 19:41:00 +00:00
kib	2e0b8010f9	Remove useless comparisions of assigned offset and resid with the sources from uio. Both uio_offset and offset, and uio_resid and resid have the same types for some time. Add check for buflen overflow by comparing the buflen with both offset and resid (vs. comparing with offset only, as it is currently done). Reported and tested by: pho Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-13 08:55:09 +00:00
rmacklem	1ab8f0512b	Fix an NFSv4.1 client specific case where a forced dismount would hang. The hang occurred in nfsv4_setsequence() when it couldn't find an available session slot and is fixed by checking for a forced dismount in progress and just returning for this case. MFC after: 1 month	2013-11-09 21:24:56 +00:00
rmacklem	7493efdad5	During code inspection, I spotted that there was a code path where CLNT_CONTROL() would be called on "client" after it was released via CLNT_RELEASE(). It was unlikely that this code path gets executed and I have not heard of any problem report caused by this bug. This patch fixes the code so that this cannot happen. MFC after: 2 months	2013-11-03 23:17:30 +00:00
glebius	ff6e113f1b	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
pfg	9b0e32e06b	UFS2: make di_extsize unsigned. di_extsize is the EA size and as such it should be unsigned. Adjust related types for consistency. Reviewed by: mckusick (previous version) MFC after: 3 weeks	2013-10-24 00:33:29 +00:00
kib	3dc87905fa	Similar to debug.iosize_max_clamp sysctl, introduce devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 week	2013-10-15 06:33:10 +00:00
kib	58fc49ff4f	Remove two instances of ARGSUSED comment, and wrap lines nearby the code that is to be changed. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-10-15 06:28:11 +00:00
jmg	244d5cfa88	NULL stale pointers (should be a no-op as they should no longer be used)... Reviewed by: dteske Approved by: re (kib) Sponsored by: Vicor MFC after: 3 days	2013-09-25 02:49:18 +00:00
jmg	903f65948b	fix a bug where we access a bread buffer after we have brelse'd it... The kernel normally didn't unmap/context switch away before we accessed the buffer most of the time, but under heavy I/O pressure and lots of mount/unmounting this would cause a fault on nofault panic... Reviewed by: dteske Approved by: re (kib) Sponsored by: Vicor MFC after: 3 days	2013-09-25 02:48:12 +00:00
des	21e6fd796b	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
pfg	807f696b94	ext2fs: temporarily disable htree directory index. Our code does not consider yet the case of hash collisions. This is a rather annoying situation where two or more files that happen to have the same hash value will not appear accessible. The situation is not difficult to work-around but given that things will just work without enabling htree we will save possible embarrassments for the next release. Reported by: Kevin Lo	2013-09-07 02:45:51 +00:00
pjd	1c7defb76e	Handle cases where capability rights are not provided. Reported by: kib	2013-09-05 11:58:12 +00:00
pjd	029a6f5d92	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
rmacklem	99fb6f77e5	Crashes have been observed for NFSv4.1 mounts when the system is being shut down which were caused by the nfscbd_pool being destroyed before the backchannel is disabled. This patch is believed to fix the problem, by simply avoiding ever destroying the nfscbd_pool. Since the NFS client module cannot be unloaded, this should not cause a memory leak. MFC after: 2 weeks	2013-09-04 22:47:56 +00:00
rmacklem	8d06f831a7	Forced dismounts of NFS mounts can fail when thread(s) are stuck waiting for an RPC reply from the server while holding the mount point busy (mnt_lockref incremented). This happens because dounmount() msleep()s waiting for mnt_lockref to become 0, before calling VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(), which the NFS client implements as purging RPCs in progress. Making this call before checking mnt_lockref fixes the problem, by ensuring that the VOP_xxx() calls will fail and unbusy the mount point. Reported by: sbruno Reviewed by: kib MFC after: 2 weeks	2013-09-01 23:02:59 +00:00
ken	0b320ea37c	Support storing 7 additional file flags in tmpfs: UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY, and UF_HIDDEN. Sort the file flags tmpfs supports alphabetically. tmpfs now supports the same flags as UFS, with the exception of SF_SNAPSHOT. Reported by: bdrewery, antoine Sponsored by: Spectra Logic	2013-08-28 22:12:56 +00:00
jhb	a437be7257	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
delphij	b93cf73204	Allow tmpfs be mounted inside jail.	2013-08-23 22:52:20 +00:00
ken	c7af094e18	Expand the use of stat(2) flags to allow storing some Windows/DOS and CIFS file attributes as BSD stat(2) flags. This work is intended to be compatible with ZFS, the Solaris CIFS server's interaction with ZFS, somewhat compatible with MacOS X, and of course compatible with Windows. The Windows attributes that are implemented were chosen based on the attributes that ZFS already supports. The summary of the flags is as follows: UF_SYSTEM: Command line name: "system" or "usystem" ZFS name: XAT_SYSTEM, ZFS_SYSTEM Windows: FILE_ATTRIBUTE_SYSTEM This flag means that the file is used by the operating system. FreeBSD does not enforce any special handling when this flag is set. UF_SPARSE: Command line name: "sparse" or "usparse" ZFS name: XAT_SPARSE, ZFS_SPARSE Windows: FILE_ATTRIBUTE_SPARSE_FILE This flag means that the file is sparse. Although ZFS may modify this in some situations, there is not generally any special handling for this flag. UF_OFFLINE: Command line name: "offline" or "uoffline" ZFS name: XAT_OFFLINE, ZFS_OFFLINE Windows: FILE_ATTRIBUTE_OFFLINE This flag means that the file has been moved to offline storage. FreeBSD does not have any special handling for this flag. UF_REPARSE: Command line name: "reparse" or "ureparse" ZFS name: XAT_REPARSE, ZFS_REPARSE Windows: FILE_ATTRIBUTE_REPARSE_POINT This flag means that the file is a Windows reparse point. ZFS has special handling code for reparse points, but we don't currently have the other supporting infrastructure for them. UF_HIDDEN: Command line name: "hidden" or "uhidden" ZFS name: XAT_HIDDEN, ZFS_HIDDEN Windows: FILE_ATTRIBUTE_HIDDEN This flag means that the file may be excluded from a directory listing if the application honors it. FreeBSD has no special handling for this flag. The name and bit definition for UF_HIDDEN are identical to the definition in MacOS X. UF_READONLY: Command line name: "urdonly", "rdonly", "readonly" ZFS name: XAT_READONLY, ZFS_READONLY Windows: FILE_ATTRIBUTE_READONLY This flag means that the file may not written or appended, but its attributes may be changed. ZFS currently enforces this flag, but Illumos developers have discussed disabling enforcement. The behavior of this flag is different than MacOS X. MacOS X uses UF_IMMUTABLE to represent the DOS readonly permission, but that flag has a stronger meaning than the semantics of DOS readonly permissions. UF_ARCHIVE: Command line name: "uarch", "uarchive" ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE Windows name: FILE_ATTRIBUTE_ARCHIVE The UF_ARCHIVED flag means that the file has changed and needs to be archived. The meaning is same as the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute. msdosfs and ZFS have special handling for this flag. i.e. they will set it when the file changes. sys/param.h: Bump __FreeBSD_version to 1000047 for the addition of new stat(2) flags. chflags.1: Document the new command line flag names (e.g. "system", "hidden") available to the user. ls.1: Reference chflags(1) for a list of file flags and their meanings. strtofflags.c: Implement the mapping between the new command line flag names and new stat(2) flags. chflags.2: Document all of the new stat(2) flags, and explain the intended behavior in a little more detail. Explain how they map to Windows file attributes. Different filesystems behave differently with respect to flags, so warn the application developer to take care when using them. zfs_vnops.c: Add support for getting and setting the UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN, UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags. All of these flags are implemented using attributes that ZFS already supports, so the on-disk format has not changed. ZFS currently doesn't allow setting the UF_REPARSE flag, and we don't really have the other infrastructure to support reparse points. msdosfs_denode.c, msdosfs_vnops.c: Add support for getting and setting UF_HIDDEN, UF_SYSTEM and UF_READONLY in MSDOSFS. It supported SF_ARCHIVED, but this has been changed to be UF_ARCHIVE, which has the same semantics as the DOS archive attribute instead of inverse semantics like SF_ARCHIVED. After discussion with Bruce Evans, change several things in the msdosfs behavior: Use UF_READONLY to indicate whether a file is writeable instead of file permissions, but don't actually enforce it. Refuse to change attributes on the root directory, because it is special in FAT filesystems, but allow most other attribute changes on directories. Don't set the archive attribute on a directory when its modification time is updated. Windows and DOS don't set the archive attribute in that scenario, so we are now bug-for-bug compatible. smbfs_node.c, smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM, UF_READONLY and UF_ARCHIVE in SMBFS. This is similar to changes that Apple has made in their version of SMBFS (as of smb-583.8, posted on opensource.apple.com), but not quite the same. We map SMB_FA_READONLY to UF_READONLY, because UF_READONLY is intended to match the semantics of the DOS readonly flag. The MacOS X code maps both UF_IMMUTABLE and SF_IMMUTABLE to SMB_FA_READONLY, but the immutable flags have stronger meaning than the DOS readonly bit. stat.h: Add definitions for UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY and UF_HIDDEN. The definition of UF_HIDDEN is the same as the MacOS X definition. Add commented-out definitions of UF_COMPRESSED and UF_TRACKED. They are defined in MacOS X (as of 10.8.2), but we do not implement them (yet). ufs_vnops.c: Add support for getting and setting UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY, UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS. Alphabetize the flags that are supported. These new flags are only stored, UFS does not take any action if the flag is set. Sponsored by: Spectra Logic Reviewed by: bde (earlier version)	2013-08-21 23:04:48 +00:00
kib	6a459eb27c	Make the seek a method of the struct fileops. Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:36:01 +00:00
kib	a3e8f2c6dc	Extract the general-purpose code from tmpfs to perform uiomove from the page queue of some vm object. Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:23:24 +00:00
kib	408a640438	Restore the previous sendfile(2) behaviour on the block devices. Provide valid .fo_sendfile method for several missed struct fileops. Reviewed by: glebius Sponsored by: The FreeBSD Foundation	2013-08-16 14:22:20 +00:00
rmacklem	802b1728d8	Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Single thread the code that deletes stale cache entries. - Add a tunable called vfs.nfsd.tcphighwater, which can be increased to allow the cache to grow larger, avoiding the overhead of frequent scans to delete stale cache entries. (The default value will result in frequent scans to delete stale cache entries, analagous to what the pre-patched code does.) - Add a tunable called vfs.nfsd.cachetcp that can be used to disable DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP. It also adjusts the size of nfsrc_floodlevel dynamically, so that it is always greater than vfs.nfsd.tcphighwater. For UDP the algorithm remains the same as the pre-patched code, but the tunable vfs.nfsd.udphighwater can be used to allow the cache to grow larger and reduce the overhead caused by frequent scans for stale entries. UDP also uses a larger hash table size than the pre-patched code. Reported by: wollman Tested by: wollman (earlier version of patch) Submitted by: ivoras (earlier patch) Reviewed by: jhb (earlier version of patch) MFC after: 1 month	2013-08-14 21:11:26 +00:00
pfg	8255307210	ext2fs: update format specifiers for ext4 type. Previous bandaid was not appropriate and didn't really work for all platforms. While here, cleanup the surrounding code to match ffs_checkoverlap() Reported by: dim, jmallet and bde MFC after: 3 weeks	2013-08-14 14:22:46 +00:00
pfg	14b4517fa3	ext2fs: update format specifiers for ext4 type. Reported by: Sam Fourman Jr. MFC after: 3 weeks	2013-08-13 18:39:36 +00:00
pfg	1d7e5a040e	Define ext2fs local types and use them. Add definitions for e2fs_daddr_t, e4fs_daddr_t in addition to the already existing e2fs_lbn_t and adjust them for ext4. Other than making the code more readable these changes should fix problems related to big filesystems. Setting the proper types can be tricky so the process was helped by looking at UFS. In our implementation, logical block numbers can be negative and the code depends on it. In ext2, block numbers are unsigned so it is convenient to keep e2fs_daddr_t unsigned and use the complete 32 bits. In the case of e4fs_daddr_t, while the value should be unsigned, for ext4 we only need to support 48 bits so preserving an extra bit from the sign is not an issue. While here also drop the ext2_setblock() prototype that was never used. Discussed with: mckusick, bde MFC after: 3 weeks	2013-08-13 15:40:43 +00:00
pfg	0b111bdfcb	Add read-only support for extents in ext2fs. Basic support for extents was implemented by Zheng Liu as part of his Google Summer of Code in 2010. This support is read-only at this time. In addition to extents we also support the huge_file extension for read-only purposes. This works nicely with the additional support for birthtime/nanosec timestamps and dir_index that have been added lately. The implementation may not work for all ext4 filesystems as it doesn't support some features that are being enabled by default on recent linux like flex_bg. Nevertheless, the feature should be very useful for migration or simple access in filesystems that have been converted from ext2/3 or don't use incompatible features. Special thanks to Zheng Liu for his dedication and continued work to support ext2 in FreeBSD. Submitted by: Zheng Liu (lz@) Reviewed by: Mike Ma, Christoph Mallon (previous version) Sponsored by: Google Inc. MFC after: 3 weeks	2013-08-12 21:34:48 +00:00
attilio	16c7563cf4	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
pfg	f78a72ad65	Small typo. MFC after: 3 days	2013-08-08 22:07:59 +00:00
kib	b8fe5eca7f	The tmpfs_alloc_vp() is used to instantiate vnode for the tmpfs node, in particular, from the tmpfs_lookup VOP method. If LK_NOWAIT is not specified in the lkflags, the lookup is supposed to return an alive vnode whenever the underlying node is valid. Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached to node exists and is being reclaimed. This causes spurious ENOENT errors from lookup on tmpfs and corresponding random 'No such file' failures from syscalls working with tmpfs files. Fix this by waiting for the doomed vnode to be detached from the tmpfs node if sleepable allocation is requested. Note that filesystems which use vfs_hash.c, correctly handle the case due to vfs_hash_get() looping when vget() returns ENOENT for sleepable requests. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-08-05 18:53:59 +00:00
attilio	899ab64514	Revert r253939: We cannot busy a page before doing pagefaults. Infact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib	2013-08-05 08:55:35 +00:00
attilio	19b2ea9f81	The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho	2013-08-04 21:07:24 +00:00
attilio	e825889721	Remove unnecessary soft busy of the page before to do vn_rdwr() in kern_sendfile() which is unnecessary. The page is already wired so it will not be subjected to pagefault. The content cannot be effectively protected as it is full of races already. Multiple accesses to the same indexes are serialized through vn_rdwr(). Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: pho	2013-08-04 15:56:19 +00:00
pfg	f35ac11d77	Add license for the half MD4 algorithm used in ext2_half_md4(). The htree implementation uses code derived from the RSA Data Security, Inc. MD4 Message-Digest Algorithm. Add a proper licensing statement for the code and clarify the corresponding comments. Approved by: core (hrs)	2013-08-01 16:04:48 +00:00
marius	335ec6e6b3	- Add const-qualifiers to the arguments of isonum_(). - According to ISO 9660 7.1.2, isonum_712() should return a signed value. - Try to get isonum_() closer to style(9).	2013-07-28 12:29:10 +00:00
avg	4b4c561bbf	make path matching in devfs rules consistent and sane (and safer) Before this change path matching had the following features: - for device nodes the patterns were matched against full path - in the above case '/' in a path could be matched by a wildcard - for directories and links only the last component was matched So, for example, a pattern like 're*' could match the following entries: - re0 device - responder/u0 device - zvol/recpool directory Although it was possible to work around this behavior (once it was spotted and understood), it was very confusing and contrary to documentation. Now we always match a full path for all types of devfs entries (devices, directories, links) and a '/' has to be matched explicitly. This behavior follows the shell globbing rules. This change is originally developed by Jaakko Heinonen. Many thanks! PR: kern/122838 Submitted by: jh MFC after: 4 weeks	2013-07-26 14:25:58 +00:00
pfg	d1aa5826fa	ext2fs: Return EINVAL for negative uio_offset as in UFS. While here drop old comment that doesn't really apply. MFC after: 1 month Discussed with: gleb	2013-07-25 19:37:49 +00:00
pfg	b4c061ba11	ext2fs: Drop a check that wan't supposed to be in r253651. MFC after: 1 month	2013-07-25 16:04:55 +00:00
pfg	3cce3ff568	ext2fs: Don't assume that on-disk format of a directory is the same as in <sys/dirent.h> ext2_readdir() has always been very fs specific and different with respect to its ufs_ counterpart. Recent changes from UFS have made it possible to share more closely the implementation. MFUFS r252438: Always start parsing at DIRBLKSIZ aligned offset, skip first entries if uio_offset is not DIRBLKSIZ aligned. Return EINVAL if buffer is too small for single entry. Preallocate buffer for cookies. Skip entries with zero inode number. Reviewed by: gleb, Zheng Liu MFC after: 1 month	2013-07-25 15:34:20 +00:00
pfg	c92369e15f	fuse: revert kernel_header update. It seems to be causing problems due to the lack of the new features. Found by: bapt Pointed hat: pfg	2013-07-24 20:21:29 +00:00
nwhitehorn	697ec4c8f3	tmpfs works perfectly fine with -o union -- there is no reason to exclude it from the list of options.	2013-07-23 14:48:37 +00:00
rmacklem	51bcea9c2c	The NFSv4 server incorrectly assumed that the high order words of the attribute bitmap argument would be non-zero. This caused an interoperability problem for a recent patch to the Linux NFSv4 client. The Linux folks have changed their patch to avoid this, but this patch fixes the problem on the server. Reported and tested by: Andre Heider (a.heider@gmail.com) MFC after: 3 days	2013-07-20 22:35:32 +00:00
pfg	7d8d22f309	fuse: revert birthtime support. The creation time support breaks the data structures used in linux fuse. libfuse carries it's own header. Revert the changes for now. We will try to get an agreement with the fuse upstream maintainers to avoid having to patch the library headers all the time.	2013-07-20 14:50:35 +00:00
pfg	ee86b8d348	Adjust outsizes: Recalculate FUSE_COMPAT_ENTRY_OUT_SIZE and COMPAT_ATTR_OUT_SIZE. These were wrong in the previous commit. They are actually unused in FreeBSD though. Pointed out by: Jan Beich	2013-07-20 03:55:56 +00:00
pfg	2a02ab82bb	Adjust outsizes: When birthtime was added (r253331) we missed adding the weight of the new fields in FUSE_COMPAT_ENTRY_OUT_SIZE and COMPAT_ATTR_OUT_SIZE. Adjust them accordingly. Pointed out by: Jan Beich	2013-07-20 03:08:50 +00:00
pfg	2bfbfc539c	Update fuse_kernel header. Bring in the changes from the FUSE kernel interface 7.10 (available under a BSD license). After 7.10 the linux FUSE developers added support for a controversial CUSE driver and some linux especific features that are unlikely to find its way into FreeBSD. We currently don't implement any of the new features so we are not bumping the FUSE_KERNEL_MINOR_VERSION. The header should, nevertheless, serve as a template to add the new features in a compatible manner. While here adopt some minor cleanups from the upstream version like removing FUSE_MAJOR and FUSE_MINOR which were never used. Also add multiple inclusion header guards,	2013-07-15 00:05:27 +00:00
pfg	e26f629724	Add creation timestamp (birthtime) support for fuse. I was keeping this #ifdef'd for reference with the MacFUSE change[1] but on second thought, this is a FreeBSD-only header so the SVN history should be enough. Add missing padding while here. Reference [1]: http://code.google.com/p/macfuse/source/detail?spec=svn1686&r=1360	2013-07-13 22:06:41 +00:00
pfg	a51cca1f05	Add creation timestamp (birthtime) support for fuse. This is based on similar support in MacFUSE.	2013-07-12 17:22:59 +00:00
pfg	be085216cb	Implement 1003.1-2001 pathconf() keys. This is based on r106058 in UFS. MFC after: 1 month	2013-07-10 22:03:01 +00:00
pfg	5a396d8c5b	Reinstate the assertion from r253045. UFS r232732 reverted the change as the real problem was to be fixed at the syscall level. Reported by: bde	2013-07-09 14:23:00 +00:00
pfg	6ca7bf7cd4	Enhancement when writing an entire block of a file. Merge from UFS r231313: This change first attempts the uiomove() to the newly allocated (and dirty) buffer and only zeros it if the uiomove() fails. The effect is to eliminate the gratuitous zeroing of the buffer in the usual case where the uiomove() successfully fills it. MFC after: 3 days	2013-07-09 01:31:04 +00:00
rmacklem	4050bd5a8c	Add support for host-based (Kerberos 5 service principal) initiator credentials to the kernel rpc. Modify the NFSv4 client to add support for the gssname and allgssname mount options to use this capability. Requires the gssd daemon to be running with the "-h" option. Reviewed by: jhb	2013-07-09 01:05:28 +00:00
pfg	c91c27e5e0	Avoid a panic and return EINVAL instead. Merge from UFS r232692: syscall() fuzzing can trigger this panic. MFC after: 3 days	2013-07-08 20:21:36 +00:00
pfg	6e9d91d26b	Implement SEEK_HOLE/SEEK_DATA for ext2fs. Merged from r236044 on UFS. MFC after: 3 days	2013-07-07 15:51:28 +00:00
pfg	512e5343cf	Fix some typos. MFC after: 1 week	2013-07-07 01:32:52 +00:00
pfg	73bb131f84	Initial implementation of the HTree directory index. This is a port of NetBSD's GSoC 2012 Ext3 HTree directory indexing by Vyacheslav Matyushin. It was cleaned up and enhanced for FreeBSD by Zheng Liu (lz@). This is an excellent example of work shared among different projects: Vyacheslav was able to look at an early prototype from Zheng Liu who was also able to check the code from Haiku (with permission). As in linux, the feature is not available by default and must be enabled explicitly with tune2fs. We still do not support the workarounds required in readdir for NFS. Submitted by: Zheng Liu Tested by: Mike Ma Sponsored by: Google Inc. MFC after: 1 week	2013-07-06 18:28:06 +00:00
kib	e9befff373	The tvp vnode on rename is usually unlinked. Drop the cached null vnode for tvp to allow the free of the lower vnode, if needed. PR: kern/180236 Tested by: smh Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-04 19:01:18 +00:00
davide	6321e04894	- Fix double frees/user after free. - Allocate using smb_rq_alloc() instead of inlining it. Reported by: uqs Found with: Coverity Scan	2013-07-03 10:31:45 +00:00
rmacklem	8b2d48c78b	A problem with the old NFS client where large writes to large files would sometimes result in a corrupted file was reported via email. This problem appears to have been caused by r251719 (reverting r251719 fixed the problem). Although I have not been able to reproduce this problem, I suspect it is caused by another thread increasing np->n_size after the mtx_unlock(&np->n_mtx) but before the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes updates to np->n_size, doing the vnode_pager_setsize() with the mutex locked appears to avoid the problem. Unfortunately, vnode_pager_setsize() where the new size is smaller, cannot be called with a mutex held. This patch returns the semantics to be close to pre-r251719 (actually pre-r248567, r248581, r248567 for the new client) such that the call to vnode_pager_setsize() is only delayed until after the mutex is unlocked when np->n_size is shrinking. Since the file is growing when being written, I believe this will fix the corruption. A better solution might be to replace the mutex with a sleep lock, but that is a non-trivial conversion, so this fix is hoped to be sufficient in the meantime. Reported by: David G. Lawrence (dg@dglawrence.com) Tested by: David G. Lawrence (to be done soon) Reviewed by: kib MFC after: 1 week	2013-07-03 00:19:03 +00:00
pfg	b08e5aff18	ext2fs: Use the complete random() range in i_gen. i_gen is unsigned in ext2fs so we can handle the complete 32 bits. MFC after: 1 week	2013-06-30 00:42:51 +00:00
pfg	7e708941c4	Bring some updates from ufs_lookup to ext2fs. r156418: Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended file systems. This could cause deadlocks when creating snapshots. (We can't do snapshots on ext2fs but it is useful to keep things in sync). r183079: - Only set i_offset in the parent directory's i-node during a lookup for non-LOOKUP operations. - Relax a VOP assertion for a DELETE lookup. r187528: Move the code from ufs_lookup.c used to do dotdot lookup, into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in lookup routine. MFC after: 5 days	2013-06-29 01:35:28 +00:00
davide	a0c5d96b0a	Properly use v_data field. This magically worked (even if wrong) until now because v_data is the first field of the structure, but it's not something we should rely on.	2013-06-28 20:32:48 +00:00
davide	a47b6265f2	Garbage collect an useless check. smp should be never NULL.	2013-06-28 20:14:30 +00:00
davide	cf63c799c6	Plug a couple of leakages in smbfs_lookup().	2013-06-28 20:07:24 +00:00
pfg	ef282086af	Minor sorting. MFC after: 3 days	2013-06-26 19:43:22 +00:00
pfg	456a58318f	Define and use e2fs_lbn_t in ext2fs. In line to what is done in UFS, define an internal type e2fs_lbn_t for the logical block numbers. This change is basically a no-op as the new type is unchanged (int32_t) but it may be useful as bumping this may be required for ext4fs. Also, as pointed out by Bruce Evans: -Use daddr_t for daddr in ext2_bmaparray(). This seems to improve reliability with the reallocblks option. - Add a cast to the fsbtodb() macro as in UFS. Reviewed by: bde MFC after: 3 days	2013-06-23 02:44:42 +00:00
rmacklem	198db6adb9	Fix r252074 so that it builds on 64bit arches.	2013-06-22 21:58:21 +00:00
rmacklem	121288ceb2	The NFSv4.1 LayoutCommit operation requires a valid offset and length. (0, 0 is not sufficient) This patch a loop for each file layout, using the offset, length of each file layout in a separate LayoutCommit.	2013-06-21 22:46:16 +00:00
rmacklem	b3e84941b0	When the NFSv4.1 client is writing to a pNFS Data Server (DS), the file's size attribute does not get updated. As such, it is necessary to invalidate the attribute cache before clearing NMODIFIED for pNFS. MFC after: 2 weeks	2013-06-21 22:26:18 +00:00
rmacklem	2bc76c9d96	Since some NFSv4 servers enforce the requirement for a reserved port#, enable use of the (no)resvport mount option for NFSv4. I had thought that the RFC required that non-reserved port #s be allowed, but I couldn't find it in the RFC. MFC after: 2 weeks	2013-06-21 19:41:30 +00:00
pfg	ef31b55ff7	Rename some prefixes in the Block Group Descriptor fields to ext4bgd_ Change prefix to avoid confusion and denote that these fields are generally only available starting with ext4. MFC after: 3 days	2013-06-20 00:00:33 +00:00
pfg	9ac60dd284	More ext2fs header cleanups: - Set MAXMNTLEN nearer to where it is used. - Move EXT2_LINK_MAX to ext2_dir.h . MFC after: 3 days	2013-06-18 15:49:30 +00:00
pfg	fb29984476	Rename remaining DIAGNOSTIC to INVARIANTS. MFC after: 3 days	2013-06-17 00:39:23 +00:00
pfg	5cf909a854	Re-sort ext2fs headers to make things easier to find. In the ext2fs driver we have a mixture of headers: - The ext2_ prefixed headers have strong influence from NetBSD and are carry specific ext2/3/4 information. - The unprefixed headers are inspired on UFS and carry implementation specific information. Do some small adjustments so that the information is easier to find coming from either UFS or the NetBSD implementation. MFC after: 3 days	2013-06-16 16:10:45 +00:00
pfg	4d854ec921	Relax some unnecessary unsigned type changes in ext2fs. While the changes in r245820 are in line with the ext2 spec, the code derived from UFS can use negative values so it is better to relax some types to keep them as they were, and somewhat more similar to UFS. While here clean some casts. Some of the original types are still wrong and will require more work. Discussed with: bde MFC after: 3 days	2013-06-13 03:23:24 +00:00
pfg	e9fce1a99b	Turn DIAGNOSTICs to INVARIANTS in ext2fs. This is done to be consistent with what other filesystems and particularly ffs already does (see r173464). MFC after: 5 days	2013-06-12 15:24:48 +00:00
pfg	adb723b809	s/file system/filesystem/g Based on r96755 from UFS. MFC after: 3 days	2013-06-11 02:47:07 +00:00
pfg	a8d161091f	e2fs_bpg and e2fs_isize are always unsigned. The superblock in ext2fs defines all the fields as unsigned but for some reason the in-memory superblock was carrying e2fs_bpg and e2fs_isize as signed. We should preserve the specified types for consistency. MFC after: 5 days	2013-06-09 01:38:51 +00:00
alc	47492525b1	Add missing VM object unlocks in an error case. Reviewed by: kib	2013-06-07 19:42:00 +00:00
alc	e60ab9c72d	Don't busy the page unless we are likely to release the object lock. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2013-06-06 06:17:20 +00:00
alc	b4fae70474	Relax the vm object locking. Use a read lock. Sponsored by: EMC / Isilon Storage Division	2013-06-05 17:00:10 +00:00
alc	e4b61d89d1	Eliminate unnecessary vm object locking from tmpfs_nocacheread().	2013-06-04 15:40:45 +00:00
pfg	f11bbd3dc5	ext2fs: space vs tab. Obtained from: Christoph Mallon MFC after: 3 days	2013-06-03 20:33:05 +00:00
pfg	1687488775	ext2fs: Small cosmetic fixes. Make a long macro readable and sort a header. Obtained from: Christoph Mallon MFC after: 3 days	2013-06-03 20:02:45 +00:00
pfg	94ecf49873	ext2fs: Update Block Group Descriptor struct. Uncover some, previously reserved, fields that are used by Ext4. These are currently unused but it is good to have them for future reference. Reviewed by: bde MFC after: 3 days	2013-06-03 18:52:14 +00:00
jeff	d7efebc4db	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf	2013-05-31 00:43:41 +00:00
kib	b94a58f986	Assert that OBJ_TMPFS flag on the vm object for the tmpfs node is cleared when the tmpfs node is going away. Tested by: bdrewery, pho	2013-05-30 19:51:33 +00:00
rmacklem	ff1318447e	Post-r248567, there were times when the client would return a truncated directory for some NFS servers. This turned out to be because the size of a directory reported by an NFS server can be smaller that the ufs-like directory created from the RPC XDR in the client. This patch fixes the problem by changing r248567 so that vnode_pager_setsize() is only done for regular files. Reported and tested by: hartmut.brandt@dlr.de Reviewed by: kib MFC after: 1 week	2013-05-28 22:36:01 +00:00
kib	e68d3a11b7	Do not leak the NULLV_NOUNLOCK flag from the nullfs_unlink_lowervp(), for the case when the nullfs vnode is not reclaimed. Otherwise, later reclamation would not unlock the lower vnode. Reported by: antoine Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-21 11:31:56 +00:00
des	0a904132b4	Fix typo in comment. Submitted by: Alex Weber <alexwebr@gmail.com> MFC after: 1 week	2013-05-15 08:38:49 +00:00
rmacklem	4bf4286804	Add support for the eofflag to nfs_readdir() in the new NFS client so that it works under a unionfs mount. Submitted by: Jared Yanovich (slovichon@gmail.com) Reviewed by: kib MFC after: 2 weeks	2013-05-12 21:48:08 +00:00
eadler	6907881cb8	Fix several typos PR: kern/176054 Submitted by: Christoph Mallon <christoph.mallon@gmx.de> MFC after: 3 days	2013-05-12 16:43:26 +00:00
jilles	88b36000c5	fdescfs: Supply a real value for d_type in readdir. All the fdescfs nodes (except . and ..) appear as character devices to stat(), so DT_CHR is correct.	2013-05-12 15:44:49 +00:00
kib	dfd7a7f46d	- Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The null_hashget() obtains the reference on the nullfs vnode, which must be dropped. - Fix a wart which existed from the introduction of the nullfs caching, do not unlock lower vnode in the nullfs_reclaim_lowervp(). It should be innocent, but now it is also formally safe. Inform the nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on nullfs inode. - Add a callback to the upper filesystems for the lower vnode unlinking. When inactivating a nullfs vnode, check if the lower vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC on the lower vnode, and reclaim upper vnode if so. This allows nullfs to purge cached vnodes for the unlinked lower vnode, avoiding excessive caching. Reported by: G??ran L??wkrantz <goran.lowkrantz@ismobile.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-05-11 11:17:44 +00:00
kib	37807bfa8a	Avoid deactivating the page if it is already on a queue, only requeue the page. This both reduces the number of queues locking and avoids moving the active page to inactive list just because the page was read or written. Based on the suggestion by: alc Reviewed by: alc Tested by: pho	2013-05-06 21:04:42 +00:00
davide	d0199b6a2f	Change VM_OBJECT_LOCK/UNLOCK() -> VM_OBJECT_WLOCK/WUNLOCK() to reflect the recent switch of the vm object lock to a rwlock. Reported by: attilio	2013-05-04 14:27:28 +00:00
davide	8fda0a3d2e	Overhaul locking in netsmb, getting rid of the obsolete lockmgr() primitive. This solves a long standing LOR between smb_conn and smb_vc. Tested by: martymac, pho (previous version)	2013-05-04 14:18:10 +00:00
davide	49171951e3	Completely rewrite the interface to smbdev switching from dev_clone to cdevpriv(9). This commit changes the semantic of mount_smbfs in userland as well, which now passes file descriptor in order to to mount a specific filesystem istance. Reviewed by: attilio, ed Tested by: martymac	2013-05-04 14:03:18 +00:00
kib	3bbdcb77fd	The fsync(2) call should sync the vnode in such way that even after system crash which happen after successfull fsync() return, the data is accessible. For msdosfs, this means that FAT entries for the file must be written. Since we do not track the FAT blocks containing entries for the current file, just do a sloppy sync of the devvp vnode for the mount, which buffers, among other things, contain FAT blocks. Simultaneously, for deupdat(): - optimize by clearing the modified flags before short-circuiting a return, if the mount is read-only; - only ignore the rest of the function for denode with DE_MODIFIED flag clear when the waitfor argument is false. The directory buffer for the entry might be of delayed write; - microoptimize by comparing the updated directory entry with the current block content; - try to cluster the write, fall back to bawrite() if low on resources. Based on the submission by: bde MFC after: 2 weeks	2013-05-02 20:00:11 +00:00
kib	a0955c41ee	Fix the v_object leak for non-regular tmpfs vnodes. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:46:31 +00:00
kib	5830ee0d3f	For the new regular tmpfs vnode, v_object is initialized before insmntque() is called. The standard insmntque destructor resets the vop vector to deadfs one, and calls vgone() on the vnode. As result, v_object is kept unchanged, which triggers an assertion in the reclaim code, on instmntque() failure. Also, in this case, OBJ_TMPFS flag on the backed vm object is not cleared. Provide the tmpfs insmntque() destructor which properly clears OBJ_TMPFS flag and resets v_object. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:44:31 +00:00
kib	3a177062d5	The page read or written could be wired. Do not requeue if the page is not on a queue. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2013-05-02 18:36:52 +00:00
des	6cfe294ded	Fix a bug that allows NFS clients to issue READDIR on files. PR: kern/178016 Security: CVE-2013-3266 Security: FreeBSD-SA-13:05.nfsserver	2013-04-29 20:09:44 +00:00
kib	2f2c1edec8	Rework the handling of the tmpfs node backing swap object and tmpfs vnode v_object to avoid double-buffering. Use the same object both as the backing store for tmpfs node and as the v_object. Besides reducing memory use up to 2x times for situation of mapping files from tmpfs, it also makes tmpfs read and write operations copy twice bytes less. VM subsystem was already slightly adapted to tolerate OBJT_SWAP object as v_object. Now the vm_object_deallocate() is modified to not reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle VV_TEXT flag on the last dereference of the tmpfs backing object. Reviewed by: alc Tested by: pho, bf MFC after: 1 month	2013-04-28 19:38:59 +00:00
rmacklem	1110825468	When an NFS unmount occurs, once vflush() writes the last dirty buffer for the last vnode on the mount back to the server, it returns. At that point, the code continues with the unmount, including freeing up the nfs specific part of the mount structure. It is possible that an nfsiod thread will try to check for an empty I/O queue in the nfs specific part of the mount structure after it has been free'd by the unmount. This patch avoids this problem by setting the iodmount entries for the mount back to NULL while holding the mutex in the unmount and checking the appropriate entry is non-NULL after acquiring the mutex in the nfsiod thread. Reported and tested by: pho Reviewed by: kib MFC after: 2 weeks	2013-04-18 23:20:16 +00:00
rmacklem	ef3a1169e5	Both NFS clients can deadlock when using the "rdirplus" mount option. This can occur when an nfsiod thread that already holds a buffer lock attempts to acquire a vnode lock on an entry in the directory (a LOR) when another thread holding the vnode lock is waiting on an nfsiod thread. This patch avoids the deadlock by disabling readahead for this case, so the nfsiod threads never do readdirplus. Since readaheads for directories need the directory offset cookie from the previous read, they cannot normally happen in parallel. As such, testing by jhb@ and myself didn't find any performance degredation when this patch is applied. If there is a case where this results in a significant performance degradation, mounting without the "rdirplus" option can be done to re-enable readahead for directories. Reported and tested by: jhb Reviewed by: jhb MFC after: 2 weeks	2013-04-18 13:09:04 +00:00
ken	1ab8c6ab7f	Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to sys/nfs, since it is now shared by the two NFS servers. Suggested by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks	2013-04-17 22:42:43 +00:00
ken	fc3fc9c036	Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks	2013-04-17 21:00:22 +00:00
gabor	5635c6550b	- Correct spelling in comments Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:56:11 +00:00
gabor	b86fa940aa	- Correct mispellings of the word necessary Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:42:40 +00:00
jeff	fa887dba7b	Prepare to replace the buf splay with a trie: - Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS specific and could be moved out of the buf cache. - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper it makes more sense for these transient bufs. - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf() so this should cover all cases. Discussed with: kib, mckusick, attilio Sponsored by: EMC / Isilon Storage Division	2013-04-06 22:21:23 +00:00
kib	7ada0d9324	Strip the unnneeded spaces, mostly at the end of lines. MFC after: 3 days	2013-04-01 09:56:48 +00:00
pjd	91184d303f	- Constify local path variable for chflagsat(). - Use correct format characters (%lx) for u_long. This fixes the build broken in r248599.	2013-03-22 07:40:34 +00:00
pjd	2a3cf7f364	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation	2013-03-21 22:44:33 +00:00
kib	7225171d66	Initialize the variable to avoid (false) compiler warning about use of an uninitialized local. Reported by: Ivan Klymenko <fidaj@ukr.net> MFC after: 2 weeks	2013-03-21 12:59:24 +00:00
kib	6620c04e30	Do not call vnode_pager_setsize() while a NFS node mutex is locked. vnode_pager_setsize() might sleep waiting for the page after EOF be unbusied. Call vnode_pager_setsize() both for the regular and directory vnodes. Reported by: mich Reviewed by: rmacklem Discussed with: avg, jhb MFC after: 2 weeks	2013-03-21 07:25:08 +00:00
emaste	2ccefecf01	Fix remainder calculation when biosize is not a power of 2 In common configurations biosize is a power of two, but is not required to be so. Thanks to markj@ for spotting an additional case beyond my original patch. Reviewed by: rmacklem@	2013-03-19 13:06:11 +00:00
kib	1b20f7cc18	Remove negative name cache entry pointing to the target name, which could be instantiated while tdvp was unlocked. Reported by: Rick Miller <vmiller at hostileadmin com> Tested by: pho MFC after: 1 week	2013-03-17 15:11:37 +00:00
kib	9b0c4b125b	Add currently unused flag argument to the cluster_read(), cluster_write() and cluster_wbuild() functions. The flags to be allowed are a subset of the GB_* flags for getblk(). Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-14 20:28:26 +00:00
jhb	b2e811621c	Revert 195703 and 195821 as this special stop handling in NFS is now implemented via VFCF_SBDRY rather than passing PBDRY to individual sleep calls.	2013-03-13 21:06:03 +00:00
glebius	ce7d7c6757	Finish r243882: mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Sponsored by: Nginx, Inc.	2013-03-12 08:59:51 +00:00
davide	ce7dfce71d	smbfs_lookup() in the DOTDOT case operates on dvp->n_parent without proper locking. This doesn't prevent in any case reclaim of the vnode. Avoid this not going over-the-wire in this case and relying on subsequent smbfs_getattr() call to restore consistency. While I'm here, change a couple of SMBVDEBUG() in MPASS(). sbmfs_smb_lookup() doesn't and shouldn't know about '.' and '..' Reported by: pho's stress2 suite	2013-03-09 13:25:45 +00:00
davide	f30e75d436	- Initialize variable in smbfs_rename() to silent compiler warning - Fix smbfs_mkdir() return value (in case of error). Reported by: pho	2013-03-09 13:05:21 +00:00
attilio	63326e81a3	Garbage collect NWFS and NCP bits which are now completely disconnected from the tree since few months. This patch is not targeted for MFC.	2013-03-09 12:45:36 +00:00
attilio	bf1dc90446	MFC	2013-03-08 00:03:07 +00:00
attilio	5d57dc997e	Garbage collect NTFS bits which are now completely disconnected from the tree since few months. This patch is not targeted for MFC.	2013-03-02 18:40:04 +00:00
attilio	59a3d435c9	Garbage collect PORTALFS bits which are now completely disconnected from the tree since few months. This patch is not targeted for MFC.	2013-03-02 16:43:28 +00:00
attilio	5d33ae7487	Garbage collect CODAFS bits which are now completely disconnected from the tree since few months. This patch is not targeted for MFC.	2013-03-02 16:30:18 +00:00
attilio	4b0353fc07	Garbage collect HPFS bits which are now already completely disconnected from the tree since few months (please note that the userland bits were already disconnected since a long time, thus there is no need to update the OLD* entries). This is not targeted for MFC.	2013-03-02 14:54:33 +00:00
attilio	e98f58faf6	MFC	2013-03-02 14:48:41 +00:00
jilles	869c43b8d9	nullfs: Improve f_flags in statfs(). Include some flags of the nullfs mount itself: MNT_RDONLY, MNT_NOEXEC, MNT_NOSUID, MNT_UNION, MNT_NOSYMFOLLOW. This allows userland code calling statfs() or fstatfs() to see these flags. In particular, this allows opendir() to detect that a -t nullfs -o union mount needs deduplication (otherwise at least . and .. are returned twice) and allows rtld to detect a -t nullfs -o noexec mount as noexec. Turn off the MNT_ROOTFS flag from the underlying filesystem because the nullfs mount is definitely not the root filesystem. Reviewed by: kib MFC after: 1 week	2013-03-02 12:42:23 +00:00
pjd	f07ebb8888	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
attilio	afe5ce0c13	MFC	2013-02-26 17:33:18 +00:00
alc	8eacd44767	Eliminate a duplicate #include. Sponsored by: EMC / Isilon Storage Division	2013-02-26 07:00:24 +00:00
attilio	cb47f0509b	Merge from vmobj-rwlock branch: Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h. Sponsored by: EMC / Isilon storage division Tested by: pho Reviewed by: alc	2013-02-26 01:00:11 +00:00
attilio	d883da7ba4	MFC	2013-02-21 21:59:35 +00:00
jhb	ca1e2e0739	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month	2013-02-21 19:02:50 +00:00
attilio	8746bf6a5f	MFC	2013-02-21 15:06:19 +00:00
imp	3bd19e3992	The request queue is already locked, so we don't need the splsofclock/splx here to note future work.	2013-02-21 02:43:44 +00:00
attilio	15bf891afe	Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions. Sponsored by: EMC / Isilon storage division	2013-02-20 12:03:20 +00:00
attilio	658534ed5a	Switch vm_object lock to be a rwlock. * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h	2013-02-20 10:38:34 +00:00
kib	e5238fcb15	Do not update the fsinfo block on each update of any fat block, this is excessive. Postpone the flush of the fsinfo to VFS_SYNC(), remembering the need for update with the flag MSDOSFS_FSIMOD, stored in pm_flags. FAT32 specification describes both FSI_Free_Count and FSI_Nxt_Free as the advisory hints, not requiring them to be correct. Based on the patch from bde, modified by me. Reviewed by: bde MFC after: 2 weeks	2013-02-17 20:35:54 +00:00
bapt	dafec87d65	Revert r246791 as it needs a security review first Reported by: gavin, rwatson	2013-02-14 15:17:53 +00:00
bapt	99980d8453	Allow fdescfs to be mounted from inside a jail MFC after: 1 week	2013-02-14 13:03:15 +00:00
pfg	1d9f9f37f8	ext2fs: Use prototype declarations for function definitions Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-10 19:49:37 +00:00
attilio	cde657f5b1	Remove a racy checks on resident and cached pages for tmpfs_mapped{read, write}() functions: - tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(), which check before-hand to work only on valid VREG vnodes. Also the vnode is locked for the duration of the work, making vnode reclaiming impossible, during the operation. Hence, vobj can never be NULL. - Currently check on resident pages and cached pages without vm object lock held is racy and can do even more harm than good, as a page could be transitioning between these 2 pools and then be skipped entirely. Skip the checks as lookups on empty splay trees are very cheap. Discussed with: alc Tested by: flo MFC after: 2 weeks	2013-02-10 01:04:10 +00:00
pfg	c03e3032d5	ext2fs: Replace redundant EXT2_MIN_BLOCK with EXT2_MIN_BLOCK_SIZE. Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-08 21:09:44 +00:00
pfg	9694490e0d	ext2fs: make e2fs_maxcontig local and remove tautological check. e2fs_maxcontig was modelled after UFS when bringing the "Orlov allocator" to ext2. On UFS fs_maxcontig is kept in the superblock and is used by userland tools (fsck and growfs), In ext2 this information is volatile so it is not available for userland tools, so in this case it doesn't have sense to carry it in the in-memory superblock. Also remove a pointless check for MAX(1, x) > 0. Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-08 20:58:00 +00:00
pfg	f908ac5c6e	Remove unused MAXSYMLINKLEN macro. Reviewed by: mckusick PR: kern/175794 MFC after: 1 week	2013-02-08 20:30:19 +00:00
kib	92d95b8406	Stop translating the ERESTART error from the open(2) into EINTR. Posix requires that open(2) is restartable for SA_RESTART. For non-posix objects, in particular, devfs nodes, still disable automatic restart of the opens. The open call to a driver could have significant side effects for the hardware. Noted and reviewed by: jilles Discussed with: bde MFC after: 2 weeks	2013-02-07 14:53:33 +00:00
jhb	0fee3f66b8	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month	2013-02-06 17:06:51 +00:00
pfg	5e55b2c6f7	ext2fs: move assignment where it is not dead. Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-05 03:26:34 +00:00
pfg	935c860d1b	ext2fs: Remove unused em_e2fsb definition.. Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-05 03:23:56 +00:00
pfg	c6538dcc30	ext2fs: Remove useless rootino local variable. Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-05 03:17:41 +00:00
pfg	28dd7f0e2d	ext2fs: Correct off-by-one errors in FFTODT() and DDTOFT(). Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-05 03:13:05 +00:00
pfg	c181635a65	ext2fs: Use nitems(). Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-05 03:08:56 +00:00
pfg	affc90ea66	ext2fs: Use EXT2_LINK_MAX instead of LINK_MAX Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-05 03:01:04 +00:00
pfg	e94b41487b	ext2fs: general cleanup. - Remove unused extern declarations in fs.h - Correct comments in ext2_dir.h - Several panic() messages showed wrong function names. - Remove commented out stray line in ext2_alloc.c. - Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then write-only member e2fs_blocksize_bits from struct m_ext2fs. - Remove the unused macro EXT2_FIRST_INO() and the then write-only member e2fs_first_inode from struct m_ext2fs. - Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from struct m_ext2fs. - Remove the unused members e2fs_bmask, e2fs_dbpg and e2fs_mount_opt from struct m_ext2fs - Correct harmless off-by-one error for fspath in ext2_vfsops.c. - Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS() and EXT2_DESC_PER_BLOCK_BITS(). - Remove the !_KERNEL versions of the EXT2_* macros. Submitted by: Christoph Mallon MFC after: 2 weeks	2013-02-02 22:23:45 +00:00
kib	5e07d49c62	The MSDOSFSMNT_WAITONFAT flag is bogus and broken. It does less than track the MNT_SYNCHRONOUS flag. It is set to the latter at mount time but not updated by MNT_UPDATE. Use MNT_SYNCHRONOUS to decide to write the FAT updates syncrhonously. Submitted by: bde MFC after: 1 week	2013-02-01 18:30:41 +00:00
kib	35907051bb	Backup FATs were sometimes marked dirty by copying their first block from the primary FAT, and then they were not marked clean on unmount. Force marking them clean when appropriate. Submitted by: bde MFC after: 1 week	2013-02-01 18:25:53 +00:00
kib	96b12145fb	The directory entry for dotdot was corrupted in the FAT32 case when moving a directory to a subdir of the root directory from somewhere else. For all directory moves that change the parent directory, the dotdot entry must be fixed up. For msdosfs, the root directory is magic for non-FAT32. It is less magic for FAT32, but needs the same magic for the dotdot fixup. It didn't have it. Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no problems. The fix is to use the same magic for dotdot in msdosfs_rename() as in msdosfs_mkdir(). For msdosfs_mkdir(), document the magic. When writing the dotdot entry in mkdir, use explicitly set pcl variable instead on relying on the start cluster of the root directory typically has a value < 65536. Submitted by: bde MFC after: 1 week	2013-02-01 18:06:06 +00:00
kib	ad92b9afc4	The mountmsdosfs() function had an insane sanity test, remove it. Trying FAT32 on a small partition failed to mount because pmp->pm_Sectors was nonzero. Normally, FAT32 file systems are so large that the 16-bit pm_Sectors can't hold the size. This is indicated by setting it to 0 and using only pm_HugeSectors. But at least old versions of newfs_msdos use the 16-bit field if possible, and msdosfs supports this except for breaking its own support in the sanity check. This is quite different from the handling of pm_FATsecs -- now the 16-bit value is always ignored for FAT32 except for checking that it is 0, and newfs_msdos doesn't use the 16-bit value for FAT32. Submitted by: bde MFC after: 1 week	2013-02-01 18:01:03 +00:00
kib	31d95b4c31	Fix a backwards comment in markvoldirty(). Submitted by: bde MFC after: 1 week	2013-02-01 17:58:37 +00:00
kib	5012e4bd24	Assert that the mbuf in the chain has sane length. Proper place for this check is somewhere in the network code, but this assertion already proven to be useful in catching what seems to be driver bugs causing NFS scrambling random memory. Discussed with: rmacklem MFC after: 1 week	2013-02-01 16:57:02 +00:00
kib	d622325e53	Be conservative and do not try to consume more bytes than was requested from the server for the read operation. Server shall not reply with too large size, but client should be resilent too. Reviewed by: rmacklem MFC after: 1 week	2013-01-27 09:34:25 +00:00
pfg	245e35ae97	Clean some 'svn:executable' properties in the tree. Submitted by: Christoph Mallon MFC after: 3 days	2013-01-26 22:08:21 +00:00
pfg	440e8ae3c8	Cosmetical off-by-one Technically, the case when all the blocks are released is not a sanity check. Move further the comment while here. Suggested by: bde MFC after: 3 days	2013-01-26 21:50:52 +00:00
jhb	f2293255a9	Further cleanups to use of timestamps in NFS: - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime(). Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks	2013-01-25 15:25:24 +00:00
pfg	f4f6188cae	ext2fs: fix a check for negative block numbers. The previous change accidentally left the substraction we were trying to avoid in case that i_blocks could become negative. Reported by: bde MFC after: 4 days	2013-01-23 14:29:29 +00:00
pfg	646ebf1c31	ext2fs: make some inode fields match the ext2 spec. Ext2fs uses unsigned fields in its dinode struct. FreeBSD can have negative values in some of those fields and the inode is meant to interact with the system so we have never respected the unsigned nature of most of those fields. Block numbers and the NFS generation number do not need to be signed so redefine them as unsigned to better match the on-disk information. MFC after: 1 week	2013-01-22 18:54:03 +00:00
pfg	7d48f835be	ext2fs: temporarily disable the reallocation code. Testing with fsx has revealed problems and in order to hunt the bugs properly we need reduce the complexity. This seems to help but is not a complete solution. MFC after: 3 days	2013-01-22 18:36:31 +00:00
delphij	adf92df625	Make it possible to force async at server side on new NFS server, similar to the old one's nfs.nfsrv.async. Please note that by enabling this option (default is disabled), the system could potentionally have silent data corruption if the server crashes before write is committed to non-volatile storage, as the client side have no way to tell if the data is already written. Submitted by: rmacklem MFC after: 2 weeks	2013-01-18 19:42:08 +00:00
pfg	2b37b49e2c	ext2fs: Add some DOINGASYNC check to match ffs. This is mostly cosmetical. Reviewed by: bde MFC after: 3 days	2013-01-18 19:11:17 +00:00
jhb	812f7427ff	Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS. Reviewed by: rmacklem, bde MFC after: 1 week	2013-01-18 18:43:38 +00:00
jhb	7cc7eee4d9	Remove a no-longer-used variable after the previous change to use VA_UTIMES_NULL. Submitted by: bde, rmacklem MFC after: 1 week	2013-01-17 18:45:20 +00:00
jhb	ecb4042c11	Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes() instead of comparing the desired time against the current time as a heuristic. Reviewed by: rmacklem MFC after: 1 week	2013-01-16 21:52:31 +00:00
kib	82d5c5773a	Remove the filtering of the acceptable mount options for nullfs, added in r245004. Although the report was for noatime option which is non-functional for the nullfs, other standard options like nosuid or noexec are useful with it. Reported by: Dewayne Geraghty <dewayne.geraghty@heuristicsystems.com.au> MFC after: 3 days	2013-01-16 05:32:49 +00:00
jhb	e7637960eb	- More properly handle interrupted NFS requests on an interruptible mount by returning an error of EINTR rather than EACCES. - While here, bring back some (but not all) of the NFS RPC statistics lost when krpc was committed. Reviewed by: rmacklem MFC after: 1 week	2013-01-15 22:08:17 +00:00
kib	169c0a0a9f	The current default size of the nullfs hash table used to lookup the existing nullfs vnode by the lower vnode is only 16 slots. Since the default mode for the nullfs is to cache the vnodes, hash has extremely huge chains. Size the nullfs hashtbl based on the current value of desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a given vnode. Pointy hat to: kib Diagnosed and reviewed by: peter Tested by: peter, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 5 days	2013-01-14 05:44:47 +00:00
kib	b94d892898	When nullfs mount is forcibly unmounted and nullfs vnode is reclaimed, get back the leased write reference from the lower vnode. There is no other path which can correct v_writecount on the lowervp. Reported by: flo Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2013-01-10 18:24:48 +00:00
bapt	afe1d4e213	Add support for IO_APPEND flag in fuse This make open(..., O_APPEND) actually works on fuse filesystem. Reviewed by: attilio	2013-01-08 12:21:50 +00:00
pfg	947a420026	ext2fs: cleanup de dinode structure. It was plagued with style errors and the offsets had been lost. While here took the time to update the fields according to the latest ext4 documentation. Reviewed by: bde MFC after: 3 days	2013-01-07 03:36:32 +00:00
gleb	97e76936ec	tmpfs: Replace directory entry linked list with RB-Tree. Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP and VOP_READDIR operations utilize same tree for search. Directory entry offset (cookie) is either file name hash or incremental id in case of hash collisions (duplicate-cookies). Keep sorted per directory list of duplicate-cookie entries to facilitate cookie number allocation. Don't fail if previous VOP_READDIR() offset is no longer valid, start with next dirent instead. Other file system handle it similarly. Workaround race prone tn_readdir_last[pn] fields update. Add tmpfs_dir_destroy() to free all dirents. Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from tmpfs_dir_getdents() instead of hard coded -1. Mark directory traversal routines static as they are no longer used outside of tmpfs_subr.c	2013-01-06 22:15:44 +00:00
kib	f74da69096	Fix reversed condition in the assertion. Pointy hat to: kib MFC after: 13 days	2013-01-04 07:52:47 +00:00
kib	a7c71037df	Add the "nocache" nullfs mount option, which disables the caching of the free nullfs vnodes, switching nullfs behaviour to pre-r240285. The option is mostly intended as the last-resort when higher pressure on the vnode cache due to doubling of the vnode counts is not desirable. Note that disabling the cache costs more than 2x wall time in the metadata-hungry scenarious. The default is "cache". Tested and benchmarked by: pho (previous version) MFC after: 2 weeks	2013-01-03 19:17:57 +00:00
kib	defbe57abe	Remove the last use of the deprecated MNT_VNODE_FOREACH interface in the tree. With the help from: mjg Tested by: Ronald Klop <ronald-freebsd8@klop.yi.org> MFC after: 2 weeks	2013-01-03 19:01:56 +00:00
kib	c6bad3bef7	Do not force a writer to the devfs file to drain the buffer writes. Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org> MFC after: 2 weeks	2012-12-23 22:43:27 +00:00
pfg	16216f308e	More constant renaming in preparation for newer features. We also try to make better use of the fs flags instead of trying adapt the code according to the fs structures. In the case of subsecond timestamps and birthtime we now check that the feature is explicitly enabled: previously we only checked that the reserved space was available and silently wrote them. This approach is much safer, especially if the filesystem happens to use embedded inodes or support EAs. Discussed with: Zheng Liu MFC after: 5 days	2012-12-20 02:22:36 +00:00
rmacklem	1c6e1dc79b	Add "nfsstat -m" support for the two new NFS mount options added by r244042.	2012-12-09 22:23:50 +00:00
rmacklem	c82d89183d	Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detecting during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.	2012-12-08 22:52:39 +00:00
glebius	8e20fa5ae9	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
rmacklem	d79bf0f49f	Add an nfssvc() option to the kernel for the new NFS client which dumps out the actual options being used by an NFS mount. This will be used to implement a "-m" option for nfsstat(1). Reviewed by: alfred MFC after: 2 weeks	2012-12-02 01:16:04 +00:00
pfg	b4fee55cdf	Update some definitions or make them match NetBSD's headers. Bring several definitions required for newer ext4 features. Rename EXT2F_COMPAT_HTREE to EXT2F_COMPAT_DIRHASHINDEX since it is not being used yet and the new name is more compatible with NetBSD and Linux. This change is purely cosmetic and has no effect on the real code. Obtained from: NetBSD MFC after: 3 days	2012-11-28 15:48:32 +00:00
pfg	0077f34174	Partially bring r242520 to ext2fs. When a file is first being written, the dynamic block reallocation (implemented by ext2_reallocblks) relocates the file's blocks so as to cluster them together into a contiguous set of blocks on the disk. When the cluster crosses the boundary into the first indirect block, the first indirect block is initially allocated in a position immediately following the last direct block. Block reallocation would usually destroy locality by moving the indirect block out of the way to keep the data blocks contiguous. The issue was diagnosed long ago by Bruce Evans on ffs and surfaced on ext2fs when block reallocaton was ported. This is only a partial solution based on the similarities with FFS. We still require more review of the allocation details that vary in ext2fs. Reported by: bde MFC after: 1 week	2012-11-28 00:36:40 +00:00
davide	f3a37c7422	- smbfs_rename() might return an error value without correctly upgrading the vnode use count, and this might cause the kernel to panic if compiled with WITNESS enable. - Be sure to put the '\0' terminator to the rpath string. Sponsored by: iXsystems inc.	2012-11-26 04:29:47 +00:00
davide	e52463677a	- Remove reset of vpp pointer in some places as long as it's not really useful and has the side effect of obfuscating the code a bit. - Remove spurious references to simple_lock. Reported by: attilio [1] Sponsored by: iXsystems inc.	2012-11-22 09:13:45 +00:00
davide	017de7d030	Until now, smbfs_fullpath() computed the full path starting from the vnode and following back the chain of n_parent pointers up to the root, without acquiring the locks of the n_parent vnodes analyzed during the computation. This is immediately wrong because if the vnode lock is not held there's no guarantee on the validity of the vnode pointer or the data. In order to fix, store the whole path in the smbnode structure so that smbfs_fullpath() can use this information. Discussed with: kib Reported and tested by: pho Sponsored by: iXsystems inc.	2012-11-22 08:58:29 +00:00
kib	f31aa350da	Remove the check and panic for an impossible condition. The NULL lowervp vnode v_vnlock would cause panic due to NULL pointer dereference much earlier. MFC after: 1 week	2012-11-20 15:25:00 +00:00
attilio	b9a061c88d	r16312 is not any longer real since many years (likely since when VFS received granular locking) but the comment present in UFS has been copied all over other filesystems code incorrectly for several times. Removes comments that makes no sense now. Reviewed by: kib MFC after: 3 days	2012-11-19 22:43:45 +00:00
kib	801de09716	In pget(9), if PGET_NOTWEXIT flag is not specified, also search the zombie list for the pid. This allows several kern.proc sysctls to report useful information for zombies. Hold the allproc_lock around all searches instead of relocking it. Remove private pfind_locked() from the new nfs client code. Requested and reviewed by: pjd Tested by: pho MFC after: 3 weeks	2012-11-16 08:25:06 +00:00
kib	32941467f0	Remove M_USE_RESERVE from the devfs cdp allocator, which is one of two uses of M_USE_RESERVE in the kernel. This allocation is not special. Reviewed by: alc Tested by: pho MFC after: 2 weeks	2012-11-14 19:50:21 +00:00
davide	b4e3eb0677	Get rid of some old debug code. It provides checks similar to the one offered by RedZone so there's no need to keep it. Sponsored by: iXsystems inc.	2012-11-14 19:10:50 +00:00
davide	d7bac61929	Fix the lookup in the DOTDOT case in the same way as other filesystems do, i.e. inlining the vn_vget_ino() algorithm. Sponsored by: iXsystems inc.	2012-11-14 18:43:58 +00:00
attilio	0a289d546b	- Protect mnt_data and mnt_flags under the mount interlock - Move mp->mnt_stat manipulation where all of them happens Reported by: davide Discussed with: kib Tested by: flo MFC after: 2 months X-MFC: 241519, 242536,242616, 242727	2012-11-10 19:32:16 +00:00
attilio	d5d551ec46	Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.	2012-11-09 18:02:25 +00:00
attilio	1e93fc5eeb	- Current caching mode is completely broken because it simply relies on timing of the operations and not real lookup, bringing too many false positives. Remove the whole mechanism. If it needs to be implemented, next time it should really be done in the proper way. - Fix VOP_GETATTR() in order to cope with userland bugs that would change the type of file and not panic. Instead it gets the entry as if it is not existing. Reported and tested by: flo MFC after: 2 months X-MFC: 241519, 242536,242616	2012-11-08 00:32:49 +00:00
attilio	908519dd89	fuse_io* must be able to crunch also VDIR vnodes. Update assert appropriately. Reported and Tested by: flo MFC after: 2 months X-MFC: 241519,242536	2012-11-05 15:23:54 +00:00
attilio	35cb5e8a85	Fix a bug where operations was carried on even if not implemented, leading to handling of an invalid fdip object. Reported and tested by: flo MFC after: 2 months X-MFC: 241519	2012-11-03 23:32:32 +00:00
kib	f16ea99007	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks	2012-11-02 13:56:36 +00:00
davide	ff8afcf2e9	- Do not put in the mntqueue half-constructed vnodes. - Change the code so that it relies on vfs_hash rather than on a home-made hashtable. - There's no need to inline fnv_32_buf(). Reviewed by: delphij Tested by: pho Sponsored by: iXsystems inc.	2012-10-31 03:55:33 +00:00
davide	793cdde76e	Fix panic due to page faults while in kernel mode, under conditions of VM pressure. The reason is that in some codepaths pointers to stack variables were passed from one thread to another. In collaboration with: pho Reported by: pho's stress2 suite Sponsored by: iXsystems inc.	2012-10-31 03:34:07 +00:00

... 3 4 5 6 7 ...

3330 Commits