freebsd-nq

Author	SHA1	Message	Date
Rick Macklem	c7b560b9b4	For an NFSv4 mount with the "nocto" option, don't get the up to date file attributes upon close. This reduces the Getattr RPC count by about 65% for software builds. MFC after: 2 weeks	2014-04-21 19:10:23 +00:00
Rick Macklem	c3e4a7261c	Modify the NFSv4 client create/mkdir RPC so that it acquires post-create/mkdir directory attributes. This allows the RPC to name cache the newly created directory and reduces the lookup RPC count for applications creating a lot of directories. MFC after: 2 weeks	2014-04-20 22:19:00 +00:00
Rick Macklem	de1a42bd0c	Modify the NFSv4 client open/create RPC so that it acquires post-open/create directory attributes. This allows the RPC to name cache the newly created file and reduces the lookup RPC count by about 10% for software builds. MFC after: 2 weeks	2014-04-19 19:40:20 +00:00
Rick Macklem	a6f8e64e74	Modify the Lookup RPC for NFSv4 so that it acquires directory attributes. This allows the client to cache directory names when they are looked up, reducing the Lookup RPC count by about 40% for software builds. MFC after: 2 weeks	2014-04-18 22:05:34 +00:00
Warner Losh	1bbf66051b	Take out the hack to write -1's to non-NAND. Always do a BIO_DELETE on the ranges we want to erase. This is nicer to SSDs that want TRIMs anyway.	2014-04-18 17:03:43 +00:00
Warner Losh	875ac64f3e	More properly account for free/reserved segments to avoid deadlock or worse when filling up a device and then trying to erase files to make space. Without enough space, you can't do that. Also, ensure that the metadata writes don't generate ENOSPC. They will be retried later since the buffers are still dirty... Submitted by: mjg@	2014-04-18 17:03:35 +00:00
Andrey V. Elsukov	14b2dc3952	Use SMB_QUERY_FS_SIZE_INFO request to populate statfs structure. When server doesn't support this request, try to use SMB_INFO_ALLOCATION. And use SMB_COM_QUERY_INFORMATION_DISK request as fallback. MFC after: 2 weeks	2014-04-15 09:10:01 +00:00
Xin LI	25bfde79d6	Fix NFS deadlock vulnerability. [SA-14:05] Fix "Heartbleed" vulnerability and ECDSA Cache Side-channel Attack in OpenSSL. [SA-14:06]	2014-04-08 18:27:32 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Pedro F. Giffuni	ca73017a2d	Revert r263449; ext2fs: minor update to the dirpref policy. The change in UFS r254996, reverted the change as the older code seems to work better. This was not visible in local testing but we can trust UFS is vastly more exercised in diferent environments.	2014-03-21 04:33:38 +00:00
Pedro F. Giffuni	e23c349230	ext2fs: minor update to the dirpref policy. Bring in a minor change to the dirpref policy based on r248623. This is pretty minimal change to keep the implementation in sync with UFS but other parts from the original change are not directly applicable so don't expect improvements in fsck times. MFC after: 2 weeks	2014-03-20 21:19:13 +00:00
Pedro F. Giffuni	ef78ad0290	msdosfs: minor format fix - spaces vs tab MFC after: 3 days	2014-03-20 20:14:04 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
Bryan Drewery	504bde017a	Add missing FALLTHROUGH comment in tmpfs_dir_getdents for looking up '.' and '..'. Reviewed by: Russell Cattelan Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-03-14 13:58:02 +00:00
Bryan Drewery	ac09d109ca	Rename cnt to maxcookies and change its use as the condition for when to lookup cookies to be less obscure. No functional change. Since r245115, cnt has not really been needed in tmpfs_dir_getdents(). Keep it for the MPASS() for now though. Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks	2014-03-14 13:55:48 +00:00
Bryan Drewery	62dca316da	Cleanup redundant logic and add some comments to help explain how it works in lieu of potentially less clear code. Sponsored by: EMC / Isilon Storage Division Discussed with: Russell Cattelan	2014-03-14 02:10:30 +00:00
Bryan Drewery	0742ebc98f	Fix -o size less than PAGE_SIZE resulting in SIZE_MAX being used. Discussed with: kib MFC after: 2 weeks	2014-03-14 01:43:55 +00:00
Pedro F. Giffuni	157b40af8e	ext2fs: Fix a bug when sorting htree entries. This a typo introduced when bringing the original code from NetBSD. Reported by: Mike Ma MFC after: 3 days	2014-03-06 21:02:16 +00:00
Pedro F. Giffuni	c3b76e1345	ext2fs: small formatting fixes. Remove some redundant spaces. No functional change. MFC after: 3 days	2014-03-01 21:22:20 +00:00
Pedro F. Giffuni	3a54024da4	ext2fs: use of tab vs spaces. Consistently use a single tab after a #define as mentioned in style(9). Use tabs instead of space for indenting. Fix a typo: "hash_vesion". No functional change. MFC after: 3 days	2014-02-28 21:25:32 +00:00
Pedro F. Giffuni	67da48a15b	ext2fs: fully enable ext4 read-only support. The ext4 developers tend to tag Ext4-specific flags as "incompatible" even when such features are not relevant for read-only support. This is a consequence of the process though which this filesystem is implemented without design and the fact that some new features are not extensible to ext2/3. Organize the features according to what we support and sort them so that we can now read-only mount filesystems with some features that may be found in newly formatted ext4 fs. Submitted by: Zheng Liu Reviewed by: pfg MFC after: 5 days	2014-02-22 22:07:16 +00:00
Dimitry Andric	2de7ba0758	In sys/fs/nandfs/nandfs_vfsops.c, #if 0 an unused static function. MFC after: 3 days	2014-02-15 11:42:56 +00:00
Pedro F. Giffuni	ad3d96a730	ext2fs: Use i_flag instead of i_flags for Ext4 inode flags. The ext4 inode flags do not have equivalents for chflags (1) and hold information that is private to the implementation. The i_flag field in the inode is a better place to hold the Ext4 inode flags as it saves us from masking flags while setting or getting attributes. It should also make things cleaner if we implement write support for Ext4. Suggested by: bde Tested by: Mike Ma MFC after: 3 days	2014-01-28 14:39:05 +00:00
Pedro F. Giffuni	99984d229c	ext2fs: Re-enable reallocblk. The major corruption issues affecting this code have been fixed a while ago. MFC after: 1 week	2014-01-24 20:26:00 +00:00
Pedro F. Giffuni	1093104cf7	ext2fs: fix a bug in dirindex and re-enable. The IN_* flags should be set in i_flag instead of corrupting i_flags [1]. Re-enable HTree dirindex as the last series of bug fixes seems to have fixed the issues. Reported by: bde [1] Tested by: kevlo MFC after: 1 week	2014-01-24 13:51:38 +00:00
Pedro F. Giffuni	b7bbf8b9f3	ext2fs: fix logic error in the previous change. Use the bitwise negation instead of bogus boolean negation and move the flag manipulation with the assignment. Fix some grammatical errors introduced in the same change. Reported by: bde MFC after: 3 days	2014-01-22 19:09:41 +00:00
Pedro F. Giffuni	a7710d51c4	ext2fs: Translate the EXT4_EXTENTS and EXT4_INDEX to the inode flags. r260545 cleared the inode flags to fix corruption problems but we still need to pass some EXT4 flags for the ext4 read-only mode. None of these attributes has an equivalent in FreeBSD and are uninteresting for the system utilities so they should be innaccessible in ext2_getattrib(). Note: we also use EXT4_HUGE_FILE but we use it directly from the dinode structure so it is not necessary to translate it, Suggested by: bde MFC after: 3 days	2014-01-21 19:06:29 +00:00
Alexander Motin	6103bae6ae	Fix lock leak in purely hypothetical case of TCP connection without SVC_ACK method. This change should be NOP now, but it is better to be future safe. Reported by: rmacklem	2014-01-14 20:18:38 +00:00
Pedro F. Giffuni	c2e2b77b19	ext2fs: fix inode flag conversion. After r252890 we are naively attempting to pass through the inode flags. This is technically incorrect as the ext2 inode flags don't match the UFS/system values used in FreeBSD and a clean conversion is needed. Some filtering was left in place so the change didn't cause significant changes in FreeBSD but some of the garbage passed is likely to be the cause for warning messages in linux. Fix the issue by resetting the flags before conversion as was done previously. This also means we will not pass the EXT4_* inode flags into FreeBSD's inode. PR: kern/185448 MFC after: 3 days	2014-01-11 15:19:04 +00:00
Alexander Motin	45e18ea7ea	Fix off-by-one error in r260229. Coverity CID: 1148955	2014-01-07 11:43:51 +00:00
Alexander Motin	d473bac729	Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping though all the cache, and as result allows to do it every time. - Indroduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would broke some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.	2014-01-03 15:09:59 +00:00
Alexander Motin	1555cf04fc	Slightly simplify expiration logic introduced in r254337. - Do not update the histogram for items we are any way deleting from cache. - Do not update the histogram if nfsrc_tcphighwater is not set. - Remove some extra math operations.	2013-12-25 16:58:42 +00:00
Rick Macklem	43a213bb92	The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch. MFC after: 1 week	2013-12-25 01:03:14 +00:00
Rick Macklem	0c695afb96	An intermittent problem with NFSv4 exporting of ZFS snapshots was reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot file entry instead of VOP_LOOKUP(). This would not occur for NFSv3, since it will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning of the directory, whereas NFSv4 does not check "." or "..". This patch adds a call to VFS_VGET() for the directory being read to check for ENOTSUPP. I also observed that the mount_on_fileid and fsid attributes were not correct at the snapshot's auto mountpoints when looking at packet traces for the Readdir. This patch fixes the attributes by doing a check for different v_mount structure, even if the vnode v_mountedhere is not set. Reported by: jas@cse.yorku.ca Tested by: jas@cse.yorku.ca Reviewed by: asomers MFC after: 1 week	2013-12-24 22:24:17 +00:00
Rick Macklem	b921158ae0	The NFSv4 client was passing both the p and cred arguments to nfsv4_fillattr() as NULLs for the Getattr callback. This caused nfsv4_fillattr() to not fill in the Change attribute for the reply. I believe this was a violation of the RFC, but had little effect on server behaviour. This patch passes a non-NULL p argument to fix this. MFC after: 1 week	2013-12-24 00:48:39 +00:00
Pedro F. Giffuni	b41f53c43b	ext2fs: make the hashing algorithm match the linux code. There appears to be a hash function compatibility issue. The code is currently disabled but fix it nevertheless. PR: kern/183230 MFC after: 3 days	2013-12-23 19:47:34 +00:00
Rick Macklem	6b8fe5d59d	The NFSv4.1 client didn't return NFSv4.1 specific error codes for the Getattr and Recall callbacks. This patch fixes it. Since the NFSv4.1 specific error codes would only happen for abnormal circumstances, this patch has little effect, in practice. MFC after: 1 week	2013-12-23 15:16:53 +00:00
Alexander Motin	10f8f58d4a	Fix RPC server threads file handle affinity to work better with ZFS. Instead of taking 8 specific bytes of file handle to identify file during RPC thread affitinity handling, use trivial hash of the full file handle. ZFS's struct zfid_short does not have padding field after the length field, as result, originally picked 8 bytes are loosing lower 16 bits of object ID, causing many false matches and unneeded requests affinity to same thread. This fix substantially improves NFS server latency and scalability in SPEC NFS benchmark by more flexible use of multiple NFS threads. Sponsored by: iXsystems, Inc.	2013-12-23 08:43:16 +00:00
Konstantin Belousov	f26ca5ecde	Do not allow O_EXEC opens for fifo, return EINVAL. Besides not making sense, open(O_EXEC) for fifo creates fifoinfo with zero readers and writers counts, which causes premature free of pipes. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-17 17:28:02 +00:00
Alexander Motin	ca187878c0	Fix long known bug with handling device aliases residing not in devfs root. Historically creation of device aliases created symbolic links using only name of target device as a link target, not considering current directory. Fix that by adding number of "../" chunks to the terget device name, required to get out of the current directory to devfs root first. MFC after: 1 month	2013-12-12 11:05:48 +00:00
Rick Macklem	cf766161ff	For software builds, the NFS client does many small synchronous (with FILE_SYNC) writes because non-contiguous byte ranges in the same buffer cache block are being written. This patch adds a new mount option "noncontigwr" which allows the non-contiguous byte ranges to be combined, with the dirty byte range becoming the superset of the bytes that are dirty, if the file has not been file locked. This reduces the number of writes significantly for software builds. The only case where this change might break existing applications is where an application is writing non-overlapping byte ranges within the same buffer cache block of a file from multiple clients concurrently. Since such an application would normally do file locking on the file, avoiding the byte range merge for files that have been file locked should be sufficient for most (maybe all?) cases. Submitted by: jhb (earlier version) Reviewed by: kib MFC after: 3 weeks	2013-12-07 23:05:59 +00:00
Pedro F. Giffuni	244f00cc0d	ext2fs: add two new reserved inodes. According to online documentation [1], Ext4 has two new "special" inodes so add the new exclude and replica inodes. Reference: [1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout Reported by: Mike Ma MFC after: 3 weeks	2013-12-04 02:27:52 +00:00
Sergey Kandaurov	0d8dc7cc39	- Nuke a second copy of nfscl_attrcache extern declarations from under ifdef KDTRACE_HOOKS. This fixes kernel build with options KDTRACE_HOOKS. - Fix style inconsistencies.	2013-11-26 22:41:40 +00:00
Gleb Smirnoff	285e7a2d97	Fix build, attempt two.	2013-11-26 20:27:57 +00:00
Gleb Smirnoff	6882b8ea66	Fix build.	2013-11-26 10:34:34 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Konstantin Belousov	587430f254	Redo r258088 to avoid relying on signed arithmetic overflow, since compiler interprets this as an undefined behaviour. Instead, ensure that the sum of uio_offset and uio_resid is below OFF_MAX using the operation which cannot overflow. Reported and tested by: pho Discussed with: bde Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-20 19:41:00 +00:00
Konstantin Belousov	5ba4de79a7	Remove useless comparisions of assigned offset and resid with the sources from uio. Both uio_offset and offset, and uio_resid and resid have the same types for some time. Add check for buflen overflow by comparing the buflen with both offset and resid (vs. comparing with offset only, as it is currently done). Reported and tested by: pho Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-13 08:55:09 +00:00
Rick Macklem	42b6336a98	Fix an NFSv4.1 client specific case where a forced dismount would hang. The hang occurred in nfsv4_setsequence() when it couldn't find an available session slot and is fixed by checking for a forced dismount in progress and just returning for this case. MFC after: 1 month	2013-11-09 21:24:56 +00:00
Rick Macklem	cc085ba84d	During code inspection, I spotted that there was a code path where CLNT_CONTROL() would be called on "client" after it was released via CLNT_RELEASE(). It was unlikely that this code path gets executed and I have not heard of any problem report caused by this bug. This patch fixes the code so that this cannot happen. MFC after: 2 months	2013-11-03 23:17:30 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Pedro F. Giffuni	4b367145f7	UFS2: make di_extsize unsigned. di_extsize is the EA size and as such it should be unsigned. Adjust related types for consistency. Reviewed by: mckusick (previous version) MFC after: 3 weeks	2013-10-24 00:33:29 +00:00
Konstantin Belousov	bf3e483b44	Similar to debug.iosize_max_clamp sysctl, introduce devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 week	2013-10-15 06:33:10 +00:00
Konstantin Belousov	64548150b6	Remove two instances of ARGSUSED comment, and wrap lines nearby the code that is to be changed. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-10-15 06:28:11 +00:00
John-Mark Gurney	c9b24e38e8	NULL stale pointers (should be a no-op as they should no longer be used)... Reviewed by: dteske Approved by: re (kib) Sponsored by: Vicor MFC after: 3 days	2013-09-25 02:49:18 +00:00
John-Mark Gurney	fb180e2186	fix a bug where we access a bread buffer after we have brelse'd it... The kernel normally didn't unmap/context switch away before we accessed the buffer most of the time, but under heavy I/O pressure and lots of mount/unmounting this would cause a fault on nofault panic... Reviewed by: dteske Approved by: re (kib) Sponsored by: Vicor MFC after: 3 days	2013-09-25 02:48:12 +00:00
Dag-Erling Smørgrav	1a05c762b9	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
Pedro F. Giffuni	1f7c9f2bc8	ext2fs: temporarily disable htree directory index. Our code does not consider yet the case of hash collisions. This is a rather annoying situation where two or more files that happen to have the same hash value will not appear accessible. The situation is not difficult to work-around but given that things will just work without enabling htree we will save possible embarrassments for the next release. Reported by: Kevin Lo	2013-09-07 02:45:51 +00:00
Pawel Jakub Dawidek	ab568de789	Handle cases where capability rights are not provided. Reported by: kib	2013-09-05 11:58:12 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Rick Macklem	f7d8291af0	Crashes have been observed for NFSv4.1 mounts when the system is being shut down which were caused by the nfscbd_pool being destroyed before the backchannel is disabled. This patch is believed to fix the problem, by simply avoiding ever destroying the nfscbd_pool. Since the NFS client module cannot be unloaded, this should not cause a memory leak. MFC after: 2 weeks	2013-09-04 22:47:56 +00:00
Rick Macklem	8fe6bddff7	Forced dismounts of NFS mounts can fail when thread(s) are stuck waiting for an RPC reply from the server while holding the mount point busy (mnt_lockref incremented). This happens because dounmount() msleep()s waiting for mnt_lockref to become 0, before calling VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(), which the NFS client implements as purging RPCs in progress. Making this call before checking mnt_lockref fixes the problem, by ensuring that the VOP_xxx() calls will fail and unbusy the mount point. Reported by: sbruno Reviewed by: kib MFC after: 2 weeks	2013-09-01 23:02:59 +00:00
Kenneth D. Merry	3b5f179d2a	Support storing 7 additional file flags in tmpfs: UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY, and UF_HIDDEN. Sort the file flags tmpfs supports alphabetically. tmpfs now supports the same flags as UFS, with the exception of SF_SNAPSHOT. Reported by: bdrewery, antoine Sponsored by: Spectra Logic	2013-08-28 22:12:56 +00:00
John Baldwin	fd77bbb967	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
Xin LI	2454886e05	Allow tmpfs be mounted inside jail.	2013-08-23 22:52:20 +00:00
Kenneth D. Merry	7da1a731c6	Expand the use of stat(2) flags to allow storing some Windows/DOS and CIFS file attributes as BSD stat(2) flags. This work is intended to be compatible with ZFS, the Solaris CIFS server's interaction with ZFS, somewhat compatible with MacOS X, and of course compatible with Windows. The Windows attributes that are implemented were chosen based on the attributes that ZFS already supports. The summary of the flags is as follows: UF_SYSTEM: Command line name: "system" or "usystem" ZFS name: XAT_SYSTEM, ZFS_SYSTEM Windows: FILE_ATTRIBUTE_SYSTEM This flag means that the file is used by the operating system. FreeBSD does not enforce any special handling when this flag is set. UF_SPARSE: Command line name: "sparse" or "usparse" ZFS name: XAT_SPARSE, ZFS_SPARSE Windows: FILE_ATTRIBUTE_SPARSE_FILE This flag means that the file is sparse. Although ZFS may modify this in some situations, there is not generally any special handling for this flag. UF_OFFLINE: Command line name: "offline" or "uoffline" ZFS name: XAT_OFFLINE, ZFS_OFFLINE Windows: FILE_ATTRIBUTE_OFFLINE This flag means that the file has been moved to offline storage. FreeBSD does not have any special handling for this flag. UF_REPARSE: Command line name: "reparse" or "ureparse" ZFS name: XAT_REPARSE, ZFS_REPARSE Windows: FILE_ATTRIBUTE_REPARSE_POINT This flag means that the file is a Windows reparse point. ZFS has special handling code for reparse points, but we don't currently have the other supporting infrastructure for them. UF_HIDDEN: Command line name: "hidden" or "uhidden" ZFS name: XAT_HIDDEN, ZFS_HIDDEN Windows: FILE_ATTRIBUTE_HIDDEN This flag means that the file may be excluded from a directory listing if the application honors it. FreeBSD has no special handling for this flag. The name and bit definition for UF_HIDDEN are identical to the definition in MacOS X. UF_READONLY: Command line name: "urdonly", "rdonly", "readonly" ZFS name: XAT_READONLY, ZFS_READONLY Windows: FILE_ATTRIBUTE_READONLY This flag means that the file may not written or appended, but its attributes may be changed. ZFS currently enforces this flag, but Illumos developers have discussed disabling enforcement. The behavior of this flag is different than MacOS X. MacOS X uses UF_IMMUTABLE to represent the DOS readonly permission, but that flag has a stronger meaning than the semantics of DOS readonly permissions. UF_ARCHIVE: Command line name: "uarch", "uarchive" ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE Windows name: FILE_ATTRIBUTE_ARCHIVE The UF_ARCHIVED flag means that the file has changed and needs to be archived. The meaning is same as the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute. msdosfs and ZFS have special handling for this flag. i.e. they will set it when the file changes. sys/param.h: Bump __FreeBSD_version to 1000047 for the addition of new stat(2) flags. chflags.1: Document the new command line flag names (e.g. "system", "hidden") available to the user. ls.1: Reference chflags(1) for a list of file flags and their meanings. strtofflags.c: Implement the mapping between the new command line flag names and new stat(2) flags. chflags.2: Document all of the new stat(2) flags, and explain the intended behavior in a little more detail. Explain how they map to Windows file attributes. Different filesystems behave differently with respect to flags, so warn the application developer to take care when using them. zfs_vnops.c: Add support for getting and setting the UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN, UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags. All of these flags are implemented using attributes that ZFS already supports, so the on-disk format has not changed. ZFS currently doesn't allow setting the UF_REPARSE flag, and we don't really have the other infrastructure to support reparse points. msdosfs_denode.c, msdosfs_vnops.c: Add support for getting and setting UF_HIDDEN, UF_SYSTEM and UF_READONLY in MSDOSFS. It supported SF_ARCHIVED, but this has been changed to be UF_ARCHIVE, which has the same semantics as the DOS archive attribute instead of inverse semantics like SF_ARCHIVED. After discussion with Bruce Evans, change several things in the msdosfs behavior: Use UF_READONLY to indicate whether a file is writeable instead of file permissions, but don't actually enforce it. Refuse to change attributes on the root directory, because it is special in FAT filesystems, but allow most other attribute changes on directories. Don't set the archive attribute on a directory when its modification time is updated. Windows and DOS don't set the archive attribute in that scenario, so we are now bug-for-bug compatible. smbfs_node.c, smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM, UF_READONLY and UF_ARCHIVE in SMBFS. This is similar to changes that Apple has made in their version of SMBFS (as of smb-583.8, posted on opensource.apple.com), but not quite the same. We map SMB_FA_READONLY to UF_READONLY, because UF_READONLY is intended to match the semantics of the DOS readonly flag. The MacOS X code maps both UF_IMMUTABLE and SF_IMMUTABLE to SMB_FA_READONLY, but the immutable flags have stronger meaning than the DOS readonly bit. stat.h: Add definitions for UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY and UF_HIDDEN. The definition of UF_HIDDEN is the same as the MacOS X definition. Add commented-out definitions of UF_COMPRESSED and UF_TRACKED. They are defined in MacOS X (as of 10.8.2), but we do not implement them (yet). ufs_vnops.c: Add support for getting and setting UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY, UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS. Alphabetize the flags that are supported. These new flags are only stored, UFS does not take any action if the flag is set. Sponsored by: Spectra Logic Reviewed by: bde (earlier version)	2013-08-21 23:04:48 +00:00
Konstantin Belousov	c0a46535c4	Make the seek a method of the struct fileops. Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:36:01 +00:00
Konstantin Belousov	41cf41fdfd	Extract the general-purpose code from tmpfs to perform uiomove from the page queue of some vm object. Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:23:24 +00:00
Konstantin Belousov	b1dd38f408	Restore the previous sendfile(2) behaviour on the block devices. Provide valid .fo_sendfile method for several missed struct fileops. Reviewed by: glebius Sponsored by: The FreeBSD Foundation	2013-08-16 14:22:20 +00:00
Rick Macklem	93c5875b24	Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Single thread the code that deletes stale cache entries. - Add a tunable called vfs.nfsd.tcphighwater, which can be increased to allow the cache to grow larger, avoiding the overhead of frequent scans to delete stale cache entries. (The default value will result in frequent scans to delete stale cache entries, analagous to what the pre-patched code does.) - Add a tunable called vfs.nfsd.cachetcp that can be used to disable DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP. It also adjusts the size of nfsrc_floodlevel dynamically, so that it is always greater than vfs.nfsd.tcphighwater. For UDP the algorithm remains the same as the pre-patched code, but the tunable vfs.nfsd.udphighwater can be used to allow the cache to grow larger and reduce the overhead caused by frequent scans for stale entries. UDP also uses a larger hash table size than the pre-patched code. Reported by: wollman Tested by: wollman (earlier version of patch) Submitted by: ivoras (earlier patch) Reviewed by: jhb (earlier version of patch) MFC after: 1 month	2013-08-14 21:11:26 +00:00
Pedro F. Giffuni	4a62545173	ext2fs: update format specifiers for ext4 type. Previous bandaid was not appropriate and didn't really work for all platforms. While here, cleanup the surrounding code to match ffs_checkoverlap() Reported by: dim, jmallet and bde MFC after: 3 weeks	2013-08-14 14:22:46 +00:00
Pedro F. Giffuni	88ae190ea0	ext2fs: update format specifiers for ext4 type. Reported by: Sam Fourman Jr. MFC after: 3 weeks	2013-08-13 18:39:36 +00:00
Pedro F. Giffuni	70097aac13	Define ext2fs local types and use them. Add definitions for e2fs_daddr_t, e4fs_daddr_t in addition to the already existing e2fs_lbn_t and adjust them for ext4. Other than making the code more readable these changes should fix problems related to big filesystems. Setting the proper types can be tricky so the process was helped by looking at UFS. In our implementation, logical block numbers can be negative and the code depends on it. In ext2, block numbers are unsigned so it is convenient to keep e2fs_daddr_t unsigned and use the complete 32 bits. In the case of e4fs_daddr_t, while the value should be unsigned, for ext4 we only need to support 48 bits so preserving an extra bit from the sign is not an issue. While here also drop the ext2_setblock() prototype that was never used. Discussed with: mckusick, bde MFC after: 3 weeks	2013-08-13 15:40:43 +00:00
Pedro F. Giffuni	d7511a40a7	Add read-only support for extents in ext2fs. Basic support for extents was implemented by Zheng Liu as part of his Google Summer of Code in 2010. This support is read-only at this time. In addition to extents we also support the huge_file extension for read-only purposes. This works nicely with the additional support for birthtime/nanosec timestamps and dir_index that have been added lately. The implementation may not work for all ext4 filesystems as it doesn't support some features that are being enabled by default on recent linux like flex_bg. Nevertheless, the feature should be very useful for migration or simple access in filesystems that have been converted from ext2/3 or don't use incompatible features. Special thanks to Zheng Liu for his dedication and continued work to support ext2 in FreeBSD. Submitted by: Zheng Liu (lz@) Reviewed by: Mike Ma, Christoph Mallon (previous version) Sponsored by: Google Inc. MFC after: 3 weeks	2013-08-12 21:34:48 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Pedro F. Giffuni	95f1f8d262	Small typo. MFC after: 3 days	2013-08-08 22:07:59 +00:00
Konstantin Belousov	8239a7a878	The tmpfs_alloc_vp() is used to instantiate vnode for the tmpfs node, in particular, from the tmpfs_lookup VOP method. If LK_NOWAIT is not specified in the lkflags, the lookup is supposed to return an alive vnode whenever the underlying node is valid. Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached to node exists and is being reclaimed. This causes spurious ENOENT errors from lookup on tmpfs and corresponding random 'No such file' failures from syscalls working with tmpfs files. Fix this by waiting for the doomed vnode to be detached from the tmpfs node if sleepable allocation is requested. Note that filesystems which use vfs_hash.c, correctly handle the case due to vfs_hash_get() looping when vget() returns ENOENT for sleepable requests. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-08-05 18:53:59 +00:00
Attilio Rao	be99683637	Revert r253939: We cannot busy a page before doing pagefaults. Infact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib	2013-08-05 08:55:35 +00:00
Attilio Rao	3b6714cacb	The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho	2013-08-04 21:07:24 +00:00
Attilio Rao	878a788734	Remove unnecessary soft busy of the page before to do vn_rdwr() in kern_sendfile() which is unnecessary. The page is already wired so it will not be subjected to pagefault. The content cannot be effectively protected as it is full of races already. Multiple accesses to the same indexes are serialized through vn_rdwr(). Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: pho	2013-08-04 15:56:19 +00:00
Pedro F. Giffuni	d192e40f77	Add license for the half MD4 algorithm used in ext2_half_md4(). The htree implementation uses code derived from the RSA Data Security, Inc. MD4 Message-Digest Algorithm. Add a proper licensing statement for the code and clarify the corresponding comments. Approved by: core (hrs)	2013-08-01 16:04:48 +00:00
Marius Strobl	cd67748bde	- Add const-qualifiers to the arguments of isonum_(). - According to ISO 9660 7.1.2, isonum_712() should return a signed value. - Try to get isonum_() closer to style(9).	2013-07-28 12:29:10 +00:00
Andriy Gapon	8e94193e58	make path matching in devfs rules consistent and sane (and safer) Before this change path matching had the following features: - for device nodes the patterns were matched against full path - in the above case '/' in a path could be matched by a wildcard - for directories and links only the last component was matched So, for example, a pattern like 're*' could match the following entries: - re0 device - responder/u0 device - zvol/recpool directory Although it was possible to work around this behavior (once it was spotted and understood), it was very confusing and contrary to documentation. Now we always match a full path for all types of devfs entries (devices, directories, links) and a '/' has to be matched explicitly. This behavior follows the shell globbing rules. This change is originally developed by Jaakko Heinonen. Many thanks! PR: kern/122838 Submitted by: jh MFC after: 4 weeks	2013-07-26 14:25:58 +00:00
Pedro F. Giffuni	9670f48107	ext2fs: Return EINVAL for negative uio_offset as in UFS. While here drop old comment that doesn't really apply. MFC after: 1 month Discussed with: gleb	2013-07-25 19:37:49 +00:00
Pedro F. Giffuni	0b54fe540c	ext2fs: Drop a check that wan't supposed to be in r253651. MFC after: 1 month	2013-07-25 16:04:55 +00:00
Pedro F. Giffuni	78d912bbc3	ext2fs: Don't assume that on-disk format of a directory is the same as in <sys/dirent.h> ext2_readdir() has always been very fs specific and different with respect to its ufs_ counterpart. Recent changes from UFS have made it possible to share more closely the implementation. MFUFS r252438: Always start parsing at DIRBLKSIZ aligned offset, skip first entries if uio_offset is not DIRBLKSIZ aligned. Return EINVAL if buffer is too small for single entry. Preallocate buffer for cookies. Skip entries with zero inode number. Reviewed by: gleb, Zheng Liu MFC after: 1 month	2013-07-25 15:34:20 +00:00
Pedro F. Giffuni	7d20a270cc	fuse: revert kernel_header update. It seems to be causing problems due to the lack of the new features. Found by: bapt Pointed hat: pfg	2013-07-24 20:21:29 +00:00
Nathan Whitehorn	59169d9156	tmpfs works perfectly fine with -o union -- there is no reason to exclude it from the list of options.	2013-07-23 14:48:37 +00:00
Rick Macklem	a36b76a787	The NFSv4 server incorrectly assumed that the high order words of the attribute bitmap argument would be non-zero. This caused an interoperability problem for a recent patch to the Linux NFSv4 client. The Linux folks have changed their patch to avoid this, but this patch fixes the problem on the server. Reported and tested by: Andre Heider (a.heider@gmail.com) MFC after: 3 days	2013-07-20 22:35:32 +00:00
Pedro F. Giffuni	feba8afb59	fuse: revert birthtime support. The creation time support breaks the data structures used in linux fuse. libfuse carries it's own header. Revert the changes for now. We will try to get an agreement with the fuse upstream maintainers to avoid having to patch the library headers all the time.	2013-07-20 14:50:35 +00:00
Pedro F. Giffuni	77b8f8a998	Adjust outsizes: Recalculate FUSE_COMPAT_ENTRY_OUT_SIZE and COMPAT_ATTR_OUT_SIZE. These were wrong in the previous commit. They are actually unused in FreeBSD though. Pointed out by: Jan Beich	2013-07-20 03:55:56 +00:00
Pedro F. Giffuni	05ad761667	Adjust outsizes: When birthtime was added (r253331) we missed adding the weight of the new fields in FUSE_COMPAT_ENTRY_OUT_SIZE and COMPAT_ATTR_OUT_SIZE. Adjust them accordingly. Pointed out by: Jan Beich	2013-07-20 03:08:50 +00:00
Pedro F. Giffuni	c230e70881	Update fuse_kernel header. Bring in the changes from the FUSE kernel interface 7.10 (available under a BSD license). After 7.10 the linux FUSE developers added support for a controversial CUSE driver and some linux especific features that are unlikely to find its way into FreeBSD. We currently don't implement any of the new features so we are not bumping the FUSE_KERNEL_MINOR_VERSION. The header should, nevertheless, serve as a template to add the new features in a compatible manner. While here adopt some minor cleanups from the upstream version like removing FUSE_MAJOR and FUSE_MINOR which were never used. Also add multiple inclusion header guards,	2013-07-15 00:05:27 +00:00
Pedro F. Giffuni	da7d8f2a65	Add creation timestamp (birthtime) support for fuse. I was keeping this #ifdef'd for reference with the MacFUSE change[1] but on second thought, this is a FreeBSD-only header so the SVN history should be enough. Add missing padding while here. Reference [1]: http://code.google.com/p/macfuse/source/detail?spec=svn1686&r=1360	2013-07-13 22:06:41 +00:00
Pedro F. Giffuni	944d37b123	Add creation timestamp (birthtime) support for fuse. This is based on similar support in MacFUSE.	2013-07-12 17:22:59 +00:00
Pedro F. Giffuni	c5249f35b8	Implement 1003.1-2001 pathconf() keys. This is based on r106058 in UFS. MFC after: 1 month	2013-07-10 22:03:01 +00:00
Pedro F. Giffuni	db20714a87	Reinstate the assertion from r253045. UFS r232732 reverted the change as the real problem was to be fixed at the syscall level. Reported by: bde	2013-07-09 14:23:00 +00:00
Pedro F. Giffuni	bf3c9330ba	Enhancement when writing an entire block of a file. Merge from UFS r231313: This change first attempts the uiomove() to the newly allocated (and dirty) buffer and only zeros it if the uiomove() fails. The effect is to eliminate the gratuitous zeroing of the buffer in the usual case where the uiomove() successfully fills it. MFC after: 3 days	2013-07-09 01:31:04 +00:00
Rick Macklem	88a2437a65	Add support for host-based (Kerberos 5 service principal) initiator credentials to the kernel rpc. Modify the NFSv4 client to add support for the gssname and allgssname mount options to use this capability. Requires the gssd daemon to be running with the "-h" option. Reviewed by: jhb	2013-07-09 01:05:28 +00:00
Pedro F. Giffuni	7ce75e5f1f	Avoid a panic and return EINVAL instead. Merge from UFS r232692: syscall() fuzzing can trigger this panic. MFC after: 3 days	2013-07-08 20:21:36 +00:00

1 2 3 4 5 ...

3183 Commits