freebsd-skq

Author	SHA1	Message	Date
mjg	f03b37f3e8	vfs: add vrefact, to be used when the vnode has to be already active This allows blind increment of relevant counters which under contention is cheaper than inc-not-zero loops at least on amd64. Use it in some of the places which are guaranteed to see already active vnodes. Reviewed by: kib (previous version)	2016-12-12 15:37:11 +00:00
kib	127162a8d9	Enable lookup_cap_dotdot and lookup_cap_dotdot_nonlocal. Requested and reviewed by: cem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8746	2016-12-12 11:12:04 +00:00
mjg	168e218f92	vfs: provide fake locking primitives for the crossmp vnode Since the vnode is only expected to be shared locked, we can save a little overhead by only pretending we are locking in the first place. Reviewed by: kib Tested by: pho	2016-12-02 18:03:15 +00:00
mjg	18de920423	vfs: fix a whitespace nit in r309307	2016-11-30 02:17:03 +00:00
mjg	35980d306d	vfs: avoid VOP_ISLOCKED in the common case in lookup	2016-11-30 02:14:53 +00:00
kib	a41f4cc9a5	Allow some dotdot lookups in capability mode. If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal. Track the directories traversed, and check the result of dotdot lookup against the recorded list of the directory vnodes. Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently disabled by default until more verification of the approach is done. Disallow non-local filesystems for dotdot, since remote server might conspire with the local process to allow it to escape the namespace. This might be too cautious, provide the knob vfs.lookup_cap_dotdot_nonlocal to override as well. Idea by: rwatson Discussed with: emaste, jonathan, rwatson Reviewed by: mjg (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 week Differential revision: https://reviews.freebsd.org/D8110	2016-11-02 12:43:15 +00:00
kib	b9d3dfb1e0	Remove tautological casts. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:10:39 +00:00
kib	1323c841ed	Style fixes. Discussed with: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:02:31 +00:00
emaste	00b67b15b9	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
mjg	7b64fd1139	vfs: provide a common exit point in namei for error cases This shortens the function, adds the SDT_PROBE use for error cases and consistenly unrefs rootdir last. Reviewed by: kib MFC after: 2 weeks	2016-08-27 22:43:41 +00:00
trasz	255ed885fa	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month	2016-08-10 16:12:31 +00:00
pfg	28823d0656	sys/kern: spelling fixes in comments. No functional change.	2016-04-29 22:15:33 +00:00
pfg	5b3421712d	kern: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-15 16:10:11 +00:00
trasz	8804916675	Refactor the way we restore cn_lkflags; no functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-12 09:05:43 +00:00
trasz	beb648d9cc	Remove cn_consume from 'struct componentname'. It was never set to anything other than 0. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5611	2016-03-12 08:50:38 +00:00
trasz	faec271eeb	Fix autofs triggering problem. Assume you have an NFS server, 192.168.1.1, with share "share". This commit fixes a problem where "mkdir /net/192.168.1.1/share/meh" would return spurious error instead of creating the directory if the target filesystem wasn't mounted yet; subsequent attempts would work correctly. The failure scenario is kind of complicated to explain, but it all boils down to calling VOP_MKDIR() for the target filesystem (NFS) with wrong dvp - the autofs vnode instead of the filesystem root mounted over it. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5442	2016-03-12 07:54:42 +00:00
avg	425c0bb088	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days	2015-09-28 12:14:16 +00:00
ed	e8ba6b4817	Properly return ENOTDIR when calling at() on a non-vnode. We already properly return ENOTDIR when calling at() on a non-directory vnode, but it turns out that if you call it on a socket, we see EINVAL. Patch up namei to properly translate this to ENOTDIR.	2015-08-12 16:17:00 +00:00
mjg	03b75d4638	vfs: cosmetic changes to namei and namei_handle_root - don't initialize cnp during declaration - don't test error/!error, compare to 0 instead	2015-07-09 17:17:26 +00:00
mjg	2c26f2224b	vfs: simplify error handling in namei The logic is reorganised so that there is one exit point prior to the lookup loop. This is an intermediate step to making audit logging functions use found vnode instead of translating ni_dirfd on their own. ni_startdir validation is removed. The only in-tree consumer is nfs which already makes sure it is a directory. Reviewed by: kib	2015-07-09 16:32:58 +00:00
mjg	5c01a53df7	vfs: avoid spurious vref/vrele for absolute lookups namei used to vref fd_cdir, which was immediatley vrele'd on entry to the loop. Check for absolute lookup and vref the right vnode the first time. Reviewed by: kib	2015-07-09 15:06:58 +00:00
mjg	5bbdbadc3d	vfs: plug a use-after-free of fd_rdir in namei fd_rdir vnode was stored in ni_rootdir without refing it in any way, after which the filedsc lock was being dropped. The vnode could have been freed by mountcheckdirs or another thread doing chroot. VREF the vnode while the lock is held. Reviewed by: kib MFC after: 1 week	2015-07-09 15:06:24 +00:00
markj	7cfe35b5a5	Move the comment describing namei(9) back to namei()'s definition. MFC after: 3 days	2015-07-05 22:56:41 +00:00
kib	cf11d25e18	Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines whether the shared request for already shared-locked lock could be granted. Both problems result in the exclusive locker starvation. The concurrent exclusive request is indicated by either LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse condition, i.e. no exclusive waiters, must check that both flags are cleared. Add a flag LK_NODDLKTREAT for shared lock request to indicate that current thread guarantees that it does not own the lock in shared mode. This turns back the exclusive lock starvation avoidance code; see man page update for detailed description. Use LK_NODDLKTREAT when doing lookup(9). Reported and tested by: pho No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 13:10:31 +00:00
mjg	b110d1e264	Plug a memory leak in case of failed lookups in capability mode. Put common cnp cleanup into one function and use it for this purpose. MFC after: 1 week	2014-08-24 12:51:12 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
rwatson	33fdc14c0c	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
avg	71889a5eff	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
attilio	7ee4e910ce	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
kib	f9f4aa68f7	Both vn_close() and VFS_PROLOGUE() evaluate vp->v_mount twice, without holding the vnode lock; vp->v_mount is checked first for NULL equiality, and then dereferenced if not NULL. If vnode is reclaimed meantime, second dereference would still give NULL. Change VFS_PROLOGUE() to evaluate the mp once, convert MNTK_SHARED_WRITES and MNTK_EXTENDED_SHARED tests into inline functions. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-11-09 20:30:13 +00:00
pjd	667d7255be	Fix panic in ktrcapfail() when no capability rights are passed. While here, correct all consumers to pass NULL instead of 0 as we pass capability rights as pointers now, not uint64_t. Reported by: Daniel Peyrolon Tested by: Daniel Peyrolon Approved by: re (marius)	2013-09-18 19:26:08 +00:00
pjd	029a6f5d92	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
kib	103825c951	Do not override the ENOENT error for the empty path, or EFAULT errors from copyins, with the relative lookup check. Discussed with: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-08-05 19:42:03 +00:00
kib	060b76c803	Do not call the VOP_LOOKUP() for the doomed directory vnode. The vnode could be reclaimed while lock upgrade was performed. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Diagnosed and reviewed by: rmacklem MFC after: 1 week	2013-04-01 09:59:38 +00:00
pjd	f07ebb8888	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
jhb	ca1e2e0739	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month	2013-02-21 19:02:50 +00:00
pjd	c6ea39d1ef	Fix one more compilation issue.	2012-12-01 08:59:36 +00:00
pjd	632d7191a2	IFp4 @208451: Fix path handling for *at() syscalls. Before the change directory descriptor was totally ignored, so the relative path argument was appended to current working directory path and not to the path provided by descriptor, thus wrong paths were stored in audit logs. Now that we use directory descriptor in vfs_lookup, move AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where we hold file descriptors table lock, so we are sure paths will be resolved according to the same directory in audit record and in actual operation. Sponsored by: FreeBSD Foundation (auditdistd) Reviewed by: rwatson MFC after: 2 weeks	2012-11-30 23:18:49 +00:00
pjd	7a831b4b8c	- Add NOCAPCHECK flag to namei that allows lookup to work even if the process is in capability mode. - Add VN_OPEN_NOCAPCHECK flag for vn_open_cred() to will ne converted into NOCAPCHECK namei flag. This functionality will be used to enable core dumps for sandboxed processes. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks	2012-11-27 10:32:35 +00:00
kib	560aa751e0	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
kib	01b848f0ee	Add MNTK_LOOKUP_EXCL_DOTDOT struct mount flag, which specifies to the lookup code that dotdot lookups shall override any shared lock requests with the exclusive one. The flag is useful for filesystems which sometimes need to upgrade shared lock to exclusive inside the VOP_LOOKUP or later, which cannot be done safely for dotdot, due to dvp also locked and causing LOR. In collaboration with: pho MFC after: 3 weeks	2012-09-09 19:11:52 +00:00
kib	c3ff56243b	Avoid double-unlock or double unreference for ndp->ni_dvp when the vnode dp lock upgrade right after the 'success' label fails. In collaboration with: pho MFC after: 1 week	2012-01-01 18:45:59 +00:00
des	1b405df8ba	Revisit the capability failure trace points. The initial implementation only logged instances where an operation on a file descriptor required capabilities which the file descriptor did not have. By adding a type enum to struct ktr_cap_fail, we can catch other types of capability failures as well, such as disallowed system calls or attempts to wrap a file descriptor with more capabilities than it had to begin with.	2011-10-18 07:28:58 +00:00
jonathan	f63d2e9205	Allow Capsicum capabilities to delegate constrained access to file system subtrees to sandboxed processes. - Use of absolute paths and '..' are limited in capability mode. - Use of absolute paths and '..' are limited when looking up relative to a capability. - When a name lookup is performed, identify what operation is to be performed (such as CAP_MKDIR) as well as check for CAP_LOOKUP. With these constraints, openat() and friends are now safe in capability mode, and can then be used by code such as the capability-mode runtime linker. Approved by: re (bz), mentor (rwatson) Sponsored by: Google Inc	2011-08-13 09:21:16 +00:00
rwatson	4af919b491	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
rpaulo	ea11ba6788	Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]	2010-08-22 11:18:57 +00:00
jhb	d4890c88b0	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days	2010-08-20 19:46:50 +00:00
jh	7a3d363cfe	Support only LOOKUP operation for "/" in relookup() because lookup() can't succeed for CREATE, DELETE and RENAME. Discussed with: bde	2010-03-26 11:33:12 +00:00

1 2 3 4

184 Commits