freebsd-skq

Author	SHA1	Message	Date
mjg	f121d45000	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
mjg	048a894ebc	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
mjg	bcfa67ab8b	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
kib	c84facbfbd	NDFREE(): Fix unlocking for LOCKPARENT|LOCKLEAF and ndp->ni_dvp == ndp->ni_vp. NDFREE() calculates unlock_dvp after ndp->ni_vp is unlocked and zeroed out. This makes the comparison of ni_dvp with ni_vp always fail. Move the calculation of unlock_dvp right after unlock_vp, so that the code sees correct ni_vp value. Reproduced by chdir("/usr"); open("/..", O_BENEATH | O_RDONLY); Reported by: syzkaller Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20304	2019-05-21 15:12:13 +00:00
kib	fe9ffad6ed	Fix renameat(2) for CAPABILITIES kernels. When renameat(2) is used with: - absolute path for to; - tofd not set to AT_FDCWD; - the target exists kern_renameat() requires CAP_UNLINK capability on tofd, but corresponding namei ni_filecap is not initialized at all because the lookup is absolute. As a result, the check was done against empty filecap and syscall fails erroneously. Fix it by creating a return flags namei member and reporting if the lookup was absolute, then do not touch to.ni_filecaps at all. PR: 222258 Reviewed by: jilles, ngie Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-note: KBI breakage Differential revision: https://reviews.freebsd.org/D19096	2019-02-08 04:18:17 +00:00
mjg	8598ea893e	vfs: mostly depessimize NDINIT_ALL 1) filecaps_init was unnecessarily a function call 2) an assignment at the end was preventing tail calling of cap_rights_init Sponsored by: The FreeBSD Foundation	2018-12-14 03:55:08 +00:00
kib	1d9125b01c	If BENEATH is specified, always latch the topping directory vnode. It is possible that we started with a relative path but during the lookup, found an absolute symlink. In this case, BENEATH handling code needs the latch, but it is too late to calculate it. While there, somewhat improve the assertions. Clear the NI_LCF_LATCH flag when the latch vnode is released, so that asserts know the state. Assert that there is a latch if we entered beneath+abs path mode, after the starting point is processed. Reported by: wulf With more input from: pho Sponsored by: The FreeBSD Foundation	2018-11-29 19:13:10 +00:00
kib	b609d75f3a	Allow absolute paths for O_BENEATH. The path must have a tail which does not escape starting/topping directory. The documentation will come shortly, see the man pages commit message for the reason of separate commit. Reviewed by: jilles (previous version) Discussed with: emaste Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17714	2018-11-11 00:04:36 +00:00
kib	de9d57cf38	Implement O_BENEATH and AT_BENEATH. Flags prevent open(2) and *at(2) vfs syscalls name lookup from escaping the starting directory. Supposedly the interface is similar to the same proposed Linux flags. Reviewed by: jilles (code, previous version of manpages), 0mp (manpages) Discussed with: allanjude, emaste, jonathan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17547	2018-10-25 22:16:34 +00:00
mjg	846a8dd029	vfs: remove lookup_shared tunable Reviewed by: kib, jhb Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17253	2018-09-20 18:25:26 +00:00
mmacy	4eacc08586	vfs: annotate variables only used by debug builds as __unused	2018-05-19 04:59:39 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
cem	30b90e55c4	vfs_lookup: Allow PATH_MAX-1 symlinks Previously, symlinks in FreeBSD were artificially limited to PATH_MAX-2. Add a short test case to verify the change. Submitted by: Gaurav Gangalwar <ggangalwar AT isilon.com> Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12589	2017-11-17 19:25:39 +00:00
tijl	5b4d8cee9d	When a Linux program tries to access a /path the kernel tries /compat/linux/path before /path. Stop following symbolic links when looking up /compat/linux/path so dead symbolic links aren't ignored. This allows syscalls like readlink(2) and lstat(2) to work on such links. And open(2) will return an error now instead of trying /path.	2017-10-15 18:53:21 +00:00
jhb	fd0aeed164	Use UMA_ALIGN_PTR instead of sizeof(void *) for zone alignment. uma_zcreate()'s alignment argument is supposed to be sizeof(foo) - 1, and uma.h provides a set of helper macros for common types. Passing sizeof(void *) results in all of the members being misaligned triggering unaligned access faults on certain architectures (notably MIPS). Reported by: brooks Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA / AFRL	2017-03-15 18:23:32 +00:00
kib	eacfb4abea	Provide fallback VOP methods for crossmp vnode. In particular, crossmp vnode might leak into rename code. PR: 216380 Reported by: fnacl@protonmail.com Sponsored by: The FreeBSD Foundation X-MFC with: r309425	2017-01-22 19:36:02 +00:00
trasz	5166e57c9a	Fix bug that would result in a kernel crash in some cases involving a symlink and an autofs mount request. The crash was caused by namei() calling bcopy() with a negative length, caused by numeric underflow: in lookup(), in the relookup path, the ni_pathlen was decremented too many times. The bug was introduced in r296715. Big thanks to Alex Deiter for his help with debugging this. Reviewed by: kib@ Tested by: Alex Deiter <alex.deiter at gmail.com> MFC after: 1 month	2017-01-04 14:43:57 +00:00
mjg	f03b37f3e8	vfs: add vrefact, to be used when the vnode has to be already active This allows blind increment of relevant counters which under contention is cheaper than inc-not-zero loops at least on amd64. Use it in some of the places which are guaranteed to see already active vnodes. Reviewed by: kib (previous version)	2016-12-12 15:37:11 +00:00
kib	127162a8d9	Enable lookup_cap_dotdot and lookup_cap_dotdot_nonlocal. Requested and reviewed by: cem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8746	2016-12-12 11:12:04 +00:00
mjg	168e218f92	vfs: provide fake locking primitives for the crossmp vnode Since the vnode is only expected to be shared locked, we can save a little overhead by only pretending we are locking in the first place. Reviewed by: kib Tested by: pho	2016-12-02 18:03:15 +00:00
mjg	18de920423	vfs: fix a whitespace nit in r309307	2016-11-30 02:17:03 +00:00
mjg	35980d306d	vfs: avoid VOP_ISLOCKED in the common case in lookup	2016-11-30 02:14:53 +00:00
kib	a41f4cc9a5	Allow some dotdot lookups in capability mode. If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal. Track the directories traversed, and check the result of dotdot lookup against the recorded list of the directory vnodes. Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently disabled by default until more verification of the approach is done. Disallow non-local filesystems for dotdot, since remote server might conspire with the local process to allow it to escape the namespace. This might be too cautious, provide the knob vfs.lookup_cap_dotdot_nonlocal to override as well. Idea by: rwatson Discussed with: emaste, jonathan, rwatson Reviewed by: mjg (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 week Differential revision: https://reviews.freebsd.org/D8110	2016-11-02 12:43:15 +00:00
kib	b9d3dfb1e0	Remove tautological casts. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:10:39 +00:00
kib	1323c841ed	Style fixes. Discussed with: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:02:31 +00:00
emaste	00b67b15b9	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
mjg	7b64fd1139	vfs: provide a common exit point in namei for error cases This shortens the function, adds the SDT_PROBE use for error cases and consistently unrefs rootdir last. Reviewed by: kib MFC after: 2 weeks	2016-08-27 22:43:41 +00:00
trasz	255ed885fa	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month	2016-08-10 16:12:31 +00:00
pfg	28823d0656	sys/kern: spelling fixes in comments. No functional change.	2016-04-29 22:15:33 +00:00
pfg	5b3421712d	kern: for pointers replace 0 with NULL. These are mostly cosmetic, no functional change. Found with devel/coccinelle.	2016-04-15 16:10:11 +00:00
trasz	8804916675	Refactor the way we restore cn_lkflags; no functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-12 09:05:43 +00:00
trasz	beb648d9cc	Remove cn_consume from 'struct componentname'. It was never set to anything other than 0. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5611	2016-03-12 08:50:38 +00:00
trasz	faec271eeb	Fix autofs triggering problem. Assume you have an NFS server, 192.168.1.1, with share "share". This commit fixes a problem where "mkdir /net/192.168.1.1/share/meh" would return spurious error instead of creating the directory if the target filesystem wasn't mounted yet; subsequent attempts would work correctly. The failure scenario is kind of complicated to explain, but it all boils down to calling VOP_MKDIR() for the target filesystem (NFS) with wrong dvp - the autofs vnode instead of the filesystem root mounted over it. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5442	2016-03-12 07:54:42 +00:00
avg	425c0bb088	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days	2015-09-28 12:14:16 +00:00
ed	e8ba6b4817	Properly return ENOTDIR when calling *at() on a non-vnode. We already properly return ENOTDIR when calling *at() on a non-directory vnode, but it turns out that if you call it on a socket, we see EINVAL. Patch up namei to properly translate this to ENOTDIR.	2015-08-12 16:17:00 +00:00
mjg	03b75d4638	vfs: cosmetic changes to namei and namei_handle_root - don't initialize cnp during declaration - don't test error/!error, compare to 0 instead	2015-07-09 17:17:26 +00:00
mjg	2c26f2224b	vfs: simplify error handling in namei The logic is reorganised so that there is one exit point prior to the lookup loop. This is an intermediate step to making audit logging functions use found vnode instead of translating ni_dirfd on their own. ni_startdir validation is removed. The only in-tree consumer is nfs which already makes sure it is a directory. Reviewed by: kib	2015-07-09 16:32:58 +00:00
mjg	5c01a53df7	vfs: avoid spurious vref/vrele for absolute lookups namei used to vref fd_cdir, which was immediately vrele'd on entry to the loop. Check for absolute lookup and vref the right vnode the first time. Reviewed by: kib	2015-07-09 15:06:58 +00:00
mjg	5bbdbadc3d	vfs: plug a use-after-free of fd_rdir in namei fd_rdir vnode was stored in ni_rootdir without refing it in any way, after which the filedsc lock was being dropped. The vnode could have been freed by mountcheckdirs or another thread doing chroot. VREF the vnode while the lock is held. Reviewed by: kib MFC after: 1 week	2015-07-09 15:06:24 +00:00
markj	7cfe35b5a5	Move the comment describing namei(9) back to namei()'s definition. MFC after: 3 days	2015-07-05 22:56:41 +00:00
kib	cf11d25e18	Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines whether the shared request for already shared-locked lock could be granted. Both problems result in the exclusive locker starvation. The concurrent exclusive request is indicated by either LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse condition, i.e. no exclusive waiters, must check that both flags are cleared. Add a flag LK_NODDLKTREAT for shared lock request to indicate that current thread guarantees that it does not own the lock in shared mode. This turns back the exclusive lock starvation avoidance code; see man page update for detailed description. Use LK_NODDLKTREAT when doing lookup(9). Reported and tested by: pho No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 13:10:31 +00:00
mjg	b110d1e264	Plug a memory leak in case of failed lookups in capability mode. Put common cnp cleanup into one function and use it for this purpose. MFC after: 1 week	2014-08-24 12:51:12 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
rwatson	33fdc14c0c	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
avg	71889a5eff	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
attilio	7ee4e910ce	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
kib	f9f4aa68f7	Both vn_close() and VFS_PROLOGUE() evaluate vp->v_mount twice, without holding the vnode lock; vp->v_mount is checked first for NULL equality, and then dereferenced if not NULL. If vnode is reclaimed meantime, second dereference would still give NULL. Change VFS_PROLOGUE() to evaluate the mp once, convert MNTK_SHARED_WRITES and MNTK_EXTENDED_SHARED tests into inline functions. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-11-09 20:30:13 +00:00
pjd	667d7255be	Fix panic in ktrcapfail() when no capability rights are passed. While here, correct all consumers to pass NULL instead of 0 as we pass capability rights as pointers now, not uint64_t. Reported by: Daniel Peyrolon Tested by: Daniel Peyrolon Approved by: re (marius)	2013-09-18 19:26:08 +00:00


201 Commits