freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	429537caeb	kern_dup(): Call filecaps_free_prep() in a write section. filecaps_free_prep() bzeros the capabilities structure and we need to be careful to synchronize with unlocked readers, which expect a consistent rights structure. Reviewed by: kib, mjg Reported by: syzbot+5f30b507f91ddedded21@syzkaller.appspotmail.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24120	2020-03-19 15:40:05 +00:00
Mateusz Guzik	d2222aa0e9	fd: use smr for managing struct pwd This has a side effect of eliminating filedesc slock/sunlock during path lookup, which in turn removes contention vs concurrent modifications to the fd table. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23889	2020-03-08 00:23:36 +00:00
Mateusz Guzik	8d03b99b9d	fd: move vnodes out of filedesc into a dedicated structure The new structure is copy-on-write. With the assumption that path lookups are significantly more frequent than chdirs and chrooting this is a win. This provides stable root and jail root vnodes without the need to reference them on lookup, which in turn means less work on globally shared structures. Note this also happens to fix a bug where jail vnode was never referenced, meaning subsequent access on lookup could run into use-after-free. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23884	2020-03-01 21:53:46 +00:00
Mateusz Guzik	8243063f9b	fd: make fgetvp_rights work without the filedesc lock Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23883	2020-03-01 21:50:13 +00:00
Mateusz Guzik	32a86c44ee	fd: use new capsicum helpers	2020-02-15 01:28:55 +00:00
Mateusz Guzik	8f86349f8b	fd: remove no longer needed atomic_load_ptr casts	2020-02-14 23:18:22 +00:00
Mateusz Guzik	6ed30ea4c0	fd: annotate finstall with prediction branches	2020-02-14 11:22:12 +00:00
Kyle Evans	0f5f49eff7	u_char -> vm_prot_t in a couple of places, NFC The latter is a typedef of the former; the typedef exists and these bits are representing vmprot values, so use the correct type. Submitted by: sigsys@gmail.com MFC after: 3 days	2020-02-14 02:22:08 +00:00
Mateusz Guzik	1a9fe4528b	fd: always nullify fdp in fget routines Some consumers depend on the pointer being NULL if an error is returned. The guarantee got broken in r357469. Reported by: https://syzkaller.appspot.com/bug?extid=0c9b05e2b727aae21eef Noted by: markj	2020-02-05 00:20:26 +00:00
Mateusz Guzik	8151b6e92a	fd: partially unengrish the previous commit	2020-02-03 22:34:50 +00:00
Mateusz Guzik	e10f063b30	fd: streamline fget_unlocked clang has the unfortunate property of paying little attention to prediction hints when faced with a loop spanning the majority of the routine. In particular fget_unlocked has an unlikely corner case where it starts almost from scratch. Faced with this clang generates a maze of taken jumps, whereas gcc produces jump-free code (in the expected case). Work around the problem by providing a variant which only tries once and resorts to calling the original code if anything goes wrong. While here note that the 'seq' parameter is almost never passed, thus the seldom users are redirected to call it directly.	2020-02-03 22:32:49 +00:00
Mateusz Guzik	52604ed792	fd: remove the seq argument from fget_unlocked It is almost always NULL.	2020-02-03 22:27:55 +00:00
Mateusz Guzik	7f1566f884	fd: remove the seq argument from fget routines It is almost always NULL.	2020-02-03 22:27:03 +00:00
Mateusz Guzik	0a1427c5ab	ktrace: provide ktrstat_error This eliminates a branch from its consumers trading it for an extra call if ktrace is enabled for curthread. Given that this is almost never true, the tradeoff is worth it.	2020-02-03 22:26:00 +00:00
Mateusz Guzik	bcd1cf4f03	capsicum: faster cap_rights_contains Instead of doing a 2 iteration loop (determined at runtime), take advantage of the fact that the size is already known. While here provide cap_check_inline so that fget_unlocked does not have to do a function call. Verified with the capsicum suite /usr/tests.	2020-02-03 17:08:11 +00:00
Mateusz Guzik	fee204544e	fd: fix f_count acquire in fget_unlocked The code was using a hand-rolled fcmpset loop, while in other places the same count is manipulated with the refcount API. This transferred from a stylistic issue into a bug after the API got extended to support flags. As a result the hand-rolled loop could bump the count high enough to set the bit flag. Another bump + refcount_release would then free the file prematurely. The bug is only present in -CURRENT.	2020-02-03 14:28:31 +00:00
Mateusz Guzik	2568d5bb79	fd: sprinkle some predicts around fget clang inlines fget -> _fget into kern_fstat and eliminates several checks, but prior to this change it would assume fget_unlocked was likely to fail and consequently avoidable jumps got generated.	2020-02-02 09:38:40 +00:00
Mateusz Guzik	da4f45ea5c	fd: use atomic_load_ptr instead of hand-rolled cast through volatile No change in assembly.	2020-02-02 09:37:16 +00:00
Mateusz Guzik	d3cc535474	vfs: provide F_ISUNIONSTACK as a kludge for libc Prior to introduction of this op libc's readdir would call fstatfs(2), in effect unnecessarily copying kilobytes of data just to check fs name and a mount flag. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D23162	2020-01-17 14:42:25 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Mateusz Guzik	55eb92db8d	fd: static-ize and devolatile openfiles Almost all access is using atomics. The only read is sysctl which should use a whole-int-at-a-time friendly read internally.	2019-12-11 23:09:12 +00:00
Mark Johnston	4a7b33ecf4	Disallow fcntl(F_READAHEAD) when the vnode is not a regular file. The mountpoint may not have defined an iosize parameter, so an attempt to configure readahead on a device file can lead to a divide-by-zero crash. The sequential heuristic is not applied to I/O to or from device files, and posix_fadvise(2) returns an error when v_type != VREG, so perform the same check here. Reported by: syzbot+e4b682208761aa5bc53a@syzkaller.appspotmail.com Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21864	2019-10-02 15:45:49 +00:00
Kyle Evans	af755d3e48	[1/3] Add mostly Linux-compatible file sealing support File sealing applies protections against certain actions (currently: write, growth, shrink) at the inode level. New fileops are added to accommodate seals - EINVAL is returned by fcntl(2) if they are not implemented. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D21391	2019-09-25 17:32:43 +00:00
Konstantin Belousov	f1cf2b9dcb	Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold(). On sufficiently large machine, f_count can be legitimately very large, e.g. malicious code can dup same fd up to the per-process filedescriptors limit, and then fork as much as it can. On some smaller machine, I see kern.maxfilesperproc: 939132 kern.maxprocperuid: 34203 which already overflows u_int. More, the malicious code can create transient references by sending fds over unix sockets. I realized that this check is missed after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html Reviewed by: markj (previous version), mjg Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20947	2019-07-21 15:07:12 +00:00
Mark Johnston	7c3703a694	Use a consistent snapshot of the fd's rights in fget_mmap(). fget_mmap() translates rights on the descriptor to a VM protection mask. It was doing so without holding any locks on the descriptor table, so a writer could simultaneously be modifying those rights. Such a situation would be detected using a sequence counter, but not before an inconsistency could trigger assertion failures in the capability code. Fix the problem by copying the fd's rights to a structure on the stack, and perform the translation only once we know that that snapshot is consistent. Reported by: syzbot+ae359438769fda1840f8@syzkaller.appspotmail.com Reviewed by: brooks, mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20800	2019-06-29 16:11:09 +00:00
Alan Somers	38b06f8ac4	fcntl: fix overflow when setting F_READAHEAD VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field. However, fcntl allows you to set the seqcount in bytes to any nonnegative 31-bit value. The result can be a 16-bit overflow, which will be sign-extended in functions like ffs_read. Fix this by sanitizing the argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather than INT16_MAX. Also, fifos have overloaded the f_seqcount field for a completely different purpose ever since r238936. Formalize that by using a union type. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20710	2019-06-20 23:07:20 +00:00
Konstantin Belousov	bc2d137acb	Make pack_kinfo() available for external callers. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258	2019-05-23 12:25:03 +00:00
Mark Johnston	fd76e780a7	Reject F_SETLK_REMOTE commands when sysid == 0. A sysid of 0 denotes the local system, and some handlers for remote locking commands do not attempt to deal with local locks. Note that F_SETLK_REMOTE is only available to privileged users as it is intended to be used as a testing interface. Reviewed by: kib Reported by: syzbot+9c457a6ae014a3281eb8@syzkaller.appspotmail.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19702	2019-03-25 21:38:58 +00:00
Mateusz Guzik	55fda58146	Rename seq to seqc to avoid namespace clashes with Linux Linux generates the content of procfs files using a mechanism prefixed with seq_*. This in particular came up with recent gcov import. Sponsored by: The FreeBSD Foundation	2019-02-27 22:56:55 +00:00
Matt Macy	ebe0b35a18	Change seq_read to seq_load to avoid namespace conflicts with lkpi MFC after: 1 week Sponsored by: iX Systems	2019-02-23 21:04:48 +00:00
Mark Johnston	093295ae49	Remove an obsolete comment. MFC after: 3 days	2019-02-20 18:29:52 +00:00
Mateusz Guzik	24d64be4c5	vfs: mostly depessimize NDINIT_ALL 1) filecaps_init was unnecessarily a function call 2) an assignment at the end was preventing tail calling of cap_rights_init Sponsored by: The FreeBSD Foundation	2018-12-14 03:55:08 +00:00
Mateusz Guzik	6b2d61136f	fd: dedup code in sys_getdtablesize Sponsored by: The FreeBSD Foundation	2018-12-11 12:08:18 +00:00
Mateusz Guzik	86db4d40ac	fd: tidy up closing a fd - avoid a call to knote_close in the common case - annotate mqueue as unlikely Sponsored by: The FreeBSD Foundation	2018-12-11 11:58:44 +00:00
Mateusz Guzik	663de8167e	fd: stop looking for exact freefile after allocation If a lower fd is closed later, the lookup goes to waste. Allocation always performs the lookup anyway. Sponsored by: The FreeBSD Foundation	2018-12-11 11:57:12 +00:00
Mateusz Guzik	08d005e6a3	fd: use racct_set_unlocked Sponsored by: The FreeBSD Foundation	2018-12-07 16:51:38 +00:00
Mateusz Guzik	82f4b82634	fd: try to do less work with the lock in dup Sponsored by: The FreeBSD Foundation	2018-12-07 16:44:52 +00:00
Mateusz Guzik	d47f3fdb0a	fd: unify fd range check across the routines While here annotate out of range as unlikely. Sponsored by: The FreeBSD Foundation	2018-11-29 08:53:39 +00:00
Mateusz Guzik	98fca94d22	capsicum: provide cap_rights_fde_inline Reading caps is in the hot path (on each successful fd lookup), but completely unnecessarily requires a function call. Approved by: re (gjb) Sponsored by: The FreeBSD Foundation	2018-10-12 23:48:10 +00:00
Mateusz Guzik	51e13c93b6	fd: prevent inlining of _fdrop throughout kern_descrip.c fdrop is used in several places in the file and almost never has to call _fdrop. Thus inlining it is a pure waste of space. Approved by: re (kib)	2018-09-20 13:32:40 +00:00
Mateusz Guzik	bcbc8d35eb	fd: stop passing M_ZERO to uma_zalloc The optimisation seen with malloc cannot be used here as zone sizes are not known at compilation. Thus bzero by hand to get the optimisation instead.	2018-07-12 22:48:18 +00:00
Brooks Davis	3a20f06a1c	Use uintptr_t alone when assigning to kvaddr_t variables. Suggested by: jhb	2018-07-10 13:03:06 +00:00
Brooks Davis	7524b4c14b	Correct breakage on 32-bit platforms from r335979.	2018-07-06 10:03:33 +00:00
Brooks Davis	f38b68ae8a	Make struct xinpcb and friends word-size independent. Replace size_t members with ksize_t (uint64_t) and pointer members (never used as pointers in userspace, but instead as unique identifiers) with kvaddr_t (uint64_t). This makes the structs identical between 32-bit and 64-bit ABIs. On 64-bit systems, the ABI is maintained. On 32-bit systems, this is an ABI breaking change. The ABI of most of these structs was previously broken in r315662. This also imposes a small API change on userspace consumers who must handle kernel pointers becoming virtual addresses. PR: 228301 (exp-run by antoine) Reviewed by: jtl, kib, rwatson (various versions) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15386	2018-07-05 13:13:48 +00:00
Ed Maste	b8d908b71e	ANSIfy sys/kern	2018-06-01 13:26:45 +00:00
Matt Macy	acbde29858	capsicum: propagate const correctness	2018-05-19 05:14:05 +00:00
Matt Macy	cbd92ce62e	Eliminate the overhead of gratuitous repeated reinitialization of cap_rights - Add macros to allow preinitialization of cap_rights_t. - Convert most commonly used code paths to use preinitialized cap_rights_t. A 3.6% speedup in fstat was measured with this change. Reported by: mjg Reviewed by: oshogbo Approved by: sbruno MFC after: 1 month	2018-05-09 18:47:24 +00:00
Matt Macy	748ff486b0	`dup1_processes -t 96 -s 5` on a dual 8160 (x = dup_before, + = dup_after; distribution plot omitted) N Min Max Median Avg Stddev; x: 5 1.514954e+08 1.5230351e+08 1.5206157e+08 1.5199371e+08 341205.71; +: 5 1.5494336e+08 1.5519569e+08 1.5511982e+08 1.5508323e+08 96232.829; Difference at 95.0% confidence: 3.08952e+06 +/- 365604, 2.03266% +/- 0.245071% (Student's t, pooled s = 250681) Reported by: mjg@ MFC after: 1 week	2018-05-04 06:51:01 +00:00
Mateusz Guzik	7d853f62bf	lockf: slightly depessimize 1. check if P_ADVLOCK is already set and if so, don't lock to set it (stolen from DragonFly) 2. when trying for fast path unlock, check that we are doing unlock first instead of taking the interlock for no reason (e.g. if we want to lock). While here make it more likely that falling fast path will not take the interlock either by checking for state Note the code is severely pessimized both single- and multithreaded.	2018-04-22 09:30:07 +00:00
John Baldwin	8ce99bb405	Properly do a deep copy of the ioctls capability array for fget_cap(). fget_cap() tries to do a cheaper snapshot of a file descriptor without holding the file descriptor lock. This snapshot does not do a deep copy of the ioctls capability array, but instead uses a different return value to inform the caller to retry the copy with the lock held. However, filecaps_copy() was returning 1 to indicate that a retry was required, and fget_cap() was checking for 0 (actually '!filecaps_copy()'). As a result, fget_cap() did not do a deep copy of the ioctls array and just reused the original pointer. This caused multiple file descriptor entries to think they owned the same pointer and eventually resulted in duplicate frees. The only code path that I'm aware of that triggers this is to create a listen socket that has a restricted list of ioctls and then call accept() which calls fget_cap() with a valid filecaps structure from getsock_cap(). To fix, change the return value of filecaps_copy() to return true if it succeeds in copying the caps and false if it fails because the lock is required. I find this more intuitive than fixing the caller in this case. While here, change the return type from 'int' to 'bool'. Finally, make filecaps_copy() more robust in the failure case by not copying any of the source filecaps structure over. This avoids the possibility of leaking a pointer into a structure if a similar future caller doesn't properly handle the return value from filecaps_copy() at the expense of one more branch. I also added a test case that panics before this change and now passes. Reviewed by: kib Discussed with: mjg (not a fan of the extra branch) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D15047	2018-04-17 18:07:40 +00:00

637 Commits