This was relatively harmless but surprising to see in counters. The
race occurred when rd_seq was read after the goal was updated and we
incorrectly calculated the delta between them.
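The fixed ordering can be sketched as follows; the types and helper mirror
sys/smr.h, but the function itself is illustrative, not the committed diff:

    #include <stdint.h>

    typedef uint32_t smr_seq_t;
    typedef int32_t  smr_delta_t;

    /* Wrap-safe sequence arithmetic, as in sys/smr.h. */
    #define SMR_SEQ_DELTA(a, b)  ((smr_delta_t)((a) - (b)))

    static smr_delta_t
    poll_delta(volatile smr_seq_t *rd_seqp, volatile smr_seq_t *goalp)
    {
        smr_seq_t rd_seq, goal;

        /*
         * Snapshot rd_seq before the goal.  If the goal is read first,
         * a concurrent update can advance rd_seq past it and the
         * computed delta is nonsense.
         */
        rd_seq = *rd_seqp;
        goal = *goalp;
        return (SMR_SEQ_DELTA(goal, rd_seq));
    }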
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D23464
counters. In my stress test there is only one poll for every 15,000
frees. This means we are effectively amortizing the cache coherency
overhead even with very high write rates (3M/s/core).
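The shape of the amortization, with all names hypothetical (the real logic
lives in the SMR-backed counter code):

    #define POLL_INTERVAL   1000            /* hypothetical batch size */

    static void defer_free(void *);         /* hypothetical: queue item */
    static void smr_poll_batch(void);       /* hypothetical: reclaim pass */

    static int pending;                     /* frees since the last poll */

    static void
    counter_free(void *item)
    {
        defer_free(item);
        if (++pending >= POLL_INTERVAL) {
            pending = 0;
            smr_poll_batch();               /* one poll amortized per batch */
        }
    }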
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D23463
Add CTLFLAG_NEEDGIANT flag (modelled after D_NEEDGIANT) that will be used to
mark sysctls that still require locking Giant.
Rewrite sysctl_handle_string() to use internal locking instead of locking
Giant.
Mark SYSCTL_STRING, SYSCTL_OPAQUE and their variants as MPSAFE.
Add infrastructure support for enforcing proper use of CTLFLAG_NEEDGIANT
and CTLFLAG_MPSAFE flags with SYSCTL_PROC and SYSCTL_NODE, not enabled yet.
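For illustration, a handler that still needs Giant would be declared along
these lines (the oid name and handler are made up):

    static int example_value;

    static int
    sysctl_example(SYSCTL_HANDLER_ARGS)
    {
        /* Handler body that still relies on Giant for its locking. */
        return (sysctl_handle_int(oidp, &example_value, 0, req));
    }
    SYSCTL_PROC(_kern, OID_AUTO, example,
        CTLTYPE_INT | CTLFLAG_RD | CTLFLAG_NEEDGIANT, NULL, 0,
        sysctl_example, "I", "illustrative Giant-locked sysctl");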
Reviewed by: kib (mentor)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23378
sendfile(2) optionally takes a set of headers that get prepended to the
file data. If the request length is less than that of the headers,
sendfile may not allocate an sfio structure, in which case its pointer
is null and we should be careful not to dereference it. This was
introduced in r356902.
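The fix reduces to a guard of roughly this shape (illustrative, not the
committed diff):

    /* sfio is NULL when the request does not extend past the headers. */
    if (sfio != NULL)
        sfio_complete(sfio);        /* hypothetical completion path */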
Reported by: syzkaller
Sponsored by: The FreeBSD Foundation
This change adds two new sysctls to retrieve the original and relocated KERNBASE
values. This provides an easy, architecture independent way to calculate the
running kernel displacement (current/load address minus original base address).
The initial goal for this change is to add a new libkvm function that returns
the kernel displacement, both for live kernels and crashdumps. This would in
turn be used by kgdb to find out how to relocate kernel symbols (if needed).
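A hedged sketch of the intended consumer, assuming both values are exported
as unsigned longs; the OID names below are placeholders for whatever the
change settles on:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        unsigned long base, relbase;
        size_t len;

        len = sizeof(base);
        if (sysctlbyname("kern.base_address", &base, &len, NULL, 0) == -1)
            return (1);
        len = sizeof(relbase);
        if (sysctlbyname("kern.relbase_address", &relbase, &len, NULL, 0) == -1)
            return (1);
        /* Displacement: load address minus original base address. */
        printf("displacement: %#lx\n", relbase - base);
        return (0);
    }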
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D23284
Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default
contig allocator (a copy of mbuf_jumbo_alloc). Tag other zones which
require contiguous objects, even if they don't use the new default
contig allocator, so that uma knows about their constraints.
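Tagging such a zone can be sketched as follows (zone name and size are
illustrative; UMA_ZONE_CONTIG is the constraint flag):

    static uma_zone_t contig_zone;

    static void
    contig_zone_init(void)
    {
        /* Multi-page items that must be physically contiguous. */
        contig_zone = uma_zcreate("example_contig", 16 * 1024,
            NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_CONTIG);
    }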
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23238
and get_thread_cputime() and add prototypes for them to <sys/syscallsubr.h>.
As both functions become a public interface, add a process lock assertion
to ensure that the process is not exiting underneath them.
Fix whitespace nit while here.
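The added assertion takes roughly this shape (sketch; the body is elided):

    void
    get_thread_cputime(struct thread *targettd, struct timespec *ats)
    {
        /*
         * Callers must hold the process lock so the process cannot
         * exit underneath us.
         */
        PROC_LOCK_ASSERT(targettd->td_proc, MA_OWNED);
        /* ... */
    }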
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23340
MFC after 2 weeks
If the thread's lock is already that of the runqueue, don't recurse on
the queue lock.
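In sketch form, with the lock name hypothetical since the exact call site is
scheduler-specific:

    /* Take the queue lock only if it is not already the thread's lock. */
    if (td->td_lock != queue_lock)
        mtx_lock_spin(queue_lock);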
Reviewed by: jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23492
clang has the unfortunate property of paying little attention to prediction
hints when faced with a loop spanning the majority of the routine.
In particular fget_unlocked has an unlikely corner case where it starts almost
from scratch. Faced with this clang generates a maze of taken jumps, whereas
gcc produces jump-free code (in the expected case).
Work around the problem by providing a variant which only tries once and
resorts to calling the original code if anything goes wrong.
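The shape of the workaround, with the try-once helper hypothetical and the
fallback being the pre-existing routine:

    static int
    fget_unlocked_fast(struct filedesc *fdp, int fd, cap_rights_t *needrightsp,
        struct file **fpp)
    {
        /* One straight-line attempt, no retry loop. */
        if (__predict_true(fget_try_once(fdp, fd, needrightsp, fpp)))
            return (0);
        /* Anything unusual: punt to the full routine. */
        return (fget_unlocked(fdp, fd, needrightsp, fpp, NULL));
    }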
While here, note that the 'seq' parameter is almost never passed, so the
rare users are redirected to call it directly.
This eliminates a branch from its consumers, trading it for an extra call
if ktrace is enabled for curthread. Given that this is almost never true,
the tradeoff is worth it.
Most notably, we want to make sure we don't clobber any capabilities-related
errors. This is a regression from r357412 (O_SEARCH) that was picked up by
the capsicum tests.
PR: 243839
Reviewed by: kib (committed form recommended by)
Tested by: lwhsu
Differential Revision: https://reviews.freebsd.org/D23479
Instead of doing a 2-iteration loop (determined at runtime), take advantage
of the fact that the size is already known.
While here provide cap_check_inline so that fget_unlocked does not have to
do a function call.
Verified with the capsicum suite in /usr/tests.
The code was using a hand-rolled fcmpset loop, while in other places the same
count is manipulated with the refcount API.
This turned from a stylistic issue into a bug after the API was extended
to support flags. As a result the hand-rolled loop could bump the count high
enough to set the bit flag. Another bump + refcount_release would then free
the file prematurely.
The bug is only present in -CURRENT.
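Roughly, the change is from the raw loop to the refcount(9) KPI
(illustrative, not the exact diff):

    /*
     * Before: a hand-rolled loop, oblivious to the flag bits that
     * refcount(9) now embeds in the counter.
     */
    u_int old;

    old = fp->f_count;
    while (!atomic_fcmpset_int(&fp->f_count, &old, old + 1))
        continue;       /* 'old' is refreshed by fcmpset on failure */

    /* After: the KPI, which knows about the flag bits. */
    refcount_acquire(&fp->f_count);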
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.
This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.
This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.
[0] https://pubs.opengroup.org/onlinepubs/9699919799/
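The compatibility definitions then reduce to simple aliases, along these
lines (a sketch of the fcntl.h shape):

    #define O_SEARCH        O_EXEC          /* open for search only */
    #define FSEARCH         FEXEC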
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23247
clang inlines fget -> _fget into kern_fstat and eliminates several checks,
but prior to this change it would assume fget_unlocked was likely to fail
and consequently generate avoidable jumps.
There are 2 back-to-back atomics on the vnode, but we can check upfront if one
is sufficient. Similarly we can handle relative lookups where current working
directory == root directory.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23427
inspection and after a lengthy discussion with jhb and kib. They have not
produced test failures.
Don't pointer chase through cpu0's smr. Use the CPU-correct smr even when not
in a critical section to reduce the likelihood of false sharing.
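In sketch form (zpcpu_get() resolves the calling CPU's copy of a per-CPU
allocation):

    /*
     * Resolve this CPU's smr instead of always dereferencing cpu0's,
     * so neighbouring CPUs stop pulling the same cache line.
     */
    smr = zpcpu_get(smr);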
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delivered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.
Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.
Simplify the KPIs. Remove stack_save_td_running() and add a return
value to stack_save_td(). On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP. If the
target thread is running in user mode, stack_save_td() returns EBUSY.
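Callers of the simplified KPI then look roughly like this:

    struct stack st;
    int error;

    stack_zero(&st);
    error = stack_save_td(&st, td);
    if (error == EOPNOTSUPP) {
        /* Platform cannot capture the stack of a running thread. */
    } else if (error == EBUSY) {
        /* Target thread is running in user mode. */
    }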
Reviewed by: kib
Reported by: mjg, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23355
The intent was to make it more likely to catch filesystems with custom
need_inactive routines which fail to call vn_need_pageq_flush (or do an
equivalent).
One immediate case which is missed is vgone called from inactive itself.
A better assertion may land later. The routine is not added to vputx because
it is of no use to tmpfs et al.
Reported by: syzbot+5f697ec11f89b60941db@syzkaller.appspotmail.com
This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is
a unique algorithm. This has 3x the performance of epoch in a write-heavy
workload with less than half of the read-side cost. The memory overhead
is significantly lessened by limiting the free-to-use latency. A synthetic
test uses 1/20th of the memory vs Epoch. There is significant further
discussion in the comments and code review.
This code should be considered experimental. I will write a man page after
it has settled. After further validation the VM will begin using this
feature to permit lockless page lookups.
Both markj and cperciva tested on arm64 at large core counts to verify
fences on weakly ordered architectures. I will commit a stress testing
tool in a follow-up.
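A minimal sketch of the read and write sides as exposed by sys/smr.h; the
smr zone variable is illustrative:

    smr_seq_t goal;

    /* Read side: per-CPU, no shared locks. */
    smr_enter(my_smr);
    /* ... lockless reads of SMR-protected structures ... */
    smr_exit(my_smr);

    /* Write side: stamp a goal, then poll until readers catch up. */
    goal = smr_advance(my_smr);
    if (!smr_poll(my_smr, goal, false)) {
        /* Readers still active; defer reclamation and retry later. */
    }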
Reviewed by: mmacy, markj, rlibby, hselasky
Discussed with: sbahara
Differential Revision: https://reviews.freebsd.org/D22586
Otherwise we risk running into a use-after-free.
In particular this codepath ends up dropping all protection before
suspending writes:
ufs_quotactl -> quotaoff_inchange -> vfs_write_suspend_umnt
Reported by: pho
ctx (and thus ctx.flags) is stack garbage at the start of this
function, so initialize ctx.flags to an explicit value instead of
using binary operations on the garbage.
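The fix is simply to start from a known value (struct and flag names
hypothetical):

    struct xfer_ctx ctx;            /* hypothetical context type */

    ctx.flags = 0;                  /* explicit value, not |= on garbage */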
Reported by: gcc9
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D23368
With this change, holding the listmtx lock postpones dooming the vnode.
Use this fact to simplify iteration over the lazy list. It also allows
filters to safely access ->v_data.
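A hedged sketch of the iteration; the mount fields match struct mount, while
the filter callback is illustrative:

    struct vnode *vp;

    mtx_lock(&mp->mnt_listmtx);
    TAILQ_FOREACH(vp, &mp->mnt_lazyvnodelist, v_lazylist) {
        /*
         * The vnode cannot be doomed while mnt_listmtx is held,
         * so the filter may inspect vp->v_data safely.
         */
        if (!filter(vp, filter_arg))    /* hypothetical callback */
            continue;
        /* ... */
    }
    mtx_unlock(&mp->mnt_listmtx);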
Reviewed by: kib (early version)
Differential Revision: https://reviews.freebsd.org/D23397