freebsd-dev

Author	SHA1	Message	Date
Randall Stewart	89e560f441	This commit brings in a new refactored TCP stack called Rack. Rack includes the following features: - A different SACK processing scheme (the old sack structures are not used). - RACK (Recent acknowledgment) where counting dup-acks is no longer done instead time is used to knwo when to retransmit. (see the I-D) - TLP (Tail Loss Probe) where we will probe for tail-losses to attempt to try not to take a retransmit time-out. (see the I-D) - Burst mitigation using TCPHTPS - PRR (partial rate reduction) see the RFC. Once built into your kernel, you can select this stack by either socket option with the name of the stack is "rack" or by setting the global sysctl so the default is rack. Note that any connection that does not support SACK will be kicked back to the "default" base FreeBSD stack (currently known as "default"). To build this into your kernel you will need to enable in your kernel: makeoptions WITH_EXTRA_TCP_STACKS=1 options TCPHPTS Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D15525	2018-06-07 18:18:13 +00:00
Alan Cox	5b274055d1	When pidctrl_daemon() is called multiple times within an interval, it should use the cumulative error to calculate the output.	2018-06-07 07:48:50 +00:00
Matt Macy	fcabd54160	AF_UNIX: check for unp == unp2 on disconnect	2018-06-07 04:57:40 +00:00
Alan Cox	e768070ca9	pidctrl_daemon() implements a variation on the classical, discrete PID controller that tries to handle early invocations of the controller, in other words, invocations before the expected end of the interval. However, there were some calculation errors in this early invocation case. Notably, if an early invocation occurred while the error was negative, the derivative term was off by a large amount. One visible effect of this error was that processes were being killed by the virtual memory system's OOM killer when in fact there was plentiful free memory. Correct a couple minor errors in the sysctl descriptions, and apply some style fixes. Reviewed by: jeff, markj	2018-06-07 02:54:11 +00:00
Sean Bruno	1a43cff92a	Load balance sockets with new SO_REUSEPORT_LB option. This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures: Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations: As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). This is a substantially different contribution as compared to its original incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@ for the substantive feedback that is included in this commit. Submitted by: Johannes Lundberg <johalun0@gmail.com> Obtained from: DragonflyBSD Relnotes: Yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003	2018-06-06 15:45:57 +00:00
Justin Hibbits	3f9e1fc8ee	Revert r334708 This is the wrong place to put the barrier. Requested by: kib,mjg	2018-06-06 15:12:19 +00:00
Justin Hibbits	32c369f40c	Add a memory barrier after taking a reference on the vnode holdcnt in _vhold This is needed to avoid a race between the VNASSERT() below, and another thread updating the VI_FREE flag, on weakly-ordered architectures. On a 72-thread POWER9, without this barrier a 'make -j72 buildworld' would panic on the assert regularly. It may be possible to use a weaker barrier, and I'll investigate that once all stability issues are worked out on POWER9.	2018-06-06 12:57:11 +00:00
Matt Macy	ebfaf69cc0	hwpmc: log name->pid, name->tid mappings By logging all threads and processes 'pmc filter' can now filter on process or thread name, relieving the user of the burden of determining which tid or pid was which when the sample was taken. % pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log % pmc filter -x -T idle pmc.log pmc-noidle.log	2018-06-05 04:26:40 +00:00
Mark Johnston	97bc9a9384	Regen after r334626.	2018-06-04 19:36:47 +00:00
Mark Johnston	9f9c9b22ec	Reimplement brk() and sbrk() to avoid the use of _end. Previously, libc.so would initialize its notion of the break address using _end, a special symbol emitted by the static linker following the bss section. Compatibility issues between lld and ld.bfd could cause the wrong definition of _end (libc.so's definition rather than that of the executable) to be used, breaking the brk()/sbrk() interface. Avoid this problem and future interoperability issues by simply not relying on _end. Instead, modify the break() system call to return the kernel's view of the current break address, and have libc initialize its state using an extra syscall upon the first use of the interface. As a side effect, this appears to fix brk()/sbrk() usage in executables run with rtld direct exec, since the kernel and libc.so no longer maintain separate views of the process' break address. PR: 228574 Reviewed by: kib (previous version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D15663	2018-06-04 19:35:15 +00:00
Alan Cox	3e7cb27cdd	Use a single, consistent approach to returning success versus failure in vm_map_madvise(). Previously, vm_map_madvise() used a traditional Unix- style "return (0);" to indicate success in the common case, but Mach- style return values in the edge cases. Since KERN_SUCCESS equals zero, the only problem with this inconsistency was stylistic. vm_map_madvise() has exactly two callers in the entire source tree, and only one of them cares about the return value. That caller, kern_madvise(), can be simplified if vm_map_madvise() consistently uses Unix-style return values. Since vm_map_madvise() uses the variable modify_map as a Boolean, make it one. Eliminate a redundant error check from kern_madvise(). Add a comment explaining where the check is performed. Explicitly note that exec_release_args_kva() doesn't care about vm_map_madvise()'s return value. Since MADV_FREE is passed as the behavior, the return value will always be zero. Reviewed by: kib, markj MFC after: 7 days	2018-06-04 16:28:06 +00:00
Matt Macy	5de96e33d6	hwpmc: support sampling both kernel and user stacks when interrupted in kernel This adds the -U options to pmcstat which will attribute in-kernel samples back to the user stack that invoked the system call. It is not the default, because when looking at kernel profiles it is generally more desirable to merge all instances of a given system call together. Although heavily revised, this change is directly derived from D7350 by Jonathan T. Looney. Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks	2018-06-04 01:10:23 +00:00
Mateusz Guzik	d0a22279db	Remove an unused argument to turnstile_unpend. PR: 228694 Submitted by: Julian Pszczołowski <julian.pszczolowski@gmail.com>	2018-06-02 22:37:53 +00:00
Mateusz Guzik	34c538c356	malloc: try to use builtins for zeroing at the callsite Plenty of allocation sites pass M_ZERO and sizes which are small and known at compilation time. Handling them internally in malloc loses this information and results in avoidable calls to memset. Instead, let the compiler take the advantage of it whenever possible. Discussed with: jeff	2018-06-02 22:20:09 +00:00
Mark Johnston	3fb14f61e1	Avoid completing I/O when dumping core after a panic. Filesystem or pager completion callbacks are generally non-functional after a panic and may trigger deadlocks if invoked in this context (e.g., by attempting to destroying a buffer mapping). To avoid this situation, short-circuit I/O completion in biodone(). Reviewed by: imp Discussed with: mav MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D15592	2018-06-01 23:49:32 +00:00
Ed Maste	b8d908b71e	ANSIfy sys/kern	2018-06-01 13:26:45 +00:00
Warner Losh	c580ca4cf4	Make the data returned by devinfo harder to overflow. Rather than using fixed-length strings, pack them into a string table to return. Also expand the buffer from ~300 charaters to 3k. This should be enough, even for USB. This fixes a problem where USB pnp info is truncated on return to userland. Differential Revision: https://reviews.freebsd.org/D15629	2018-05-31 02:57:58 +00:00
Brooks Davis	64b378f1e1	Remove alternative names that are identical to the default. Verified by make sysent producing no changes.	2018-05-30 22:22:58 +00:00
Ed Maste	f912a970e6	link_elf_obj: correct an error message Previously we'd report that a file has "no valid symbol table" if it in fact had two or more. Change the message to report that there must be exactly one.	2018-05-30 12:55:27 +00:00
Matt Macy	e445381f13	epoch(9): make epoch closer to style(9)	2018-05-30 03:39:57 +00:00
Stephen Hurd	3e0e6330b5	iflib: mark irq allocation name parameter as constant The name parameter passed to iflib_irq_alloc_generic and iflib_softirq_alloc_generic is never modified. Many places in code pass string literals and thus should not be modified. Mark the name parameter as a const char * instead, so that we enforce that the name is not modified before passing to bus_describe_intr() Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: kmacy Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15343	2018-05-29 21:56:39 +00:00
Gleb Smirnoff	147cd40fe7	Revert second chunk of r333860. The warning from gcc is false positive. The npages won't be ever used in no space case.	2018-05-29 21:45:15 +00:00
Matt Macy	b99aa0fbb2	hwpmc: don't enter epoch section across mmap hook	2018-05-29 18:03:48 +00:00
Brooks Davis	d8b2f0790b	Correct pointer subtraction in KASSERT(). The assertion would never fire without truly spectacular future programming errors. Reported by: Coverity CID: 1391367, 1391368 Sponsored by: DARPA, AFRL	2018-05-29 17:49:03 +00:00
Andriy Gapon	ec6faf94c4	add support for console resuming, implement it for uart, use on x86 This change adds a new optional console method cn_resume and a kernel console interface cnresume. Consoles that may need to re-initialize their hardware after suspend (e.g., because firmware does not care to do it) will implement cn_resume. Note that it is called in rather early environment not unlike early boot, so the same restrictions apply. Platform specific code, for platforms that support hardware suspend, should call cnresume early after resume, before any console output is expected. This change fixes a problem with a system of mine failing to resume when a serial console is used. I found that the serial port was in a strange configuration and an attempt to write to it likely resulted in an infinite loop. To avoid adding cn_resume method to every console driver, CONSOLE_DRIVER macro has been extended to support optional methods. Reviewed by: imp, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15552	2018-05-29 16:16:24 +00:00
Matt Macy	552b3e1798	witness/hwpmc: fix locking order for pmc locks	2018-05-28 23:14:38 +00:00
Eric van Gyzen	70d66bcf28	kern_cpuset: fix small leak on error path The "mask" was leaked on some error paths. Reported by: Coverity CID: 1384683 Sponsored by: Dell EMC	2018-05-26 14:23:11 +00:00
Eric van Gyzen	16b51429d2	kdb_trap: Fix use of uninitialized data In some cases, other_cpus was used without being initialized. Thankfully, it was harmless. Reported by: Coverity CID: 1385265 Sponsored by: Dell EMC	2018-05-26 14:01:44 +00:00
Brooks Davis	659a2e9243	Regen after r334223: make vadvise compat freebsd11.	2018-05-25 20:41:26 +00:00
Brooks Davis	7351a8bdb5	Make vadvise compat freebsd11. The vadvise syscall (aka ovadvise) is undocumented and has always been implmented as returning EINVAL. Put the syscall under COMPAT11 and provide a userspace implementation. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15557	2018-05-25 20:40:23 +00:00
Matt Macy	acf9fd05d8	AF_UNIX: It is possible for UNIX datagram sockets to be connected to themselves. The updated code assumed that that could not happen and would try to lock the unp mutex twice. There may be a lingering issue here but this fixes it for the reporter. PR: 228458 Reported by: marieheleneka at gmail.com	2018-05-24 21:13:46 +00:00
Matt Macy	c684c14ce3	AF_UNIX: evidently Samba likes to connect a unix socket to itself, fix locking	2018-05-24 18:22:13 +00:00
Matt Macy	a3a734908b	AF_UNIX in connectat unp and unp2 can be the same	2018-05-24 18:22:05 +00:00
Conrad Meyer	a0638b33f7	Yank crufty INTR_FILTER option It was introduced to the tree in r169320 and r169321 in May 2007. It never got much use and never became a kernel default. The code duplicates the default path quite a bit, with slight modifications. Just yank out the cruft. Whatever goals were being aimed for can probably be met within the existing framework, without a flag day option. Mostly mechanical change: 'unifdef -m -UINTR_FILTER'. Reviewed by: mmacy Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15546	2018-05-24 17:06:00 +00:00
Brooks Davis	5f77b8a88b	Avoid two suword() calls per auxarg entry. Instead, construct an auxargs array and copy it out all at once. Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent the array. This is the correct type where pairs of words just happend to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act on this array rather than introducing a new macro. Return errors on copyout() and suword() failures and handle them in the caller. Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64 which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32 (now removed due to the use of proper types). Reviewed by: kib Comments from: emaste, jhb Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15485	2018-05-24 16:25:18 +00:00
Bjoern A. Zeeb	bb8f162363	Try to be consistent and spell "vnet" lower case like all the other options (and as we do on command line). Sponsored by: iXsystems, Inc.	2018-05-24 15:31:05 +00:00
Bjoern A. Zeeb	36b41cc336	Improve the KASSERT to also have the prison pointer. Helpful when debugging from ddb. Sponsored by: iXsystems, Inc.	2018-05-24 15:28:21 +00:00
Matt Macy	16529dace8	AF_UNIX: assert that we're not acquiring the same lock	2018-05-24 15:28:16 +00:00
Mateusz Guzik	6fee84e35e	Remove incorrect owepreempt assertion added in r334062 Yet another preemption request hitting between the counter being 0 and the check being reached will result in the flag no longer being set. Note the situation was already present prior to r334062 and is harmless. Reported by: pho Reviewed by: kib	2018-05-23 10:13:17 +00:00
Matt Macy	8a656309b3	kern_sendit: use pre-initialized rights	2018-05-23 01:48:09 +00:00
Mateusz Guzik	748b15fc02	Move preemption handling out of critical_exit. In preperataion for making the enter/exit pair inline. Reviewed by: kib	2018-05-22 19:24:57 +00:00
Fabien Thomas	f8e73c47d8	Add a SPD cache to speed up lookups. When large SPDs are used, we face two problems: - too many CPU cycles are spent during the linear searches in the SPD for each packet - too much contention on multi socket systems, since we use a single shared lock. Main changes: - added the sysctl tree 'net.key.spdcache' to control the SPD cache (disabled by default). - cache the sp indexes that are used to perform SP lookups. - use a range of dedicated mutexes to protect the cache lines. Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu> Reviewed by: ae Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D15050	2018-05-22 15:54:25 +00:00
Mateusz Guzik	ee252fc995	sx: fixup a braino in r334024 If a thread waiting on sx dropped Giant it would not be properly reacquired on exit from the routine, later resulting in panics indicating Giant is not held (when it should be). The bug was not present in the original patch sent to pho, I wittingly added it just prior to the commit and only smoke-tested it. Reported by: pho	2018-05-22 15:13:25 +00:00
Mateusz Guzik	99ece3a9cd	Reduce sdt-related branch-fest in mi_switch. The code was evaluating flags before resorting to checking if dtrace is enabled. This was inducing forward jumps in the common case.	2018-05-22 08:27:33 +00:00
Mateusz Guzik	2466d12b09	sx: port over writer starvation prevention measures from rwlock A constant stream of readers could completely starve writers and this is not a hypothetical scenario. The 'poll2_threads' test from the will-it-scale suite reliably starves writers even with concurrency < 10 threads. The problem was run into and diagnosed by dillon@backplane.com There was next to no change in lock contention profile during -j 128 pkg build, despite an sx lock being at the top. Tested by: pho	2018-05-22 07:20:22 +00:00
Mateusz Guzik	9feec7ef69	rw: decrease writer starvation Writers waiting on readers to finish can set the RW_LOCK_WRITE_SPINNER bit. This prevents most new readers from coming on. However, the last reader to unlock also clears the bit which means new readers can sneak in and the cycle starts over. Change the code to keep the bit after last unlock. Note that starvation potential is still there: no matter how many write spinners are there, there is one bit. After the writer unlocks, the lock is free to get raided by readers again. It is good enough for the time being. The real fix would include counting writers. This runs into a caveat: the writer which set the bit may now be preempted. In order to get rid of the problem all attempts to set the bit are preceeded with critical_enter. The bit gets cleared when the thread which set it goes to sleep. This way an invariant holds that if the bit is set, someone is actively spinning and will grab the lock soon. In particular this means that readers which find the lock in this transient state can safely spin until the lock finds itself an owner (i.e. they don't need to block nor speculate how long to spin speculatively). Tested by: pho	2018-05-22 07:16:39 +00:00
Andriy Gapon	27dca831a6	stop and restart kernel event timers in the suspend / resume cycle I have a system that is very unstable after resuming from suspend-to-RAM but only if HPET is used as the event timer. The theory is that SMM code / firmware could be enabling HPET for its own uses and unexpected interrupts cause a trouble for it. Originally I wanted to solve the problem in hpet_suspend() method, but that was insufficient as the event timer could get reprogrammed again. So, it's better, for my case and in general, to stop the event timer(s) before entering the hardware suspend. MFC after: 4 weeks Differential Revision: https://reviews.freebsd.org/D15413	2018-05-21 20:23:04 +00:00
Mark Johnston	13679ebac9	Don't pass a section cookie to CK for non-preemptible epoch sections. They're only useful when multiple threads may share an epoch record, and that can't happen with non-preemptible sections. Reviewed by: mmacy Differential Revision: https://reviews.freebsd.org/D15507	2018-05-21 16:03:51 +00:00
Matt Macy	9725c9cef7	AF_UNIX gc unused label ...sigh	2018-05-20 21:37:34 +00:00
Matt Macy	cb8f450b94	AF_UNIX: Don't unlock unp/unp2 if they're not locked Reported by: mjg	2018-05-20 21:20:26 +00:00
Matt Macy	e10ef65d23	AF_UNIX: fix LOR introduced by the locking rewrite	2018-05-20 05:50:53 +00:00
Matt Macy	7118990962	Add additional preinitialized cap_rights	2018-05-20 05:13:12 +00:00
Mateusz Guzik	2186ee6e72	vfs: simplify vop_stdlock/unlock The interlock pointer is non-NULL by definition and the compiler see through that and eliminates the NULL checks. Just remove them from the code as they play no role. No difference in generated assembly.	2018-05-20 04:45:05 +00:00
Matt Macy	d95253403f	AF_UNIX: make unpcb lock name line up with what's in witness	2018-05-20 04:32:48 +00:00
Warner Losh	f344fb0b4b	Restore the all rights reserved language. Put it on each of the prior two copyrights. The line originated with the Berkeely Regents, who we have not approached about removing it (it's honestly too trivial to be worth that fight). Restore it to rwatson's line as well. He can decide if he wants it or not on his own. Matt clearly doesn't want it, per project preference and his own statements on IRC. Noticed by: rgrimes@	2018-05-19 17:29:57 +00:00
Ed Maste	1b30e10e48	Remove duplicate cap_no_rights from r333874 Archs using in-tree gcc were broken with `warning: redundant redeclaration of 'cap_no_rights' [-Wredundant-decls]`. Sponsored by: The FreeBSD Foundation	2018-05-19 11:37:02 +00:00
Matt Macy	f6a1a10613	Unbreak BeagleBone Black boot by collapsing 29 SYSINITs in to 1 Reported by: ilya at bakulin.de	2018-05-19 07:31:35 +00:00
Matt Macy	fc2e87be2b	intr unbreak KTR/LINT build	2018-05-19 07:04:43 +00:00
Matt Macy	4b06dee1e5	AF_UNIX: switch to annotations to avoid warnings	2018-05-19 05:37:58 +00:00
Matt Macy	acbde29858	capsicum: propagate const correctness	2018-05-19 05:14:05 +00:00
Matt Macy	ba3f7276c0	intr: eliminate / annotate unused stack locals	2018-05-19 05:12:18 +00:00
Matt Macy	7fd6841438	sendfile: annotate unused value and ensure that npages is actually initialized	2018-05-19 05:10:51 +00:00
Matt Macy	e1a92f058f	umtx: don't call umtxq_getchain unless the value is needed	2018-05-19 05:09:10 +00:00
Matt Macy	a6c7423a92	cpuset: revert and annotate instead	2018-05-19 05:07:31 +00:00
Matt Macy	6fa5abfdda	conf: revert last change and annotate unused var instead	2018-05-19 05:07:03 +00:00
Matt Macy	1c0336c1c1	kevent: annotate unused stack local	2018-05-19 05:06:18 +00:00
Matt Macy	788390df0a	lockf: annotate LOCKF_DEBUG only var	2018-05-19 05:04:38 +00:00
Matt Macy	d1230b1159	capsicum: annotate variable only used by debug	2018-05-19 05:02:40 +00:00
Matt Macy	3adccf38e3	turnstile / sleepqueue: annotate variables only used by debug builds	2018-05-19 05:00:16 +00:00
Matt Macy	84482abd21	vfs: annotate variables only used by debug builds as __unused	2018-05-19 04:59:39 +00:00
Matt Macy	a2bb4e080e	tty: use __unused annotation instead to silence warnings	2018-05-19 04:48:26 +00:00
Matt Macy	5072a5f465	malloc: avoid possibly returning stack garbage if MALLOC_DEBUG is defined	2018-05-19 04:43:49 +00:00
Matt Macy	39eef2f45a	cpuset_thread0: avoid unused assignment on non debug build	2018-05-19 04:14:00 +00:00
Matt Macy	926cfdb8da	make_dev: avoid unused assignments on non debug builds	2018-05-19 04:13:20 +00:00
Matt Macy	4949ad7264	mqueue: avoid unused variables	2018-05-19 04:10:53 +00:00
Matt Macy	cd6ba3f086	physio: avoid uninitialized variables	2018-05-19 04:09:58 +00:00
Matt Macy	e9b1074bc7	cache_lookup remove unused variable and initialize used	2018-05-19 04:08:11 +00:00
Matt Macy	ec8d23352b	filt_timerdetach: only assign to old if we're going to check it in a KASSERT	2018-05-19 04:07:00 +00:00
Matt Macy	5cc2d25a2b	getnextevent: put variable only used by KTR under ifdef KTR	2018-05-19 04:05:36 +00:00
Matt Macy	bfd0eacb02	simplify control flow so that gcc knows we never pass save to curthread_pflags_restore without initializing	2018-05-19 04:04:44 +00:00
Matt Macy	3ef78c9c96	tty: conditionally assign to ret value only used by MPASS statement	2018-05-19 04:02:29 +00:00
Matt Macy	02fe8a2409	remove unused locked variable in lockmgr_unlock_fast_path	2018-05-19 03:58:40 +00:00
Matt Macy	ddd4d15ecd	signotify: don't create a stack local that isn't used on non-debug builds	2018-05-19 03:57:41 +00:00
Matt Macy	46117e1f0c	sysv_msg initialize saved_msgsz	2018-05-19 03:56:39 +00:00
Matt Macy	11d4f748d7	remove unused variable	2018-05-19 03:55:42 +00:00
Matt Macy	1dce110f63	fix uninitialized variable warning in reader locks	2018-05-19 03:52:55 +00:00
Matt Macy	b203713694	fix uninitialized variable warning	2018-05-19 03:49:36 +00:00
Matt Macy	ac8b2d5cb1	sys_process.c fix set but not used warning	2018-05-19 03:48:35 +00:00
Matt Macy	e339e43685	subr_epoch.c fix unused variable warnings	2018-05-19 03:47:37 +00:00
Matt Macy	ae6be8e6f7	pidctrl Actually use the variables that we assign to as seatbelts to prevent divide by zero Reviewed by: jeffr	2018-05-19 02:17:18 +00:00
Matt Macy	c0874c3468	fix gcc8 unused variable and set but not used variable in unix sockets add copyright from lock rewrite while here	2018-05-19 02:15:40 +00:00
Mateusz Guzik	10391db530	lockmgr: avoid atomic on unlock in the slow path The code is pretty much guaranteed not to be able to unlock. This is a minor nit. The code still performs way too many reads. The altered exclusive-locked condition is supposed to be always true as well, to be cleaned up at a later date.	2018-05-18 22:57:52 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Matt Macy	20ba6811e6	epoch(9): assert that epoch is allocated post-configure	2018-05-18 18:27:17 +00:00
Ed Maste	891cf3ed44	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
Matt Macy	70398c2f86	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
Matt Macy	60b7b90d65	epoch: actually allocate the counters we've assigned sysctls too Approved by: sbruno	2018-05-18 02:57:39 +00:00
Matt Macy	5e68a3dfe3	epoch: add non-preemptible "critical" variant adds: - epoch_enter_critical() - can be called inside a different epoch, starts a section that will acquire any MTX_DEF mutexes or do anything that might sleep. - epoch_exit_critical() - corresponding exit call - epoch_wait_critical() - wait variant that is guaranteed that any threads in a section are running. - epoch_global_critical - an epoch_wait_critical safe epoch instance Requested by: markj Approved by: sbruno	2018-05-18 01:52:51 +00:00
Brooks Davis	dedc82ae26	Use strsep() to parse init_path in start_init(). This simplifies the use of the path variable by making it NUL terminated. This is a prerequisite for further cleanups. Reviewed by: imp Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15467	2018-05-17 23:07:51 +00:00
Matt Macy	a5f1042498	epoch: skip poll function call in hardclock unless there are callbacks pending Reported by: mjg Approved by: sbruno	2018-05-17 21:39:15 +00:00
Matt Macy	c4d901e9bd	epoch(9): schedule pcpu callback task in hardclock if there are callbacks pending Approved by: sbruno	2018-05-17 19:57:07 +00:00
Matt Macy	2a45e8282a	epoch(9): eliminate the need to wait when polling for callbacks to run by using ck's own callback handling mechanism we can simply check which callbacks have had a grace period elapse Approved by: sbruno	2018-05-17 19:50:55 +00:00
Matt Macy	d1bcb409f6	epoch(9): fix potential deadlock Don't acquire a waiting thread's lock while holding our own Approved by: sbruno	2018-05-17 19:41:58 +00:00
Matt Macy	766d225326	epoch(9): restore thread priority on exit if it was changed by a waiter Reported by: markj Approved by: sbruno	2018-05-17 19:08:28 +00:00
Matt Macy	75a67bf3d0	AF_UNIX: make unix socket locking finer grained This change moves to using a reference count across lock drop / reacquire to guarantee liveness. Currently sends on unix sockets contend heavily on read locking the list lock. unix1_processes in will-it-scale peaks at 6 processes and then declines. With this change I get a substantial improvement in number of operations per second with 96 processes: x before + after N Min Max Median Avg Stddev x 11 1688420 1696389 1693578 1692766.3 2971.1702 + 10 63417955 71030114 70662504 69576423 2374684.6 Difference at 95.0% confidence 6.78837e+07 +/- 1.49463e+06 4010.22% +/- 88.4246% (Student's t, pooled s = 1.63437e+06) And even for 2 processes shows a ~18% improvement. "Small" iron changes (1, 2, and 4 processes): x before1 + after1.2 +------------------------------------------------------------------------+ \| + \| \| x + \| \| x + \| \| x + \| \| x ++ \| \| xx ++ \| \|x x xx ++ \| \| \|__________________A_____M_____AM____\|\| +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 1131648 1197750 1197138.5 1190369.3 20651.839 + 10 1203840 1205056 1204919 1204827.9 353.27404 Difference at 95.0% confidence 14458.6 +/- 13723 1.21463% +/- 1.16683% (Student's t, pooled s = 14605.2) x before2 + after2.2 +------------------------------------------------------------------------+ \| +\| \| +\| \| +\| \| +\| \| +\| \| +\| \| x +\| \| x +\| \| x xx +\| \|x xxxx +\| \| \|___AM_\| A\| +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 1972843 2045866 2038186.5 2030443.8 21367.694 + 10 2400853 2402196 2401043.5 2401172.7 385.40024 Difference at 95.0% confidence 370729 +/- 14198.9 18.2585% +/- 0.826943% (Student's t, pooled s = 15111.7) x before4 + after4.2 N Min Max Median Avg Stddev x 10 3986994 3991728 3990137.5 3989985.2 1300.0164 + 10 4799990 4806664 4806116.5 4805194 1990.6625 Difference at 95.0% confidence 815209 +/- 1579.64 20.4314% +/- 0.0421713% (Student's t, pooled s = 1681.19) Tested by: pho Reported by: mjg Approved by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15430	2018-05-17 17:59:35 +00:00
Matt Macy	fdf71aeb54	epoch(9): make recursion lighter weight There isn't any real work to do except bump td_epochnest when recursing. Skip the additional work in this case. Approved by: sbruno	2018-05-17 01:13:40 +00:00
Matt Macy	b8205686b4	epoch(9): Guarantee forward progress on busy sections Add epoch section to struct thread. We can use this to ennable epoch counter to advance even if a section is perpetually occupied by a thread. Approved by: sbruno	2018-05-17 00:45:35 +00:00
Matt Macy	6161b98c99	hwpmc: Implement per-thread counters for PMC sampling This implements per-thread counters for PMC sampling. The thread descriptors are stored in a list attached to the process descriptor. These thread descriptors can store any per-thread information necessary for current or future features. For the moment, they just store the counters for sampling. The thread descriptors are created when the process descriptor is created. Additionally, thread descriptors are created or freed when threads are started or stopped. Because the thread exit function is called in a critical section, we can't directly free the thread descriptors. Hence, they are freed to a cache, which is also used as a source of allocations when needed for new threads. Approved by: sbruno Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks Differential Revision: https://reviews.freebsd.org/D15335	2018-05-16 22:29:20 +00:00
Jean-Sébastien Pédron	547e74a8be	teken, vt(4): New callbacks to lock the terminal once ... to process input, instead of inside each smaller operations such as appending a character or moving the cursor forward. In other words, before we were doing (oversimplified): teken_input() <for each input character> vtterm_putchar() VTBUF_LOCK() VTBUF_UNLOCK() vtterm_cursor_position() VTBUF_LOCK() VTBUF_UNLOCK() Now, we are doing: vtterm_pre_input() VTBUF_LOCK() teken_input() <for each input character> vtterm_putchar() vtterm_cursor_position() vtterm_post_input() VTBUF_UNLOCK() The situation was even worse when the vtterm_copy() and vtterm_fill() callbacks were involved. The new callbacks are: * struct terminal_class->tc_pre_input() * struct terminal_class->tc_post_input() They are called in teken_input(), surrounding the while() loop. The goal is to improve input processing speed of vt(4). As a benchmark, here is the time taken to write a text file of 360 000 lines (26 MiB) on `ttyv0`: * vt(4), unmodified: 1500 ms * vt(4), with this patch: 1200 ms * syscons(4): 700 ms This is on a Haswell laptop with a GENERIC-NODEBUG kernel. At the same time, the locking is changed in the vt_flush() function which is responsible to draw the text on screen. So instead of (indirectly) using VTBUF_LOCK() just to read and reset the dirty area of the internal buffer, the lock is held for about the entire function, including the drawing part. The change is mostly visible while content is scrolling fast: before, lines could appear garbled while scrolling because the internal buffer was accessed without locks (once the scrolling was finished, the output was correct). Now, the scrolling appears correct. In the end, the locking model is closer to what syscons(4) does. Differential Revision: https://reviews.freebsd.org/D15302	2018-05-16 09:01:02 +00:00
Ed Maste	ea0939f0af	subr_pidctrl: use standard 2-Clause FreeBSD license and disclaimer Approved by: jeff	2018-05-15 00:50:09 +00:00
Matt Macy	0f00315cb3	hwpmc: fix load/unload race and vm map LOR - fix load/unload race by allocating the per-domain list structure at boot - fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch to protect the liveness of elements of the pmc_ss_owners list Reported by: pho Approved by: sbruno	2018-05-14 00:21:04 +00:00
Matt Macy	0c58f85b8d	epoch(9): allow sx locks to be held across epoch_wait() The INVARIANTS checks in epoch_wait() were intended to prevent the block handler from returning with locks held. What it in fact did was preventing anything except Giant from being held across it. Check that the number of locks held has not changed instead. Approved by: sbruno@	2018-05-14 00:14:00 +00:00
Matt Macy	1f4beb6312	epoch(9): cleanups, additional debug checks, and add global_epoch - GC the _nopreempt routines - to really benefit we'd need a separate routine - they're not currently in use - they complicate the API for no benefit at this time - check that we're actually in a epoch section at exit - handle epoch_call() early in boot - Fix copyright declaration language Approved by: sbruno@	2018-05-13 23:24:48 +00:00
Konstantin Belousov	2ebc882927	Detect and optimize reads from the hole on UFS. - Create getblkx(9) variant of getblk(9) which can return error. - Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP was performed before the buffer is created, and EJUSTRETURN returned in case the requested block does not exist. - Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and allocating the pages for it), copying from zero_region instead. The end result is less page allocations and buffer recycling when a hole is read, which is important for some benchmarks. Requested and reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14917	2018-05-13 09:47:28 +00:00
Matt Macy	f1401123c5	hwpmc/epoch - don't reference domain if NUMA is not set It appears that domain information is set correctly independent of whether or not NUMA is defined. However, there is no memory backing secondary domains leading to allocation failure. Reported by: pho@, np@ Approved by: sbruno@	2018-05-12 20:00:29 +00:00
Matt Macy	e6b475e0af	hwpmc(9): Make pmclog buffer pcpu and update constants On non-trivial SMP systems the contention on the pmc_owner mutex leads to a substantial number of samples captured being from the pmc process itself. This change a) makes buffers larger to avoid contention on the global list b) makes the working sample buffer per cpu. Run pmcstat in the background (default event rate of 64k): pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 & Before: make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total After: make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total For more realistic overhead measurement set the sample rate for ~2khz on a 2.1Ghz processor: pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 & Collecting 10 samples of `make -j96 buildkernel` from each: x before + after real time: N Min Max Median Avg Stddev x 10 76.4 127.62 84.845 88.577 15.100031 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence -28.398 +/- 10.0344 -32.0602% +/- 7.69825% (Student's t, pooled s = 10.6794) system time: N Min Max Median Avg Stddev x 10 2277.96 6948.53 2949.47 3341.492 1385.2677 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence -2277.47 +/- 920.425 -68.1574% +/- 8.77623% (Student's t, pooled s = 979.596) x no pmc + pmc running real time: HEAD: N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 76.4 127.62 84.845 88.577 15.100031 Difference at 95.0% confidence 29.73 +/- 10.0335 50.5208% +/- 17.0525% (Student's t, pooled s = 10.6785) patched: N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence 1.332 +/- 0.248939 2.2635% +/- 0.426506% (Student's t, pooled s = 0.264942) system time: HEAD: N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 2277.96 6948.53 2949.47 3341.492 1385.2677 Difference at 95.0% confidence 2309.97 +/- 920.443 223.937% +/- 89.3039% (Student's t, pooled s = 979.616) patched: N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence 32.493 +/- 16.0042 3.15% +/- 1.5794% (Student's t, pooled s = 17.0331) Reviewed by: jeff@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15155	2018-05-12 01:26:34 +00:00
Matt Macy	8dcbd0eae6	epoch(9): always set inited in epoch_init - set inited in the !usedomains case Reported by: jhibbits Approved by: sbruno	2018-05-11 18:37:14 +00:00
Matt Macy	4aa302dfc9	epoch(9): callback task fixes - initialize the pcpu STAILQ in the NUMA case - don't enqueue the callback task if there isn't sufficient work to be done Reported by: pho@ Approved by: sbruno@	2018-05-11 08:16:56 +00:00
Mateusz Guzik	85c1b3c1cb	rmlock: partially depessimize lock/unlock fastpath Previusly the slow path was folded in and partially jumped over in the common case.	2018-05-11 06:59:54 +00:00
Matt Macy	b2cb28963b	epoch(9): fix priority handling, make callback lists pcpu, and other fixes - Lend priority to preempted threads in epoch_wait to handle the case in which we've had priority lent to us. Previously we borrowed the priority of the lowest priority preempted thread. (pointed out by mjg@) - Don't attempt allocate memory per-domain on powerpc, we don't currently handle empty sockets (as is the case on jhibbits Talos' board). - Handle deferred callbacks as pcpu lists and poll the lists periodically. Currently the interval is 1/hz. - Drop the thread lock when adaptive spinning. Holding the lock starves other threads and can even lead to lockups. - Keep a generation count pcpu so that we don't keep spining if a thread has left and re-entered an epoch section. - Actually removed the callback from the callback list so that we don't double free. Sigh ... Approved by: sbruno@	2018-05-11 04:54:12 +00:00
Matt Macy	06bf2a6aef	Add simple preempt safe epoch API Read locking is over used in the kernel to guarantee liveness. This API makes it easy to provide livenes guarantees without atomics. Includes epoch_test kernel module to stress test the API. Documentation will follow initial use case. Test case and improvements to preemption handling in response to discussion with mjg@ Reviewed by: imp@, shurd@ Approved by: sbruno@	2018-05-10 17:55:24 +00:00
Andrew Gallatin	d5cdcc3a06	Fix the build after r333457 In r333457, the arguments to kern_pwritev() were accidentally re-ordered as part of ANSIfication, breaking the build.	2018-05-10 13:19:42 +00:00
Ed Maste	cc3c9df80f	ANSIfy sys_generic.c	2018-05-10 11:36:16 +00:00
Matt Macy	36688f706e	Add taskqgroup_config_gtask_deinit to support teardown after taskqgroup_config_gtask_init. Approved by: sbruno	2018-05-09 18:51:35 +00:00
Matt Macy	cbd92ce62e	Eliminate the overhead of gratuitous repeated reinitialization of cap_rights - Add macros to allow preinitialization of cap_rights_t. - Convert most commonly used code paths to use preinitialized cap_rights_t. A 3.6% speedup in fstat was measured with this change. Reported by: mjg Reviewed by: oshogbo Approved by: sbruno MFC after: 1 month	2018-05-09 18:47:24 +00:00
Konstantin Belousov	55c9d75e6b	Avoid calls to bzero() before ireloc. Evaluate cpu_stdext_feature early to have moved link_elf_ireloc() see correct flags, most important is SMAP. Tested by: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D15367	2018-05-09 14:39:24 +00:00
Matt Macy	ad738f3791	Reduce overhead of ktrace checks in the common case. KTRPOINT() checks both if we are tracing _and_ if we are recursing within ktrace. The second condition is only ever executed if ktrace is actually enabled. This change moves the check out of the hot path in to the functions themselves. Discussed with mjg@ Reported by: mjg@ Approved by: sbruno@	2018-05-09 00:00:47 +00:00
Mateusz Guzik	2824088536	Inlined sched_userret. The tested condition is rarely true and it induces a function call on each return to userspace. Bumps getuid rate by about 1% on Broadwell.	2018-05-07 23:36:16 +00:00
Mateusz Guzik	75e9b455a9	Change trap_enotcap to bool and annotate with __read_frequently It is read on each return to user space.	2018-05-07 23:10:12 +00:00
Mateusz Guzik	79ca7cbf09	Avoid calls to syscall_thread_enter/exit for statically defined syscalls The entire mechanism is rarely used and is quite not performant due to atomci ops on the syscall table. It also has added overhead for completely unrelated syscalls. Reduce it by avoiding the func calls if possible (which consistutes vast majority of cases). Provides about 3% syscall rate speed up for getuid on Broadwell.	2018-05-07 22:29:32 +00:00
Warner Losh	ad7142757b	Add device_quiet_children() and device_has_quiet_children() If you add a child to a device that has quiet children, we'll automatically set the quiet flag on the children, and its children. This is indended for things like CPU that have a large amount of repetition in booting that adds nothing.	2018-05-07 21:09:08 +00:00
Andrew Gallatin	e7bd0750af	Boost thread priority while changing CPU frequency Boost the priority of user-space threads when they set their affinity to a core to adjust its frequency. This avoids a situation where a CPU bound kernel thread with the same affinity is running on a down-clocked core, and will "block" powerd from up-clocking the core until the kernel thread yields. This can lead to poor perfomance, and to things potentially getting stuck on Giant. Reviewed by: kib (imp reviewed earlier version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15246	2018-05-07 15:24:03 +00:00
Mark Johnston	bd92e6b6f5	Refactor some of the MI kernel dump code in preparation for netdump. - Add clear_dumper() to complement set_dumper(). - Drain netdump's preallocated mbuf pool when clearing the dumper. - Don't do bounds checking for dumpers with mediasize 0. - Add dumper callbacks for initialization for writing out headers. Reviewed by: sbruno MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15252	2018-05-06 00:22:38 +00:00
Mark Johnston	5475ca5aca	Add an mbuf allocator for netdump. The aim is to permit mbuf allocations after a panic without calling into the page allocator, without imposing any runtime overhead during regular operation of the system, and without modifying driver code. The approach taken is to preallocate a number of mbufs and clusters, storing them in linked lists, and using the lists to back some UMA cache zones. At panic time, the mbuf and cluster zone pointers are overwritten with those of the cache zones so that the mbuf allocator returns preallocated items. Using this scheme, drivers which cache mbuf zone pointers from m_getzone() require special handling when implementing netdump support. Reviewed by: cem (earlier version), julian, sbruno MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15251	2018-05-06 00:19:48 +00:00
Mark Johnston	c2ba2d1b0e	Style. MFC after: 3 days	2018-05-06 00:11:30 +00:00
Andriy Gapon	bd3afae0ca	for bus suspend, detach and shutdown iterate children in reverse order For most buses all children are equal, so the order does not matter. Other buses, such as acpi, carefully order their child devices to express implicit dependencies between them. For such buses it is safer to bring down devices in the reverse order. I believe that this is the reason why hpet_suspend had to be disabled. Some drivers depend on a working event timer until they are suspended. But previously we would suspend hpet very early. I tested this change by makinbg hpet_suspend actually stop HPET timers and tested that too. Note that this change is not a complete solution as it does not take into account bus passes. A better approach would be to track the actual attach order of the devices and to use the reverse of that. Reviewed by: imp, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15291	2018-05-05 05:19:32 +00:00
Mateusz Guzik	5ec2c93667	tc: bcopy -> memcpy	2018-05-04 22:48:10 +00:00
Jamie Gritton	0e5c6bd436	Make it easier for filesystems to count themselves as jail-enabled, by doing most of the work in a new function prison_add_vfs in kern_jail.c Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and the rest is taken care of. This includes adding a jail parameter like allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed. Both of these used to be a static list of known filesystems, with predefined permission bits. Reviewed by: kib Differential Revision: D14681	2018-05-04 20:54:27 +00:00
Mark Johnston	1b5c869d64	Fix some races introduced in r332974. With r332974, when performing a synchronized access of a page's "queue" field, one must first check whether the page is logically dequeued. If so, then the page lock does not prevent the page from being removed from its page queue. Intoduce vm_page_queue(), which returns the page's logical queue index. In some cases, direct access to the "queue" field is still required, but such accesses should be confined to sys/vm. Reported and tested by: pho Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15280	2018-05-04 17:17:30 +00:00
Matt Macy	748ff486b0	`dup1_processes -t 96 -s 5` on a dual 8160 x dup_before + dup_after +------------------------------------------------------------+ \| x + \| \|x x x x ++ ++\| \| \|____AM___\| \|AM\|\| +------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 1.514954e+08 1.5230351e+08 1.5206157e+08 1.5199371e+08 341205.71 + 5 1.5494336e+08 1.5519569e+08 1.5511982e+08 1.5508323e+08 96232.829 Difference at 95.0% confidence 3.08952e+06 +/- 365604 2.03266% +/- 0.245071% (Student's t, pooled s = 250681) Reported by: mjg@ MFC after: 1 week	2018-05-04 06:51:01 +00:00
Konstantin Belousov	7035cf14ee	Implement support for ifuncs in the kernel linker. Required MD bits are only provided for x86. Reviewed by: jhb (previous version, as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D13838	2018-05-03 21:37:46 +00:00
Stephen Hurd	f3e1324b41	Separate list manipulation locking from state change in multicast Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969	2018-05-02 19:36:29 +00:00
Mark Johnston	20f85b1ddd	Print the dump progress indicator after calling dump_start(). Dumpers may wish to print messages from an initialization hook; this change ensures that such messages aren't mixed with output from the generic dump code. MFC after: 1 week	2018-05-01 17:32:43 +00:00
Nathan Whitehorn	ee900504cf	Report the kernel base address properly in kldstat when using PowerPC kernels loaded at addresses other than their link address.	2018-05-01 04:06:59 +00:00
Ed Maste	2216c6933c	Disable connectat/bindat with AT_FDCWD in capmode Previously it was possible to connect a socket (which had the CAP_CONNECT right) by calling "connectat(AT_FDCWD, ...)" even in capabilties mode. This combination should be treated the same as a call to connect (i.e. forbidden in capabilities mode). Similarly for bindat. Disable connectat/bindat with AT_FDCWD in capabilities mode, fix up the documentation and add tests. PR: 222632 Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com> Reviewed by: Domagoj Stolfa MFC after: 1 week Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D15221	2018-04-30 17:31:06 +00:00
Mateusz Guzik	9d68f7741f	systrace: track it like sdt probes While here predict false. Note the code is wrong (regardless of this change). Dereference of the pointer can race with module unload. A fix would set the probe to a nop stub instead of NULL.	2018-04-27 15:16:34 +00:00
Emmanuel Vadot	ee710ecf32	clk: Put the sysctls under hw.clock instead of clock This is more consistant with hw.regulator and other hardware related sysctls.	2018-04-27 00:12:00 +00:00
Mark Johnston	5cd29d0f3c	Improve VM page queue scalability. Currently both the page lock and a page queue lock must be held in order to enqueue, dequeue or requeue a page in a given page queue. The queue locks are a scalability bottleneck in many workloads. This change reduces page queue lock contention by batching queue operations. To detangle the page and page queue locks, per-CPU batch queues are used to reference pages with pending queue operations. The requested operation is encoded in the page's aflags field with the page lock held, after which the page is enqueued for a deferred batch operation. Page queue scans are similarly optimized to minimize the amount of work performed with a page queue lock held. Reviewed by: kib, jeff (previous versions) Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14893	2018-04-24 21:15:54 +00:00
Sean Bruno	7875017ca9	Revert r332894 at the request of the submitter. Submitted by: Johannes Lundberg <johalun0_gmail.com> Sponsored by: Limelight Networks	2018-04-24 19:55:12 +00:00
Conrad Meyer	65df124845	Do not totally silence suppressed secondary kasserts unless debug.kassert.do_log is disabled To totally silence and ignore secondary kassert violations after a primary panic, set debug.kassert.do_log=0 and debug.kassert.suppress_in_panic=1. Additional assertion warnings shouldn't block core dump and may alert the developer to another erroneous condition. Secondary stack traces may be printed, identically to the unsuppressed case where panic() is reentered -- controlled via debug.trace_all_panics. Sponsored by: Dell EMC Isilon	2018-04-24 19:10:51 +00:00
Conrad Meyer	07aa6ea677	Fix debug.kassert.do_log description text This has been an (incorrect) copy-paste duplicate of debug.kassert.warn_only since it was originally committed in r243980. Sponsored by: Dell EMC Isilon	2018-04-24 18:59:40 +00:00
Conrad Meyer	ad1fc31570	panic: Optionally, trace secondary panics To diagnose and fix secondary panics, it is useful to have a stack trace. When panic tracing is enabled, optionally trace secondary panics as well. The option is configured with the tunable/sysctl debug.trace_all_panics. (The original concern that inspired only tracing the primary panic was likely that the secondary trace may scroll the original panic message or trace off the screen. This is less of a concern for serial consoles with logging. Not everything has a serial console, though, so the behavior is optional.) Discussed with: jhb Sponsored by: Dell EMC Isilon	2018-04-24 18:54:20 +00:00
Jonathan T. Looney	18959b695d	Update r332860 by changing the default from suppressing post-panic assertions to not suppressing post-panic assertions. There are some post-panic assertions that are valuable and we shouldn't default to disabling them. However, when a user trips over them, the user can still adjust the tunable/sysctl to suppress them temporarily to get conduct troubleshooting (e.g. get a core dump). Reported by: cem, markj	2018-04-24 18:47:35 +00:00
Conrad Meyer	b543c98cab	lockmgr: Add missed neutering during panic r313683 introduced new lockmgr APIs that missed the panic-time neutering present in the rest of our locks. Correct that by adding the usual check. Additionally, move the __lockmgr_args neutering above the assertions at the top of the function. Drop the interlock unlock because we shouldn't have an unneutered interlock either. No point trying to unlock it. PR: 227749 Reported by: jtl Sponsored by: Dell EMC Isilon	2018-04-24 18:41:14 +00:00
Mateusz Guzik	d357c16adc	lockf: change the owner hash from pid to vnode-based This adds a bit missed due to the patch split, see r332882 Tested by: pho	2018-04-24 06:10:36 +00:00
Mateusz Guzik	7cd794214a	dtrace: depessimize dtmalloc when dtrace is active Each malloc/free was testing dtrace_malloc_enabled and forcing extra reads from the malloc type struct to see if perhaps a dtmalloc probe was on. Treat it like lockstat and sdt: have a global bolean.	2018-04-24 01:06:20 +00:00
Mateusz Guzik	4c5209cb21	lockstat: track lockstat just like sdt probes In particular flip the frequently tested var to bool.	2018-04-24 01:04:10 +00:00
Mateusz Guzik	c9e05ccd62	malloc: stop reading the subzone if MALLOC_DEBUG_MAXZONES == 1 (the default) malloc was showing at the top of profile during while running microbenchmarks. #define DTMALLOC_PROBE_MAX 2 struct malloc_type_internal { uint32_t mti_probes[DTMALLOC_PROBE_MAX]; u_char mti_zone; struct malloc_type_stats mti_stats[MAXCPU]; }; Reading mti_zone it wastes a cacheline to hold mti_probes + mti_zone (which we know is 0) + part of malloc stats of the first cpu which on top induces false-sharing. In particular will-it-scale lock1_processes -t 128 -s 10: before: average:45879692 after: average:51655596 Note the counters can be padded but the right fix is to move them to counter(9), leaving the struct read-only after creation (modulo dtrace probes).	2018-04-23 22:28:49 +00:00
Sean Bruno	7b7796eea5	Load balance sockets with new SO_REUSEPORT_LB option This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). Submitted by: Johannes Lundberg <johanlun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003	2018-04-23 19:51:00 +00:00
Mateusz Guzik	833dc05a6e	lockf: add per-chain locks to the owner hash This combined with previous changes significantly depessimizes the behaviour under contentnion. In particular the lock1_processes test (locking/unlocking separate files) from the will-it-scale suite was executed with 128 concurrency on a 4-socket Broadwell with 128 hardware threads. Operations/second (lock+unlock) go from ~750000 to ~45000000 (6000%) For reference single-process is ~1680000 (i.e. on stock kernel the resulting perf is less than half of the single-threaded run), Note this still does not really scale all that well as the locks were just bolted on top of the current implementation. Significant room for improvement is still here. In particular the top performance fluctuates depending on the extent of false sharing in given run (which extends beyond the file). Added chain+lock pairs were not padded w.r.t. cacheline size. One big ticket item is the hash used for spreading threads: it used to be the process pid (which basically serialized all threaded ops). Temporarily the vnode addr was slapped in instead. Tested by: pho	2018-04-23 08:23:10 +00:00
Mateusz Guzik	63286976b5	lockf: skip locking the graph if not necessary (common case) Tested by: pho	2018-04-23 07:54:02 +00:00
Mateusz Guzik	717df0b0e8	lockf: perform wakeup onlly when there is anybody waiting Tested by: pho	2018-04-23 07:52:56 +00:00
Mateusz Guzik	c72ead2815	lockf: skip the hard work in lf_purgelocks if possible Tested by: pho	2018-04-23 07:52:10 +00:00
Mateusz Guzik	0d3323f557	lockf: free state only when recycling the vnode This avoids malloc/free cycles when locking/unlocking the vnode when nobody is contending. Tested by: pho	2018-04-23 07:51:19 +00:00
Tijl Coosemans	7dfbbc613b	Make bufdaemon and bufspacedaemon use kthread_suspend_check instead of kproc_suspend_check. In r329612 bufspacedaemon was turned into a thread of the bufdaemon process causing both to call kproc_suspend_check with the same proc argument and that function contains the following while loop: while (SIGISMEMBER(p->p_siglist, SIGSTOP)) { wakeup(&p->p_siglist); msleep(&p->p_siglist, &p->p_mtx, PPAUSE, "kpsusp", 0); } So one thread wakes up the other and the other wakes up the first again, locking up UP machines on shutdown. Also register the shutdown handlers with SHUTDOWN_PRI_LAST + 100 so they run after the syncer has shutdown, because the syncer can cause a situation where bufdaemon help is needed to proceed. PR: 227404 Reviewed by: kib Tested by: cy, rmacklem	2018-04-22 16:05:29 +00:00
Mateusz Guzik	7d853f62bf	lockf: slightly depessimize 1. check if P_ADVLOCK is already set and if so, don't lock to set it (stolen from DragonFly) 2. when trying for fast path unlock, check that we are doing unlock first instead of taking the interlock for no reason (e.g. if we want to lock). whilere make it more likely that falling fast path will not take the interlock either by checking for state Note the code is severely pessimized both single- and multithreaded.	2018-04-22 09:30:07 +00:00
Jonathan T. Looney	44b71282b5	When running with INVARIANTS, the kernel contains extra checks. However, these assumptions may not hold true once we've panic'd. Therefore, the checks hold less value after a panic. Additionally, if one of the checks fails while we are already panic'd, this creates a double-panic which can interfere with debugging the original panic. Therefore, this commit allows an administrator to suppress a response to KASSERT checks after a panic by setting a tunable/sysctl. The tunable/sysctl (debug.kassert.suppress_in_panic) defaults to being enabled. Reviewed by: kib Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D12920	2018-04-21 17:05:00 +00:00
Konstantin Belousov	1302eea7bb	Rename PROC_PDEATHSIG_SET -> PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_GET -> PROC_PDEATHSIG_STATUS for consistency with other procctl(2) operations names. Requested by: emaste Sponsored by: The FreeBSD Foundation MFC after: 13 days	2018-04-20 15:19:27 +00:00
Andriy Gapon	f87beb93e8	call racct_proc_ucred_changed() under the proc lock The lock is required to ensure that the switch to the new credentials and the transfer of the process's accounting data from the old credentials to the new ones is done atomically. Otherwise, some updates may be applied to the new credentials and then additionally transferred from the old credentials if the updates happen after proc_set_cred() and before racct_proc_ucred_changed(). The problem is especially pronounced for RACCT_RSS because - there is a strict accounting for this resource (it's reclaimable) - it's updated asynchronously by the vm daemon - it's updated by setting an absolute value instead of applying a delta I had to remove a call to rctl_proc_ucred_changed() from racct_proc_ucred_changed() and make all callers of latter call the former as well. The reason is that rctl_proc_ucred_changed, as it is implemented now, cannot be called while holding the proc lock, so the lock is dropped after calling racct_proc_ucred_changed. Additionally, I've added calls to crhold / crfree around the rctl call, because without the proc lock there is no gurantee that the new credentials, owned by the process, will stay stable. That does not eliminate a possibility that the credentials passed to the rctl will get stale. Ideally, rctl_proc_ucred_changed should be able to work under the proc lock. Many thanks to kib for pointing out the above problems. PR: 222027 Discussed with: kib No comment: trasz MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D15048	2018-04-20 13:08:04 +00:00
John Baldwin	73c8686e91	Simplify the code to allocate stack for auxv, argv[], and environment vectors. Remove auxarg_size as it was only used once right after a confusing assignment in each of the variants of exec_copyout_strings(). Reviewed by: emaste MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D15123	2018-04-19 16:00:34 +00:00
Konstantin Belousov	b940886338	Add PROC_PDEATHSIG_SET to procctl interface. Allow processes to request the delivery of a signal upon death of their parent process. Supposed consumer of the feature is PostgreSQL. Submitted by: Thomas Munro Reviewed by: jilles, mjg MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D15106	2018-04-18 21:31:13 +00:00
John Baldwin	8ce99bb405	Properly do a deep copy of the ioctls capability array for fget_cap(). fget_cap() tries to do a cheaper snapshot of a file descriptor without holding the file descriptor lock. This snapshot does not do a deep copy of the ioctls capability array, but instead uses a different return value to inform the caller to retry the copy with the lock held. However, filecaps_copy() was returning 1 to indicate that a retry was required, and fget_cap() was checking for 0 (actually '!filecaps_copy()'). As a result, fget_cap() did not do a deep copy of the ioctls array and just reused the original pointer. This cause multiple file descriptor entries to think they owned the same pointer and eventually resulted in duplicate frees. The only code path that I'm aware of that triggers this is to create a listen socket that has a restricted list of ioctls and then call accept() which calls fget_cap() with a valid filecaps structure from getsock_cap(). To fix, change the return value of filecaps_copy() to return true if it succeeds in copying the caps and false if it fails because the lock is required. I find this more intuitive than fixing the caller in this case. While here, change the return type from 'int' to 'bool'. Finally, make filecaps_copy() more robust in the failure case by not copying any of the source filecaps structure over. This avoids the possibility of leaking a pointer into a structure if a similar future caller doesn't properly handle the return value from filecaps_copy() at the expense of one more branch. I also added a test case that panics before this change and now passes. Reviewed by: kib Discussed with: mjg (not a fan of the extra branch) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D15047	2018-04-17 18:07:40 +00:00
Brooks Davis	cee61c8cac	Stop using fuswintr() and suswintr() in the profiler. Always take the AST path rather than calling MD functions which are often implemented as always failing. The is the case on amd64, arm, i386, and powerpc. This optimization (inherited from 4.4 Lite) is a pessimization on those architectures and is the sole use of these functions. They will be removed in a seperate commit. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15101	2018-04-17 16:36:53 +00:00
Alan Somers	52c0983128	lio_listio: return EAGAIN instead of EIO when out of resources This behavior is already documented by the man page, and suggested by POSIX. Reviewed by: jhb MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15099	2018-04-16 18:12:15 +00:00
Konstantin Belousov	d86c1f0dc1	i386 4/4G split. The change makes the user and kernel address spaces on i386 independent, giving each almost the full 4G of usable virtual addresses except for one PDE at top used for trampoline and per-CPU trampoline stacks, and system structures that must be always mapped, namely IDT, GDT, common TSS and LDT, and process-private TSS and LDT if allocated. By using 1:1 mapping for the kernel text and data, it appeared possible to eliminate assembler part of the locore.S which bootstraps initial page table and KPTmap. The code is rewritten in C and moved into the pmap_cold(). The comment in vmparam.h explains the KVA layout. There is no PCID mechanism available in protected mode, so each kernel/user switch forth and back completely flushes the TLB, except for the trampoline PTD region. The TLB invalidations for userspace becomes trivial, because IPI handlers switch page tables. On the other hand, context switches no longer need to reload %cr3. copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for new copyout(9) is compatibility with wiring user buffers around sysctl handlers. This explains two kind of locks for copyout ptes and accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow path, is only tried after the 'fast path' failed, which temporary changes mapping to the userspace and copies the data to/from small per-cpu buffer in the trampoline. If a page fault occurs during the copy, it is short-circuit by exception.s to not even reach C code. The change was motivated by the need to implement the Meltdown mitigation, but instead of KPTI the full split is done. The i386 architecture already shows the sizing problems, in particular, it is impossible to link clang and lld with debugging. I expect that the issues due to the virtual address space limits would only exaggerate and the split gives more liveness to the platform. Tested by: pho Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D14633	2018-04-13 20:30:49 +00:00
Mateusz Guzik	e0e259a888	locks: extend speculative spin waiting for readers to drain Now that 10 years have passed since the original limit of 10000 was committed, bump it a little bit. Spinning waiting for writers is semi-informed in the sense that we always know if the owner is running and base the decision to spin on that. However, no such information is provided for read-locking. In particular this means that it is possible for a write-spinner to completely waste cpu time waiting for the lock to be released, while the reader holding it was preempted and is now waiting for the spinner to go off cpu. Nonetheless, in majority of cases it is an improvement to spin instead of instantly giving up and going to sleep. The current approach is pretty simple: snatch the number of current readers and performs that many pauses before checking again. The total number of pauses to execute is limited to 10k. If the lock is still not free by that time, go to sleep. Given the previously noted problem of not knowing whether spinning makes any sense to begin with the new limit has to remain rather conservative. But at the very least it should also be related to the machine. Waiting for writers uses parameters selected based on the number of activated hardware threads. The upper limit of pause instructions to be executed in-between re-reads of the lock is typically 16384 or 32678. It was selected as the limit of total spins. The lower bound is set to already present 10000 as to not change it for smaller machines. Bumping the limit reduces system time by few % during benchmarks like buildworld, buildkernel and others. Tested on 2 and 4 socket machines (Broadwell, Skylake). Figuring out how to make a more informed decision while not pessimizing the fast path is left as an exercise for the reader.	2018-04-11 01:43:29 +00:00
Ian Lepore	97603f1da2	Use explicit_bzero() when cleaning values out of the kernel environment. Sometimes the values contain geli passphrases being communicated from loader(8) to the kernel, and some day the compiler may decide to start eliding calls to memset() for a pointer which is not dereferenced again before being passed to free().	2018-04-10 22:57:56 +00:00
Mateusz Guzik	04457342a3	rw: whack avoidable re-reads in try_upgrade	2018-04-10 22:32:31 +00:00
Stephen Hurd	f422673e10	Make BPF global lock an SX This allows NIC drivers to sleep on polling config operations. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14982	2018-04-10 19:42:50 +00:00
Mateusz Guzik	a045941bd2	locks: tweak backoff a little bit Previous limits were chosen when locking primitives had spurious lock accesses. Flipping the starting point to 1 (or rather 2 as the first call shifts it) provides a modest win when mild contention is seen while not hurting worse cases. Tested on a bunch of one, two and four socket old and new systems (Westmere, Skylake, Threadreaper and others) by doing concurrent page faults, buildkernel/buildworld and other stuff (although not all systems got all the tests). Another thing is the upper limit. It is semi-arbitrarily chosen as it was getting out of hand for slightly less small systems (e.g. a 128-thread one). Note that backoff is fundamentally a speculative bandaid and this change just makes it fit a little bit better. It remains completely oblivious to the hardware topology or the contention pattern. This is being experimented with.	2018-04-08 16:34:10 +00:00
Brooks Davis	6469bdcdb6	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
Brooks Davis	89ea4a30d6	Added SAL annotatations to system calls. Modify makesyscalls.sh to strip out SAL annotations. No functional change. This is based on work I started in CheriBSD and use to validate fat pointers at the syscall boundary. Tal Garfinkel reviewed the changes, added annotations to COMPAT* syscalls and is using them in a record and playback framework. One can envision other uses such as a WITNESS-like validator for copyin/out as speculated on in the review. As this time we are only annotating sys/kern/syscalls.master as that is sufficient for userspace work. If kernel use cases materialize, we can annotate other syscalls.master as needed. Submitted by: Tal Garfinkel <talg@cs.stanford.edu> Sponsored by: DARPA, AFRL (in part) Differential Revision: https://reviews.freebsd.org/D14285	2018-04-05 20:31:45 +00:00
Jeff Roberson	e5818a53db	Implement several enhancements to NUMA policies. Add a new "interleave" allocation policy which stripes pages across domains with a stride or width keeping contiguity within a multi-page region. Move the kernel to the dedicated numbered cpuset #2 making it possible to assign kernel threads and memory policy separately from user. This also eliminates the need for the complicated interrupt binding code. Add a sysctl API for viewing and manipulating domainsets. Refactor some of the cpuset_t manipulation code using the generic bitset type so that it can be used for both. This probably belongs in a dedicated subr file. Attempt to improve the include situation. Reviewed by: kib Discussed with: jhb (cpuset parts) Tested by: pho (before review feedback) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14839	2018-03-29 02:54:50 +00:00
Jeff Roberson	27a3c9d710	Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all platforms. Original commit message as follows: Only use CPUs in the domain the device is attached to for default assignment. Device drivers are able to override the default assignment if they bind directly. There are severe performance penalties for handling interrupts on remote CPUs and this should only be done in very controlled circumstances. Reviewed by: jhb, kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14838	2018-03-28 18:47:35 +00:00
Andriy Gapon	f4043145f2	ZFS vn_rele_async: catch up with the use of refcount(9) for the vnode use count It's not sufficient nor required to use the vnode interlock when checking if we are going to drop the last use count as the code in vputx() uses refcount (atomic) operations for both checking and decrementing the use code. Apply the same method to vn_rele_async(). While here, remove vn_rele_inactive(), a wrapper around vrele() that didn't add any value. Also, the change required making vfs_refcount_release_if_not_last() public. I've made vfs_refcount_acquire_if_not_zero() public as well. They are in sys/refcount.h now. While making the move I've dropped the vfs_ prefix. Reviewed by: mjg MFC after: 2 weeks Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D14869	2018-03-28 08:55:31 +00:00
Mateusz Guzik	179da98f71	fd: tighten seq protected areas to not contain malloc/free	2018-03-28 03:07:02 +00:00
Konstantin Belousov	fb441a8829	Fix several leaks of kernel stack data through paddings. It is random collection of fixes for issues not yet corrected, reported at https://tsyrklevi.ch/clang_analyzer/freebsd_013017/. Many issues from that list were already corrected. Most of them are for compat32, old compat32 or affect both primary host ABI and compat32. The freebsd32_kldstat(), for instance, was already fixed by using malloc(M_ZERO). Patch includes correction to report the supplied version back, which is just pedantic. Reviewed by: brooks, emaste (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14868	2018-03-27 18:05:51 +00:00
Brooks Davis	34a77b9741	Move uio enums to sys/_uio.h. Include _uio.h instead of uio.h in several headers to reduce header polution. Fix a few places that relied on header polution to get the uio.h header. I have not moved struct uio as many more things that use it rely on header polution to get other definitions from uio.h. Reviewed by: cem, kib, markj Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14811	2018-03-27 15:20:03 +00:00
Andriy Gapon	31260bf042	vfs_donmount: in certain cases try r/o mount if r/w mount fails If the operation is not an update, if neither r/w nor r/o mode is explicitly requested, if the error code hints at the possibility of the media being read-only, and if the fallback is allowed, then we can try to automatically downgrade to the readonly mode. This is especially useful for auto-mounting of removable media that sometimes can happen to be write-protected. The fallback to r/o is not enabled by default. It can be requested on a per-mount basis with a new mount option, 'autoro'. Or it can be globally allowed by setting vfs.default_autoro. Reviewed by: cem, kib MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D13361	2018-03-27 14:31:42 +00:00
Jeff Roberson	e8cbe51a04	Fix a bug introduced in r329612 that slowly invalidates all clean bufs. Reported by: bde Reviewed by: bde Sponsored by: Netflix, Dell/EMC Isilon	2018-03-26 18:36:17 +00:00
Mark Johnston	803c11a3a6	Use LIST_FOREACH_SAFE in sleepq_chains_remove_matching(). We may remove a sleepqueue from the hash table in sleepq_resume_thread(). Reviewed by: kib MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14847	2018-03-25 20:12:14 +00:00
Konstantin Belousov	ed9e8bc468	Account the size of the vslock-ed memory by the thread. Assert that all such memory is unwired on return to usermode. The count of the wired memory will be used to detect the copyout mode. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-03-24 13:51:27 +00:00
Konstantin Belousov	161bf65f8a	In vn_io_fault1(), reduce the scope where pagefaults are disabled. Most important for the future use, do not call vm_fault_quick_hold_pages() with disabled pagefaults. Reported and tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-03-24 13:13:52 +00:00
Konstantin Belousov	c398200721	Do not send signals to init directly from shutdown_nice(9), do it from the task context. shutdown_nice() is used from the fast interrupt handlers, mostly for console drivers, where we cannot lock blockable locks. Schedule the task in the fast queue to send the signal from the proper context. Reviewed by: imp Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-03-22 20:47:25 +00:00
Jeff Roberson	9a4b4cd3bc	Start witness much earlier in boot so that we can shrink the pend list and make it more immune to further change. Reviewed by: markj, imp (Part of D14707) Sponsored by: Netflix, Dell/EMC Isilon	2018-03-22 19:11:43 +00:00
Warner Losh	f0d847af61	Drop any recursed taking of Giant once and for all at the top of kern_reboot(). The shutdown path is now safe to run without Giant. Discussed with: kib@ Sponsored by: Netflix	2018-03-22 15:34:37 +00:00
Jonathan T. Looney	2529f56ed3	Add the "TCP Blackbox Recorder" which we discussed at the developer summits at BSDCan and BSDCam in 2017. The TCP Blackbox Recorder allows you to capture events on a TCP connection in a ring buffer. It stores metadata with the event. It optionally stores the TCP header associated with an event (if the event is associated with a packet) and also optionally stores information on the sockets. It supports setting a log ID on a TCP connection and using this to correlate multiple connections that share a common log ID. You can log connections in different modes. If you are doing a coordinated test with a particular connection, you may tell the system to put it in mode 4 (continuous dump). Or, if you just want to monitor for errors, you can put it in mode 1 (ring buffer) and dump all the ring buffers associated with the connection ID when we receive an error signal for that connection ID. You can set a default mode that will be applied to a particular ratio of incoming connections. You can also manually set a mode using a socket option. This commit includes only basic probes. rrs@ has added quite an abundance of probes in his TCP development work. He plans to commit those soon. There are user-space programs which we plan to commit as ports. These read the data from the log device and output pcapng files, and then let you analyze the data (and metadata) in the pcapng files. Reviewed by: gnn (previous version) Obtained from: Netflix, Inc. Relnotes: yes Differential Revision: https://reviews.freebsd.org/D11085	2018-03-22 09:40:08 +00:00
Gleb Smirnoff	27cd06b391	Redo r331328. We need to fix not only type but also format. While here again notice that we are fixing regression from r331106.	2018-03-22 05:26:27 +00:00
Gleb Smirnoff	5aab68f24a	Fix sysctl types broken in r329612.	2018-03-21 23:21:32 +00:00
Mark Johnston	a7defaea9a	Elide the object lock in the common case in vfs_vmio_unwire(). The object lock was only needed when attempting to free B_DIRECT buffer pages, and for testing for invalid pages (and freeing them if so). Handle the latter by instead moving invalid pages near the head of the inactive queue, where they will be reclaimed quickly. Reviewed by: alc, kib, jeff MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14778	2018-03-21 21:15:43 +00:00
Warner Losh	3e867f24cb	bufshutdown is no longer called with Giant held, so there's no need to drop or pickup Giant anymore. Remove that code and adjust comments.	2018-03-21 14:46:59 +00:00
Warner Losh	d5292812f8	Remove Giant from init creation and vfs_mountroot. Sponsored by: Netflix Discussed with: kib@, mckusick@ Differential Review: https://reviews.freebsd.org/D14712	2018-03-21 14:46:54 +00:00
Conrad Meyer	c37125d9e5	Add missed sys/limits.h include Apparently header pollution on x86 hid its absense. Sorry, other arch users. Fix the missed header introduced in r331279. Reported by: tinderbox	2018-03-21 03:43:40 +00:00
Conrad Meyer	4948f7bf11	Regenerate sysent files after r331279.	2018-03-21 01:17:01 +00:00
Conrad Meyer	e9ac27430c	Implement getrandom(2) and getentropy(3) The general idea here is to provide userspace programs with well-defined sources of entropy, in a fashion that doesn't require opening a new file descriptor (ulimits) or accessing paths (/dev/urandom may be restricted by chroot or capsicum). getrandom(2) is the more general API, and comes from the Linux world. Since our urandom and random devices are identical, the GRND_RANDOM flag is ignored. getentropy(3) is added as a compatibility shim for the OpenBSD API. truss(1) support is included. Tests for both system calls are provided. Coverage is believed to be at least as comprehensive as LTP getrandom(2) test coverage. Additionally, instructions for running the LTP tests directly against FreeBSD are provided in the "Test Plan" section of the Differential revision linked below. (They pass, of course.) PR: 194204 Reported by: David CARLIER <david.carlier AT hardenedbsd.org> Discussed with: cperciva, delphij, jhb, markj Relnotes: maybe Differential Revision: https://reviews.freebsd.org/D14500	2018-03-21 01:15:45 +00:00
Jamie Gritton	672756aa9f	Represent boolean jail options as an array of structures containing the flag and both the regular and "no" names, instead of two different string arrays whose indices need to match the flag's bit position. This makes them similar to the say "jailsys" options are represented. Loop through either kind of option array with a structure pointer rather then an integer index.	2018-03-20 23:08:42 +00:00
Gleb Smirnoff	83fc34ea0d	At this point iwmesg isn't initialized yet, so print pointer to lock rather than panic before panicing.	2018-03-20 22:05:21 +00:00
Mark Johnston	8c7549da2b	Drop KTR_CONTENTION. It is incomplete, has not been adopted in the other locking primitives, and we have other means of measuring lock contention (lock_profiling, lockstat, KTR_LOCK). Drop it to slightly de-clutter the mutex code and free up a precious KTR class index. Reviewed by: jhb, mjg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14771	2018-03-20 15:51:05 +00:00
Justin Hibbits	2acde6a85a	Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a pointer. When &bdomain[i] is added to arg2 it widens from uintptr_t to intmax_t, then gcc whines when it gets cast to a pointer. Casting through uintptr_t silences this warning.	2018-03-20 02:01:30 +00:00
Matt Joras	d6160f6079	Fix initialization of eventhandler mutex. mtx_init does not do a copy of the name string it is passed. The eventhandler code incorrectly passed the parameter string directly to mtx_init instead of using the copy it makes. This was an existing problem with the code that I dutifully copied over in my changes in r325621. Reported by: Anton Rang <rang AT acm.org> Reviewed by: rstone, markj Approved by: rstone (mentor) MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14764	2018-03-19 22:43:27 +00:00
Mark Johnston	0eb50f9cd2	Have vm_page_{deactivate,launder}() requeue already-queued pages. In many cases the page is not enqueued so the change will have no effect. However, the change is needed to support an optimization in the fault handler and in some cases (sendfile, the buffer cache) it was being emulated by the caller anyway. Reviewed by: alc Tested by: pho MFC after: 2 weeks X-Differential Revision: https://reviews.freebsd.org/D14625	2018-03-18 16:40:56 +00:00
Mateusz Guzik	09bdec20a0	locks: slightly depessimize lockstat The slow path is always taken when lockstat is enabled. This induces rdtsc (or other) calls to get the cycle count even when there was no contention. Still go to the slow path to not mess with the fast path, but avoid the heavy lifting unless necessary. This reduces sys and real time during -j 80 buildkernel: before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total So note this is still a significant hit. LOCK_PROFILING results are not affected.	2018-03-17 19:26:33 +00:00
Jeff Roberson	3cec5c77d6	Move the dirty queues inside the per-domain structure. This resolves a bug where we had not hit global dirty limits but a single queue was starved for space by dirty buffers. A single buf_daemon is maintained for now. Add a bd_speedup() when we are low on bufspace. This can happen due to SUJ keeping many bufs locked until a cg block is written. Document this with a comment. Fix sysctls to work with per-domain variables. Add more ddb debugging. Reported by: pho Reviewed by: kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14705	2018-03-17 18:14:49 +00:00
Conrad Meyer	330b675f65	vfs_bio.c: Apply cleanups motivated by Coverity analysis It is believed that the conditions Coverity indicated were actually impossible to hit. So this patch just adds a cleanup to only compute v_mount once in brelse(), and in vfs_bio_getpages() always initializes error to zero to appease the static analyzer. No functional change intended. Submitted by: Darrick Lew <darrick.freebsd AT gmail.com> Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14613	2018-03-14 22:11:45 +00:00
Ed Maste	a95659f75f	Use C99 boolean type for translate_osrel Migrate to modern types before creating MD Linuxolator bits for new architectures. Reviewed by: cem Sponsored by: Turing Robotic Industries Inc. Differential Revision: https://reviews.freebsd.org/D14676	2018-03-13 16:40:29 +00:00
Ed Maste	b7feabf906	Use C99 designated initializers for struct execsw It it makes use slightly more clear and facilitates grepping.	2018-03-13 13:09:10 +00:00
Brooks Davis	467e627672	Use the stack for temporary storage in OTIOCCONS. The old code used the thread's pcb via the uap->data pointer. Reviewed by: ed Approved by: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14674	2018-03-12 23:04:42 +00:00
Brooks Davis	97519ff698	MIPS: Implement fueword and casueword* in assembly. Remove NO_FUEWORD so the 'e' variants are wrapped by the non-'e' variants. This is more correct and leaves sparc64 as the outlier. Reviewed by: jmallett, kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14603	2018-03-12 22:10:06 +00:00
Ed Maste	5cc6d253f4	ANSIfy sys/kern/imgact_*	2018-03-12 15:45:50 +00:00
Ian Lepore	72721caf04	Make root mount timeout logic work for filesystems other than ufs. The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5) file allow specifying a wait timeout for the device(s) hosting the root filesystem to become usable. The current mechanism for waiting for devices and detecting their availability can't be used for zfs-hosted filesystems. See the comment #20 in the PR for some expanded detail on these points. This change adds retry logic to the actual root filesystem mount. That is, insted of relying on device availability using device name lookups, it uses the kernel_mount() call itself to detect whether the filesystem can be mounted, and loops until it succeeds or the configured timeout is exceeded. These changes are based on the patch attached to the PR, but it's rewritten enough that all mistakes belong to me. PR: 208882 X-MFC after: sufficient testing, and hopefully in time for 11.1	2018-03-10 22:07:57 +00:00
Conrad Meyer	dec5441a32	subr_gtaskqueue: Fix braino from r330715 Submitted by: markj Sponsored by: Dell EMC Isilon	2018-03-10 01:53:42 +00:00
Conrad Meyer	8e0e6abc1f	subr_gtaskqueue: Fix minor leak of tq_name in error case Reported by: cppcheck Sponsored by: Dell EMC Isilon	2018-03-10 01:01:01 +00:00
Brooks Davis	dd51fec3b9	Copyout a whole int to cpuset_domain's policy pointer. The previous code only copied 16-bits and corrupted the target int. Reviewed by: kib, markj Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14611	2018-03-09 00:50:40 +00:00
Mark Johnston	bde3b1e1a5	Return E2BIG if we run out of space writing a compressed kernel dump. ENOSPC causes the MD kernel dump code to retry the dump, but this is undesirable in the case where we legitimately ran out of space.	2018-03-08 17:04:36 +00:00
Brooks Davis	91a743004c	Use umtx_copyin_umtx_time32() in __umtx_op_lock_umutex_compat32(). Non-NULL timeouts where copied in improperly and could produce failures due to incompatible data structures. Reviewed by: kib MFC after: 3 days Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14587	2018-03-06 01:52:04 +00:00
Brooks Davis	aec37bad99	Regen after r330517.	2018-03-05 17:02:50 +00:00
Brooks Davis	1c1b4c66b6	Remove remenants of 1990s efforts to let us run Net/OpenBSD binaries. No functional change (comments change in some generated files.) Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14571	2018-03-05 17:02:16 +00:00
Mateusz Guzik	0ad122a966	lockmgr: save on sleepq when cmpset fails	2018-03-05 00:30:07 +00:00
Mateusz Guzik	93d41967da	lockmgr: whack unused lockmgr_note_exclusive_upgrade	2018-03-04 22:14:20 +00:00
Mateusz Guzik	9f4e008d4a	mtx: tidy up recursion handling in thread lock Normally after grabbing the lock it has to be verified we got the right one to begin with. However, if we are recursing, it must not change thus the check can be avoided. In particular this avoids a lock read for non-recursing case which found out the lock was changed. While here avoid an irq trip of this happens. Tested by: pho (previous version)	2018-03-04 22:01:23 +00:00
Mateusz Guzik	a8e747c5e7	sx: don't do an atomic op in upgrade if it cananot succeed The code already pays the cost of reading the lock to obtain the waiters flag. Checking whether there is more than one reader is not a problem and avoids dirtying the line. This also fixes a small corner case: if waiters were to show up between reading the flag and upgrading the lock, the operation would fail even though it should not. No correctness change here though.	2018-03-04 21:41:05 +00:00
Mateusz Guzik	d94df98c5c	locks: fix a corner case in r327399 If there were exactly rowner_retries/asx_retries (by default: 10) transitions between read and write state and the waiters still did not get the lock, the next owner -> reader transition would result in the code correctly falling back to turnstile/sleepq where it would incorrectly think it was waiting for a writer and decide to leave turnstile/sleepq to loop back. From this point it would take ts/sq trips until the lock gets released. The bug sometimes manifested itself in stalls during -j 128 package builds. Refactor the code to fix the bug, while here remove some of the gratituous differences between rw and sx locks.	2018-03-04 21:38:30 +00:00
Mateusz Guzik	1c6987ebc5	lockmgr: start decomposing the main routine The main routine takes 8 args, 3 of which are almost the same for most uses. This in particular pushes it above the limit of 6 arguments passable through registers on amd64 making it impossible to tail call. This is a prerequisite for further cleanups. Tested by: pho	2018-03-04 19:12:54 +00:00
Hans Petter Selasky	2077229b56	Allow pause_sbt() to catch signals during sleep by passing C_CATCH flag. Define pause_sig() function macro helper similarly to other kernel functions which catch signals. Update outdated function description. Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-03 18:36:38 +00:00
Hans Petter Selasky	54fc03834a	Correct the return code from pause() during cold startup from zero to EWOULDBLOCK. This also matches the description in pause(9). Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-03 18:12:21 +00:00
Brooks Davis	93e48a303a	Rename kernel-only members of semid_ds and msgid_ds. This deliberately breaks the API in preperation for future syscall revisions which will remove these nonstandard members. In an exp-run a single port (devel/qemu-user-static) was found to use them which it did becuase it emulates system calls. This has been fixed in the ports tree. PR: 224443 (exp-run) Reviewed by: kib, jhb (previous version) Exp-run by: antoine Sponsored by: DARPA, AFRP Differential Revision: https://reviews.freebsd.org/D14490	2018-03-02 22:10:48 +00:00
Mateusz Guzik	c505b59961	sx: fix adaptive spinning broken in r327397 The condition was flipped. In particular heavy multithreaded kernel builds on zfs started suffering due to nested sx locks. For instance make -s -j 128 buildkernel: before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total ps. .-'---`-. .-'---`-. ,' `. ,' `. \| \ \| \ \| \ \| \ \ _ \ \ _ \ ,\ _ ,'-,/-)\ ,\ _ ,'-,/-)\ ( * \ \,' ,' ,'-) ( * \ \,' ,' ,'-) `._,) -',-') `._,) -',-') \/ ''/ \/ ''/ ) / / ) / / / ,'-' / ,'-'	2018-03-02 21:26:27 +00:00
Mateusz Guzik	9d4e369ae8	Don't generate data in sysctl_out_proc unless we intend to copy out. The first call is used to gauge how much spaces is needed. Just computing the size instead of generating the output allows to not take the proctree lock.	2018-02-25 15:16:58 +00:00
Jeff Roberson	1c2529ab32	Fix issues with sparse cpu allocation. Consistently use mp_maxid + 1. Reported by: pho Reviewed by: markj Sponsored by: Netflix, Dell/EMC Isilon	2018-02-25 00:35:21 +00:00
Conrad Meyer	63901c0171	kern/sys_generic.c: style(9) return(foo) -> return (foo) No functional change. Sponsored by: Dell EMC Isilon	2018-02-24 01:15:33 +00:00
Jeff Roberson	5f8cd1c0bf	Add a generic Proportional Integral Derivative (PID) controller algorithm and use it to regulate page daemon output. This provides much smoother and more responsive page daemon output, anticipating demand and avoiding pageout stalls by increasing the number of pages to match the workload. This is a reimplementation of work done by myself and mlaier at Isilon. Reviewed by: bsdimp Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14402	2018-02-23 22:51:51 +00:00
Kirk McKusick	16680b6af5	Include error number in the "fsync: giving up on dirty" message (in case it ever starts happening again in spite of 328444). Submitted by: Andreas Longwitz <longwitz at incore.de>	2018-02-23 21:57:10 +00:00
Konstantin Belousov	4c8a8cfcde	Restore UP build. Reviewed by: truckman Sponsored by: The FreeBSD Foundation	2018-02-23 18:26:31 +00:00
Ed Maste	315fbaeca2	Correct pseudo misspelling in sys/ comments contrib code and #define in intel_ata.h unchanged.	2018-02-23 18:15:50 +00:00
Don Lewis	97e9382d56	Decrease latency by not wrapping the idle loop's potentially lengthy search for a thread to steal inside a critical section. Since this allows the search to be preempted, restart the search if preemption happens since the search results found earlier may no longer be valid. Decrease the latency of starting a thread that may be assigned to this CPU during the search by polling for incoming threads during the search and switching to that thread instead of continuing the search. Test for stale search results and restart the search before going through the expense of calling tdq_lock_pair(). Retry some tests after grabbing the locks since things may have changed while waiting to get both locks. Eliminate special case handling for stealing from an SMT peer that uses 1 as the steal threshold. This can only succeed if a thread has been assigned but our SMT peer has not yet started executing it. This is quite rare and when it happens the other SMT thread is generally waiting for the same tdq lock that we hold. Basically both SMT threads are racing to grab the same spin lock. Add the kern.sched.always_steal knob from a ULE patch by jeff@. Incorporate another idea from Jeff's ULE patch. If the sched_switch() detects that the CPU is about to go idle, try to steal a thread before switching to the idle thread. Since the search for a thread to steal has to be done inside a critical section in this context, limit the impact on latency by adding the knob kern.sched.trysteal_limit to limit the topological distance of the search and don't restart the search if we detect stale results. If this search can't find an stealable thread, the idle loop can do a more complete search. Also poll for threads being assigned to this CPU during the search and switch to them instead of continuing the search. This change is responsibile for the majority of the improvement in parallel buildworld times. In sched_balance_group() change the minimum threshold from stealing a thread from 1 to 2. Poaching a newly assigned thread from a CPU that is waking up hasn't yet switched to that thread from idle is likely very rare and is likely to have the same lock race as is seen when stealing threads in the idle loop. Also use tdq_notify() to kick the destintation CPU instead of always sending an IPI. Update a stale comment, the number of transferable threads is not calculated. Reviewed by: kib (earlier version) Comments by: avg, jeff, mav MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12130	2018-02-23 00:12:51 +00:00
Mateusz Guzik	a0c722bdbf	Fix up sysctl vfs.buffercache broken in r329612 Sample problem: top: sysctl(vfs.bufspace...) expected 8, got 4 Reported by: O. Hartmann <ohartmann walstatt.org>	2018-02-22 20:39:25 +00:00
Eric van Gyzen	0127914caa	sched_ule: update a comment to reflect reality MFC after: 3 days Sponsored by: Dell EMC	2018-02-22 17:09:26 +00:00
Jeff Roberson	683ca3a432	Fix the broken subqueue assignment for the cleanq. Reported by: pho Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-02-20 21:27:17 +00:00
Mateusz Guzik	500ca73d43	mtx: add debug assertions to mtx_spin_wait_unlocked	2018-02-20 20:39:34 +00:00
Mateusz Guzik	862db53fb5	Fix reaping on process fd close broken after r329449 The only consumer of proc_reap other than proc_to_reap was not updated to not PROC_SLOCK. Reported by: Juan Ramon Molina Menor <listjm club.fr>	2018-02-20 20:19:38 +00:00

... 3 4 5 6 7 ...

16312 Commits