freebsd-skq

Author	SHA1	Message	Date
Hiren Panchasara	7d03ff1fe9	Add kevent EVFILT_EMPTY for notification when a client has received all data i.e. everything outstanding has been acked. Reviewed by: bz, gnn (previous version) MFC after: 3 days Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D9150	2017-01-16 08:25:33 +00:00
Conrad Meyer	db4fcadf52	"Buses" is the preferred plural of "bus" Replace archaic "busses" with modern form "buses." Intentionally excluded: * Old/random drivers I didn't recognize * Old hardware in general * Use of "busses" in code as identifiers No functional change. http://grammarist.com/spelling/buses-busses/ PR: 216099 Reported by: bltsrc at mail.ru Sponsored by: Dell EMC Isilon	2017-01-15 17:54:01 +00:00
Enji Cooper	d75a788085	Revert r312119 and reword the intent to fix -Wshadow issues between exp(3) and `exp` var. The approach taken previously was not ideal for multiple functional and stylistic reasons. Add to existing sed call in Makefile to replace `exp` with `exponent` instead. MFC after: 13 days Requested by: bde	2017-01-15 09:25:33 +00:00
Mark Johnston	d53d6fa9a8	Suppress a warning about m_assertbuf being unused. MFC after: 1 week	2017-01-15 03:53:20 +00:00
Sean Bruno	4ecb427a49	Fix hangs in a uniprocessor configuration (qemu, virtualbox, real hw). sys/net/iflib.c: Add ctx to filter_info and don't skip interrupt early on unless we're on an SMP system sys/kern/subr_gtaskqueue.c: Skip smp check if we're running UP Submitted by: Matt Macy <mmacy@nextbsd.org> Reported by: emaste bde	2017-01-15 00:50:10 +00:00
Mark Johnston	42d33c1f4d	Stop the scheduler upon panic even in non-SMP kernels. This is needed for kernel dumps to work, as the panicking thread will call into code that makes use of kernel locks. Reported and tested by: Eugene Grosbein MFC after: 1 week	2017-01-14 22:16:03 +00:00
Enji Cooper	d467b2ee0c	encode_long, encode_timeval: mechanically replace `exp` with `exponent` This helps fix a -Wshadow issue with exp(3) with tests/sys/acct/acct_test, which include math.h, which in turn defines exp(3) MFC after: 2 weeks Tested with: clang, gcc 4.2.1, gcc 4.9 Sponsored by: Dell EMC Isilon	2017-01-14 05:06:14 +00:00
Enji Cooper	66db8cca1a	Clean up trailing whitespace MFC after: 3 days Sponsored by: Dell EMC Isilon	2017-01-14 04:16:13 +00:00
Enji Cooper	5e8fcdfe1b	Fix -Wunused on gcc 4.9 (x was set but not used) MFC after: 3 days Sponsored by: Dell EMC Isilon	2017-01-14 04:13:28 +00:00
Gleb Smirnoff	4fce19da8d	Remove deprecated fgetsock() and fputsock().	2017-01-13 22:16:41 +00:00
Ian Lepore	d5b937680c	Correct the comments about how much buffer is allocated.	2017-01-13 17:03:23 +00:00
Ian Lepore	a6f63533a7	Check tty_gone() after allocating IO buffers. The tty lock has to be dropped then reacquired due to using M_WAITOK, which opens a window in which the tty device can disappear. Check for this and return ENXIO back up the call chain so that callers can cope. This closes a race where TF_GONE would get set while buffers were being allocated as part of ttydev_open(), causing a subsequent call to ttydevsw_modem() later in ttydev_open() to assert. Reported by: pho Reviewed by: kib	2017-01-13 16:37:38 +00:00
Ian Lepore	e046e8e680	Restructure the tty_drain loop so that device-busy is checked one more time after tty_timedwait() returns an error only if the error is EWOULDBLOCK; other errors cause an immediate return. This fixes the case of the tty disappearing while in tty_drain(). Reported by: pho	2017-01-12 21:18:43 +00:00
Ravi Pokala	8e712af70b	Remove writability requirement for single-mbuf, contiguous-range m_pulldown() m_pulldown() only needs to determine if a mbuf is writable if it is going to copy data into the data region of an existing mbuf. It does this to create a contiguous data region in a single mbuf from multiple mbufs in the chain. If the requested memory region is already contiguous and nothing needs to change, the mbuf does not need to be writeable. Submitted by: Brian Mueller <bmueller@panasas.com> Reviewed by: bz MFC after: 1 week Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D9053	2017-01-12 06:38:03 +00:00
Ian Lepore	f64342e354	Rework tty_drain() to poll the hardware for completion, and restore drain timeout handling to historical freebsd behavior. The primary reason for these changes is the need to have tty_drain() call ttydevsw_busy() at some reasonable sub-second rate, to poll hardware that doesn't signal an interrupt when the transmit shift register becomes empty (which includes virtually all USB serial hardware). Such hardware hangs in a ttyout wait, because it never gets an opportunity to trigger a wakeup from the sleep in tty_drain() by calling ttydisc_getc() again, after handing the last of the buffered data to the hardware. While researching the history of changes to tty_drain() I stumbled across some email describing the historical BSD behavior of tcdrain() and close() on serial ports, and the ability of comcontrol(1) to control timeout behavior. Using that and some advice from Bruce Evans as a guide, I've put together these changes to implement the hardware polling and restore the historical timeout behaviors... - tty_drain() now calls ttydevsw_busy() in a loop at 10 Hz to accommodate hardware that requires polling for busy state. - The "new historical" behavior for draining during close(2) is retained: the drain timeout is "1 second without making any progress". When the 1-second timeout expires, if the count of bytes remaining in the tty layer buffer is smaller than last time, the timeout is extended for another second. Unfortunately, the same logic cannot be extended all the way down to the hardware, because the interface to that layer is a simple busy/not-busy indication. - Due to the previous point, an application that needs a guarantee that all data has been transmitted must use TIOCDRAIN/tcdrain(3) before calling close(2). - The historical behavior of honoring the drainwait setting for TIOCDRAIN (used by tcdrain(3)) is restored. - The historical kern.drainwait sysctl to control the global default drainwait time is restored, but is now named kern.tty_drainwait. - The historical default drainwait timeout of 300 seconds is restored. - Handling of TIOCGDRAINWAIT and TIOCSDRAINWAIT ioctls is restored (this also makes the comcontrol(1) drainwait verb work again). - Manpages are updated to document these behaviors. Reviewed by: bde (prior version)	2017-01-12 00:48:06 +00:00
Mark Johnston	90e17792c8	Do not set BIO_DONE if the BIO specifies a completion handler. biowait() will otherwise race with completions of such BIOs. In-tree code only calls biowait() on BIOs that do not specify a handler, so this change should not have any functional impact. Reviewed by: mav MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9070	2017-01-10 21:41:28 +00:00
John Baldwin	14da48cbe4	Set MORETOCOME for AIO write requests on a socket. Add a MSG_MORETOCOME message flag. When this flag is set, sosend* set PRUS_MORETOCOME when invoking the protocol send method. The aio worker tasks for sending on a socket set this flag when there are additional write jobs waiting on the socket buffer. Reviewed by: adrian MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D8955	2017-01-06 23:41:45 +00:00
Konstantin Belousov	6e89d383c7	Explicitly add "opt_compat.h" to kern_exec.c: fix powerpc LINT builds. sys/ptrace.h includes sys/signal.h, which includes sys/_sigset.h. Note that sys/_sigset.h only defines osigset_t if COMPAT_43 was defined. Two lines later, sys/ptrace.h includes machine/reg.h, which in case of powerpc, includes opt_compat.h. After the include headers reordering in r311345, we have sys/ptrace.h included before sys/sysproto.h. If COMPAT_43 was requested in the kernel config, the result is that sys/_sigset.h does not define osigset_t, but sys/sysproto.h sees COMPAT_43 and uses osigset_t. Fix this by explicitly including opt_compat.h to cover the whole kern/kern_exec.c scope. Sponsored by: The FreeBSD Foundation	2017-01-06 16:56:24 +00:00
Konstantin Belousov	2f304845e2	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are undesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-05 17:19:26 +00:00
Konstantin Belousov	607fa849d2	Some style fixes for getfstat(2)-related code. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-05 17:03:35 +00:00
Mark Johnston	ec492b13f1	Add a small allocator for exec_map entries. Upon each execve, we allocate a KVA range for use in copying data to the new image. Pages must be faulted into the range, and when the range is freed, the backing pages are freed and their mappings are destroyed. This is a lot of needless overhead, and the exec_map management becomes a bottleneck when many CPUs are executing execve concurrently. Moreover, the number of available ranges is fixed at 16, which is insufficient on large systems and potentially excessive on 32-bit systems. The new allocator reduces overhead by making exec_map allocations persistent. When a range is freed, pages backing the range are marked clean and made easy to reclaim. With this change, the exec_map is sized based on the number of CPUs. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8921	2017-01-05 01:44:12 +00:00
Mark Johnston	eeeaa7ba22	Sort includes in kern_exec.c. MFC after: 1 week	2017-01-05 01:28:08 +00:00
Gleb Smirnoff	bfc8c24c73	Move bogus_page declaration to vm_page.h and initialization to vm_page.c. Reviewed by: kib	2017-01-04 22:27:19 +00:00
Konstantin Belousov	6c4338f2ef	The callers of kern_getfsstat(UIO_SYSSPACE) expect that buf always returns memory which must be freed, regardless of the error. Assign NULL to buf in case we are not going to allocate any memory due to invalid mode. Reported and tested by: pho Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks (together with r310638) Differential revision: https://reviews.freebsd.org/D9042	2017-01-04 16:09:45 +00:00
Edward Tomasz Napierala	5ec7cde488	Fix bug that would result in a kernel crash in some cases involving a symlink and an autofs mount request. The crash was caused by namei() calling bcopy() with a negative length, caused by numeric underflow: in lookup(), in the relookup path, the ni_pathlen was decremented too many times. The bug was introduced in r296715. Big thanks to Alex Deiter for his help with debugging this. Reviewed by: kib@ Tested by: Alex Deiter <alex.deiter at gmail.com> MFC after: 1 month	2017-01-04 14:43:57 +00:00
Mateusz Guzik	391df78ad4	mtx: plug open-coded mtx_lock access missed in r311172	2017-01-04 02:25:31 +00:00
Mateusz Guzik	5e5ad162ad	Reduce lock accesses in thread lock similarly to r311172.	2017-01-03 23:08:11 +00:00
Mateusz Guzik	2604eb9e17	mtx: reduce lock accesses Instead of spuriously re-reading the lock value, read it once. This change also has a side effect of fixing a performance bug: on failed _mtx_obtain_lock, it was possible that re-read would find the lock is unowned, but in this case the primitive would make a trip through turnstile code. This is diff reduction to a variant which uses atomic_fcmpset. Discussed with: jhb (previous version) Tested by: pho (previous version)	2017-01-03 21:36:15 +00:00
Konstantin Belousov	7ee34a31fd	There is no need to use temporary statfs buffer for fsid obliteration and prison enforcement. Do it on the caller buffer directly. Besides eliminating memory copies, this change also removes large structure from the kernel stack. Extracted from: ino64 work by gleb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-02 18:59:23 +00:00
Konstantin Belousov	b961dc3193	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-02 18:49:48 +00:00
Konstantin Belousov	f2af4041fa	Move common code from kern_statfs() and kern_fstatfs() into a new helper. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-02 18:20:22 +00:00
Mark Johnston	b5442eba5c	Factor out instances of a knote detach followed by a knote_drop() call. Reviewed by: kib (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9015	2017-01-02 01:23:21 +00:00
Sean Bruno	1248952a50	2017 IFLIB updates in preparation for commits to e1000 and ixgbe. - iflib - add checksum in place support (mmacy) - iflib - initialize IP for TSO (going to be needed for e1000) (mmacy) - iflib - move isc_txrx from shared context to softc context (mmacy) - iflib - Normalize checks in TXQ drainage. (shurd) - iflib - Fix queue capping checks (mmacy) - iflib - Fix invalid assert, em can need 2 sentinels (mmacy) - iflib - let the driver determine what capabilities are set and what tx csum flags are used (mmacy) - add INVARIANTS debugging hooks to gtaskqueue enqueue (mmacy) - update bnxt(4) to support the changes to iflib (shurd) Some other various, sundry updates. Slightly more verbose changelog: Submitted by: mmacy@nextbsd.org Reviewed by: shurd MFC after: Sponsored by: LimeLight Networks and Dell EMC Isilon	2017-01-02 00:56:33 +00:00
Mateusz Guzik	d4db49c4c7	fd: access openfiles once in falloc_noinstall This is similar to what's done with nprocs. Note this is only a band aid.	2017-01-01 08:55:28 +00:00
Mateusz Guzik	41b0046a4d	vfs: switch nodes_created, recycles_count and free_owe_inact to counter(9) Reviewed by: kib	2016-12-31 19:59:31 +00:00
Mateusz Guzik	0b3b55a0f2	Remove cpu_spinwait after seq_consistent. It does not add any benefit as the read routine will do it as necessary.	2016-12-30 06:26:17 +00:00
Mateusz Guzik	4938d86764	cache: sprinkle __predict_false	2016-12-29 16:35:49 +00:00
Mateusz Guzik	b37707533e	cache: move shrink lock init to nchinit This gets rid of unnecessary sysinit usage. While here also rename the lock to be consistent with the rest.	2016-12-29 12:01:54 +00:00
Mateusz Guzik	0569bc9ca9	cache: depessimize hashing macros/inlines All hash sizes are power-of-2, but the compiler does not know that for sure and 'foo % size' forces doing a division. Store the size - 1 and use 'foo & hash' instead which allows mere shift.	2016-12-29 08:41:25 +00:00
Mateusz Guzik	6dd9661b77	cache: drop the NULL check from VP2VNODELOCK Now that negative entries are annotated with a dedicated flag, NULL vnodes are no longer passed.	2016-12-29 08:34:50 +00:00
John Baldwin	1fabda45c3	Regen after r310638. Differential Revision: https://reviews.freebsd.org/D8854	2016-12-27 20:22:17 +00:00
John Baldwin	34ed0c63c8	Rename the 'flags' argument to getfsstat() to 'mode' and validate it. This argument is not a bitmask of flags, but only accepts a single value. Fail with EINVAL if an invalid value is passed to 'flag'. Rename the 'flags' argument to getmntinfo(3) to 'mode' as well to match. This is a followup to r308088. Reviewed by: kib MFC after: 1 month	2016-12-27 20:21:11 +00:00
Konstantin Belousov	fd30dd7c26	Make knote KN_INFLUX state counted. This is final fix for the issue closed by r310302 for knote(). If KN_INFLUX \| KN_SCAN flags are set for the note passed to knote() or knote_fork(), i.e. the knote is scanned, we might erroneously clear INFLUX when finishing notification. For normal knote() it was fixed in r310302 simply by remembering the fact that we do not own KN_INFLUX, since there we own knlist lock and scan thread cannot clear KN_INFLUX until we drop the lock. For knote_fork(), the situation is more complicated, we must drop knlist lock AKA the process lock, since we need to register new knotes. Change KN_INFLUX into counter and allow shared ownership of the in-flux state between scan and knote_fork() or knote(). Both in-flux setters need to ensure that knote is not dropped in parallel. Added assert about kn_influx == 1 in knote_drop() verifies that in-flux state is not shared when knote is destroyed. Since KBI of the struct knote is changed by addition of the int kn_influx field, reorder kn_hook and kn_hookid to fill pad on LP64 arches [1]. This keeps sizeof(struct knote) to same 128 bytes as it was before addition of kn_influx, on amd64. Reviewed by: markj Suggested by: markj [1] Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D8898	2016-12-26 19:33:40 +00:00
Konstantin Belousov	5c36b2e8cb	Change knlist_destroy() to assert that knlist is empty instead of accepting the wrong state and printing warning. Do not obliterate kl_lock and kl_unlock pointers, they are often useful for post-mortem analysis. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks X-Differential revision: https://reviews.freebsd.org/D8898	2016-12-26 19:28:10 +00:00
Konstantin Belousov	34311568dc	Style. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D8898	2016-12-26 19:26:40 +00:00
Konstantin Belousov	fc05543fa7	Some optimizations for kqueue timers. There is no need to do two allocations per kqueue timer. Gather all data needed by the timer callout into the structure and allocate it at once. Use the structure to preserve the result of timer2sbintime(), to not perform repeated 64bit calculations in callout. Remove tautological casts. Remove now unused p_nexttime [1]. Noted by: markj [1] Reviewed by: markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC note: do not remove p_nexttime Differential revision: https://reviews.freebsd.org/D8901	2016-12-25 19:49:35 +00:00
Konstantin Belousov	7611b72816	Some style. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D8901	2016-12-25 19:38:07 +00:00
Mark Johnston	eab80d9276	Add a comment explaining the race fixed by r310423. Suggested and reviewed by: jhb X-MFC With: r310423	2016-12-23 05:02:17 +00:00
Mark Johnston	aa3c544349	Revert part of r300109. The removal of TAILQ_FOREACH_SAFE introduced a small race: when the last thread on a sleepqueue is awoken, it reclaims the sleepqueue and may begin executing on a different CPU before sleepq_resume_thread() returns. This leaves a window during which it may go back to sleep and incorrectly be awoken again by the caller of sleepq_broadcast(). Reported and tested by: pho MFC after: 3 days Sponsored by: Dell EMC Isilon	2016-12-22 17:51:44 +00:00
John Baldwin	99bc7e4123	Don't spin in pause() during early boot for kthreads other than thread0. pause() uses a spin loop to simulate a sleep during early boot. However, we only need this for thread0 to get far enough in the boot process to enable timers (at which point pause() can sleep). For other kthreads, sleeping in pause() is ok as the callout will be scheduled and will eventually fire once thread0 initializes timers. Tested by: Steven Kargl Sleuthing by: markj MFC after: 1 week Sponsored by: Netflix	2016-12-20 19:44:44 +00:00
Konstantin Belousov	4afd808be7	Do not clear KN_INFLUX when not owning influx state. For notes in KN_INFLUX\|KN_SCAN state, the influx bit is set by a parallel scan. When knote() reports event for the vnode filters, which require kqueue unlocked, it unconditionally sets and then clears influx to keep note around kqueue unlock. There, do not clear influx flag if a scan set it, since we do not own it, instead we prevent scan from executing by holding knlist lock. The knote_fork() function has somewhat similar problem, it might set KN_INFLUX for scanned note, drop kqueue and list locks, and then clear the flag after relock. A solution there would be different enough, as well as the test program, so close the reported issue first. Reported and test case provided by: yjh0502@gmail.com PR: 214923 Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-19 22:18:36 +00:00
Konstantin Belousov	69baec3619	Switch from stdatomic.h to atomic.h for kernel. Apparently stdatomic.h implementation for gcc 4.2 on sparc64 does not work properly. This effectively reverts r251803. Reported and tested by: lidl Discussed with: ed Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-16 17:41:20 +00:00
Ed Schouten	669a25b50d	Document the existence of the {0, 6, ...} sysctl.	2016-12-15 15:45:11 +00:00
Jilles Tjoelker	b9a6fb9343	reaper: Make REAPER_KILL_SUBTREE actually work. MFC after: 2 weeks	2016-12-14 22:49:20 +00:00
Ed Schouten	ae15715360	Add a "device_index" label to all sysctls under dev.$driver.$index. This way it becomes possible to graph a property for all instances of a single driver. For example, graphing the number of packets across all USB controllers, the amount of dropped packets on all NICs, etc. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D8775	2016-12-14 13:03:01 +00:00
Ed Schouten	fd0f59709d	Add labels to sysctls related to clocks. Sysctls like kern.eventtimer.et.*.quality currently embed the name of the clock device. This is problematic for the Prometheus metrics exporter for two reasons: - Some of those clocks have dashes in their names, which Prometheus doesn't allow to be used in metric names. - It doesn't allow for extracting the same property of all clocks on the system from within a single query. Attach these nodes to have a label, so that the Prometheus metrics exporter gives these metric a uniform name with the name of the clock attached as a label. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D8775	2016-12-14 12:56:58 +00:00
Ed Schouten	1e1f3941e4	Add support for attaching aggregation labels to sysctl objects. I'm currently working on writing a metrics exporter for the Prometheus monitoring system to provide access to sysctl metrics. Prometheus and sysctl have some structural differences: - sysctl is a tree of string component names. - Prometheus uses a flat namespace for its metrics, but allows you to attach labels with values to them, so that you can do aggregation. An initial version of my exporter simply translated hw.acpi.thermal.tz1.temperature to sysctl_hw_acpi_thermal_tz1_temperature_celcius while we should ideally have sysctl_hw_acpi_thermal_temperature_celcius{thermal_zone="tz1"} allowing you to graph all thermal zones on a system in one go. The change presented in this commit adds support for accomplishing this, by providing the ability to attach labels to nodes. In the example I gave above, the label "thermal_zone" would be attached to "tz1". As this is a feature that will only be used very rarely, I decided to not change the KPI too aggressively. Discussed on: hackers@ Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D8775	2016-12-14 12:47:34 +00:00
Gleb Smirnoff	1276a8363c	Zero return value when counter_rate() switches over to next second and value is positive, but below the limit.	2016-12-13 20:11:45 +00:00
Mateusz Guzik	25e578de55	vfs: use vrefact in getcwd and fchdir	2016-12-12 19:16:35 +00:00
Edward Tomasz Napierala	e3d4c4dcde	Undo r309891. Konstantin is right in that this condition normally cannot happen - the um_dev field is assigned at mount and never written to afterwards.	2016-12-12 19:11:04 +00:00
Mateusz Guzik	5afb134c32	vfs: add vrefact, to be used when the vnode has to be already active This allows blind increment of relevant counters which under contention is cheaper than inc-not-zero loops at least on amd64. Use it in some of the places which are guaranteed to see already active vnodes. Reviewed by: kib (previous version)	2016-12-12 15:37:11 +00:00
Edward Tomasz Napierala	223cb0e434	Avoid dereferencing NULL pointers in devtoname(). I've seen it panic, called from ufs_print() in DDB. MFC after: 1 month	2016-12-12 15:22:21 +00:00
Konstantin Belousov	778aa66a68	Enable lookup_cap_dotdot and lookup_cap_dotdot_nonlocal. Requested and reviewed by: cem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8746	2016-12-12 11:12:04 +00:00
Konstantin Belousov	545d312293	When a zombie gets reparented due to the parent exit, send SIGCHLD to the reaper. The traditional reaper init(8) is aware of zombies silently reparented to it after the parents exit, it loops around waitpid(2) to collect them. For other reapers, the silent reparenting is surprising and collecting zombies requires a thread blocking in waitpid(2) just for that purpose. It seems that sending second SIGCHLD is a better workaround than forcing all reapers to obey the setup. Reported by: Michael Zuo <muh.muhten@gmail.com>, jilles PR: 213928 Reviewed by: jilles (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-12 11:11:50 +00:00
Alan Cox	2d612d2dd2	When tmpfs and POSIX shm pagein a page for the sole purpose of performing truncation, immediately queue the page for asynchronous laundering rather than making the page pass through inactive queue first. Reviewed by: kib, markj	2016-12-11 19:24:41 +00:00
Konrad Witaszczyk	480f31c214	Add support for encrypted kernel crash dumps. Changes include modifications in kernel crash dump routines, dumpon(8) and savecore(8). A new tool called decryptcore(8) was added. A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump configuration in the diocskerneldump_arg structure to the kernel. The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for backward ABI compatibility. dumpon(8) generates a one-time random symmetric key and encrypts it using an RSA public key in capability mode. Currently only AES-256-CBC is supported but EKCD was designed to implement support for other algorithms in the future. The public key is chosen using the -k flag. The dumpon rc(8) script can do this automatically during startup using the dumppubkey rc.conf(5) variable. Once the keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O control. When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random IV and sets up the key schedule for the specified algorithm. Each time the kernel tries to write a crash dump to the dump device, the IV is replaced by a SHA-256 hash of the previous value. This is intended to make a possible differential cryptanalysis harder since it is possible to write multiple crash dumps without reboot by repeating the following commands: # sysctl debug.kdb.enter=1 db> call doadump(0) db> continue # savecore A kernel dump key consists of an algorithm identifier, an IV and an encrypted symmetric key. The kernel dump key size is included in a kernel dump header. The size is an unsigned 32-bit integer and it is aligned to a block size. The header structure has 512 bytes to match the block size so it was required to make a panic string 4 bytes shorter to add a new field to the header structure. If the kernel dump key size in the header is nonzero it is assumed that the kernel dump key is placed after the first header on the dump device and the core dump is encrypted. Separate functions were implemented to write the kernel dump header and the kernel dump key as they need to be unencrypted. The dump_write function encrypts data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps are not supported due to the way they are constructed which makes it impossible to use the CBC mode for encryption. It should be also noted that textdumps don't contain sensitive data by design as a user decides what information should be dumped. savecore(8) writes the kernel dump key to a key.# file if its size in the header is nonzero. # is the number of the current core dump. decryptcore(8) decrypts the core dump using a private RSA key and the kernel dump key. This is performed by a child process in capability mode. If the decryption was not successful the parent process removes a partially decrypted core dump. Description on how to encrypt crash dumps was added to the decryptcore(8), dumpon(8), rc.conf(5) and savecore(8) manual pages. EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU. The feature still has to be tested on arm and arm64 as it wasn't possible to run FreeBSD due to the problems with QEMU emulation and lack of hardware. Designed by: def, pjd Reviewed by: cem, oshogbo, pjd Partial review: delphij, emaste, jhb, kib Approved by: pjd (mentor) Differential Revision: https://reviews.freebsd.org/D4712	2016-12-10 16:20:39 +00:00
Mark Johnston	02315a6759	Use a consistent snapshot of the lock state in owner_mtx(). MFC after: 2 weeks	2016-12-10 02:59:34 +00:00
Mark Johnston	c365a2934e	Return a non-NULL owner only if the lock is exclusively held in owner_sx(). Fix some whitespace bugs while here. MFC after: 2 weeks	2016-12-10 02:56:44 +00:00
Gleb Smirnoff	5040da77c1	Use acquire write to cr_lock to complement with release write at end of locked region. Submitted by: kib	2016-12-09 19:07:31 +00:00
Gleb Smirnoff	169170209c	Provide counter_ratecheck(), a MP-friendly substitution to ppsratecheck(). When rated event happens at a very quick rate, the ppsratecheck() is not only racy, but also becomes a performance bottleneck. Together with: rrs, jtl	2016-12-09 17:58:34 +00:00
Robert Watson	52b42f6287	Regenerate system-call definitions following r309677 correcting a whitespace glitch in syscalls.master.	2016-12-07 16:12:27 +00:00
Robert Watson	82d8d2b8bc	Replace spaces with tabs in definition of SCTP system calls, for consistency with the remainder of the syscalls.master file. This problem does not occur in the freebsd32 version of the same system calls.	2016-12-07 16:11:55 +00:00
Eric van Gyzen	3d32d4a7c9	Export the whole thread name in kinfo_proc kinfo_proc::ki_tdname is three characters shorter than thread::td_name. Add a ki_moretdname field for these three extra characters. Add the new field to kinfo_proc32, as well. Update all in-tree consumers to read the new field and assemble the full name, except for lldb's HostThreadFreeBSD.cpp, which I will handle separately. Bump __FreeBSD_version. Reviewed by: kib MFC after: 1 week Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D8722	2016-12-07 15:04:22 +00:00
Konstantin Belousov	435da98564	Restructure the code to handle reporting of non-exited processes from wait(2). - Do not acquire the process spinlock if neither WTRAPPED nor WUNTRACED options were passed [1]. - Extract the code to report alive process into a new helper report_alive_proc() and use it for trapped, stopped and continued children. Note that the process spinlock is required around the WTRAPPED and WUNTRACED tests, because P_STOPPED_TRACE and P_STOPPED_SIG flags are set before other threads are stopped at the suspension point, and that threads increment p_suspcount while owning only the process spinlock, the process lock is dropped by them. If the spinlock is not taken for tests, the syscall thread might miss both p_suspcount increment and wakeup in thread_suspend_switch(). Based on the submission by: mjg [1] Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-04 20:44:58 +00:00
Eric van Gyzen	ff07dd913e	thr_set_name(): silently truncate the given name as needed Instead of failing with ENAMETOOLONG, which is swallowed by pthread_set_name_np() anyway, truncate the given name to MAXCOMLEN+1 bytes. This is more likely what the user wants, and saves the caller from truncating it before the call (which was the only recourse). Polish pthread_set_name_np(3) and add a .Xr to thr_set_name(2) so the user might find the documentation for this behavior. Reviewed by: jilles MFC after: 3 days Sponsored by: Dell EMC	2016-12-03 01:14:21 +00:00
Mateusz Guzik	a2d3554542	vfs: provide fake locking primitives for the crossmp vnode Since the vnode is only expected to be shared locked, we can save a little overhead by only pretending we are locking in the first place. Reviewed by: kib Tested by: pho	2016-12-02 18:03:15 +00:00
Mateusz Guzik	a4ce25b5b0	vfs: fix a whitespace nit in r309307	2016-11-30 02:17:03 +00:00
Mateusz Guzik	1babea0341	vfs: avoid VOP_ISLOCKED in the common case in lookup	2016-11-30 02:14:53 +00:00
Mark Johnston	64910ddbff	Launder VPO_NOSYNC pages upon vnode deactivation. As of r234483, vnode deactivation causes non-VPO_NOSYNC pages to be laundered. This behaviour has two problems: 1. Dirty VPO_NOSYNC pages must be laundered before the vnode can be reclaimed, and this work may be unfairly deferred to the vnlru process or an unrelated application when the system is under vnode pressure. 2. Deactivation of a vnode with dirty VPO_NOSYNC pages requires a scan of the corresponding VM object's memq for non-VPO_NOSYNC dirty pages; if the laundry thread needs to launder pages from an unreferenced such vnode, it will reactivate and deactivate the vnode with each laundering, potentially resulting in a large number of expensive scans. Therefore, ensure that all dirty pages are laundered upon deactivation, i.e., when all maps of the vnode are removed and all references are released. Reviewed by: alc, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8641	2016-11-26 21:00:27 +00:00
John Baldwin	9f3aabb9eb	Permit timed sleeps for threads other than thread0 before timers are working. The callout subsystem already handles early callouts and schedules the first clock interrupt appropriately based on the currently pending callouts. The one nit to fix was that callouts scheduled via C_HARDCLOCK during early boot could fire too early once timers were enabled as the per-CPU base time is always zero until timers are initialized. The change in callout_when() handles this case by using the current uptime as the base time of the callout during bootup if the per-CPU base time is zero. Reviewed by: kib MFC after: 2 weeks Sponsored by: Netflix	2016-11-25 18:02:43 +00:00
Mateusz Guzik	746b6e8176	wait: avoid relocking the child if proc_to_reap returns 1 proc_to_reap would always unlock. However, if it returned 1, kern_wait6 would immediately lock it again. Save the dance. Reviewed by: kib	2016-11-24 18:21:48 +00:00
Mateusz Guzik	8b0e0c91e0	cache: ensure that the number of bucket locks does not exceed hash size The size can be changed by side effect of modifying kern.maxvnodes. Since numbucketlocks was not modified, setting a sufficiently low value would give more locks than actual buckets, which would then lead to corruption. Force the number of buckets to be not smaller. Note this should not matter for real world cases. Reported and tested by: pho	2016-11-23 19:50:12 +00:00
Mark Johnston	99e6e1930c	Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589	2016-11-23 17:53:07 +00:00
Ruslan Bukin	dd7d4f199e	Revert r306186 ("Adjust the sopt_val pointer on bigendian systems"). This logic doesn't work with bigger sopt_valsize (e.g. when ipfw passing 2048 bytes rule). Reported by: adrian Sponsored by: DARPA, AFRL	2016-11-22 18:31:43 +00:00
Konstantin Belousov	eb962424ba	Restore vnode pager statistic for buffer pagers. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8585	2016-11-22 10:06:39 +00:00
John Baldwin	5d8cce1764	Initialize 'ticks' earlier in boot after 'hz' is set. This avoids the time-warp after kthreads have started running and the required fixup to td_slptick and td_blktick in the EARLY_AP_STARTUP case. Now, 'ticks' is initialized before any kthreads are created or any context switches are performed. Tested by: gavin MFC after: 2 weeks Sponsored by: Netflix	2016-11-22 01:02:59 +00:00
Robert Watson	1279fdafce	Audit 'fd' and 'cmd' arguments to fcntl(2), and when generating BSM, always audit the file-descriptor number and vnode information for all fcntl(2) commands, not just locking-related ones. This was likely an oversight in the original adaptation of this code from XNU. MFC after: 3 days Sponsored by: DARPA, AFRL	2016-11-22 00:41:24 +00:00
Gleb Smirnoff	00b5ffde8e	Add flag SF_USER_READAHEAD to sendfile(2). When specified, the syscall won't do any speculations about readahead, and use exactly the amount of readahead specified by user. E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) will guarantee that no readahead at all will be performed.	2016-11-17 21:36:18 +00:00
Gleb Smirnoff	5dba303d01	Use bogus_page to properly reduce number of I/Os in sendfile(2). The new sendfile_swapin() loop works this way: - Find first invalid page in the request. - Do vm_pager_has_page() and get count of pages that can be taken in single I/O. - Trim valid pages from the end of the request. - Cycle through the request and substitute bogus_page for all valid pages that are in the middle of the request. - After I/O is launched (pager copies array of pages into buf(9)), it is important to restore proper page pointers with the help of vm_page_lookup(). Count bogus pages used and report them in sendfile stats.	2016-11-17 21:02:55 +00:00
Ruslan Bukin	6e18247a3d	Fix build when no INET and INET6 in kernel config. Submitted by: kan Sponsored by: DARPA, AFRL	2016-11-17 16:13:30 +00:00
Alan Cox	7667839a7e	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
Mateusz Guzik	6ce45c6ac3	cache: plug a write-only variable in cache_negative_zap_one	2016-11-15 03:43:10 +00:00
Mateusz Guzik	317cac6d5a	cache: fix a race between entry removal and demotion The negative list shrinker can demote an entry with only hotlist + neglist locks held. On the other hand entry removal possibly sets the NCF_DVDROP without aforementioned locks held prior to detaching it from the respective neglist, which can lose the update made by the shrinker. Reported and tested by: truckman	2016-11-15 03:38:05 +00:00
Adrian Chadd	8ffa01a061	[mips] enable relbuf on mips for now to work around page aliasing in mips hardware. Although the higher end MIPS hardware handles cache aliasing issues in hardware, the older cores (r4k, etc) and some compile versions of the newer cores (mips24k, mips34k, mips74k) don't have this feature. This means we end up with some very unfortunate behaviour that was made very obvious by some recent changes to the FFS pager by kib. So, flip this off until we get our MIPS pmap/cache code upgraded to handle aliased pages in software. Discussed with: kib, bsdimp, juli	2016-11-15 01:41:45 +00:00
Adrian Chadd	0046bef85a	[mips] make UMTX_CHAINS configurable at compile time. The default (512) wastes quite a bit of space which doesn't really buy us much on highly embedded systems which don't take a lot of locks in parallel. This makes it at least build time configurable so people can experiment.	2016-11-15 01:34:38 +00:00
Konstantin Belousov	ae44bb0146	Initialize reserved bytes in struct mq_attr and its 32compat counterpart, to avoid kernel stack content leak in kmq_setattr(2) syscall. Also slightly simplify the checks around copyout()s. Reported by: Vlad Tsyrklevich <vlad902+spam@gmail.com> PR: 214488 MFC after: 1 week	2016-11-14 13:20:10 +00:00
Konstantin Belousov	714b7df502	Provide simple mutual exclusion between mount point update and unmount. Currently mount update keeps vfs_busy(9) reference on the mount point during MNT_UPDATE VFS_MOUNT() vfsops call. This already provides the exclusion, but is problematic for filesystems which need to perform namei(9) during VFS_MOUNT(MNT_UPDATE) operations, e.g. to refresh mnt_from path, because namei(9) must not be called while the vfs_busy(9) reference is owned. Check for MNT_UPDATE flag before setting MNTK_UNMOUNT, and for MNTK_UNMOUNT before entering innards of vfs_domount_update(), failing syscalls with EBUSY if conflict is detected. Keep vfs_busy(9) reference around VFS_MOUNT(MNT_UPDATE) calls still to not change VFS KPI. In the update path in ffs_mount(), drop vfs_busy() reference around namei(), which is now safe due to unmount never executing in parallel with VFS_MOUNT(MNT_UPDATE), and which avoids the deadlock. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-11-13 21:49:51 +00:00
Konstantin Belousov	9eb8f495b8	Move common cleanup code into helper. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-13 21:39:55 +00:00
Justin Hibbits	6487a709f3	Add two new ddb commands: show device/show all devices Shows several useful pieces of information from the device including the softc and ivars pointers.	2016-11-13 00:46:11 +00:00
John Baldwin	892f0ab0ab	Allow scheduling during early boot. - Send IPI wakeups once SMP is started even if cold is true. - Permit preemptions when cold is true. These changes are needed for EARLY_AP_STARTUP. MFC after: 2 weeks Sponsored by: Netflix	2016-11-12 00:23:09 +00:00
John Baldwin	a6b91f0f45	Don't place threads on the run queue after waking up other CPUs. The other CPU might resume and see a still-empty runq and go back to sleep before sched_add() adds the thread to the runq. This results in a lost wakeup and a potential hang if the system is otherwise completely idle. The race originated due to a micro-optimization (my fault) in 4BSD in that it avoided putting a thread on the run queue if the scheduler was going to preempt to the new thread. To avoid complexity while fixing this race, just drop this optimization. 4BSD now always sets the "owepreempt" flag when a preemption is warranted and defers the actual preemption to the thread_unlock of the caller the same as ULE. MFC after: 2 weeks Sponsored by: Netflix	2016-11-12 00:14:13 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Konstantin Belousov	9a639daf77	Tweaks for the buffer pager. Pass current thread credentials instead of NOCRED. Only allow unmapped buffers for filesystem which proclaimed the support. For all filesystems which currently use buffer pager (UFS, msdosfs and cd9660), the changes are effectively nop. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-08 10:10:55 +00:00
Konstantin Belousov	9bd4f0a2c6	vn_fullpath1() checked VV_ROOT and then unreferenced vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race. Lock vnode after we noticed the VV_ROOT flag. See comments for explanation why unlocked check for the flag is considered safe. Reported and tested by: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-07 10:55:56 +00:00
Konstantin Belousov	75409ce1dd	Remove remnants of the recursive sleep support. Instead assert that we never try to sleep while the thread is on a sleepqueue. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8422	2016-11-02 20:57:20 +00:00
Konstantin Belousov	7359fdcf5f	Allow some dotdot lookups in capability mode. If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal. Track the directories traversed, and check the result of dotdot lookup against the recorded list of the directory vnodes. Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently disabled by default until more verification of the approach is done. Disallow non-local filesystems for dotdot, since remote server might conspire with the local process to allow it to escape the namespace. This might be too cautious, so provide the knob vfs.lookup_cap_dotdot_nonlocal to override as well. Idea by: rwatson Discussed with: emaste, jonathan, rwatson Reviewed by: mjg (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8110	2016-11-02 12:43:15 +00:00
Konstantin Belousov	1bf6a0900d	Remove tautological casts. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:10:39 +00:00
Konstantin Belousov	ec84693535	Style fixes. Discussed with: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:02:31 +00:00
Edward Tomasz Napierala	53ae7e833c	Fix getfsstat(2) with MNT_WAIT to not skip filesystems that are in the process of being unmounted. Previously it would skip them, even if the unmount eventually failed, e.g. due to the filesystem being busy. This behaviour broke autounmountd(8) - if you tried to manually unmount a mounted filesystem, using 'automount -u', and autounmountd attempted to refresh the filesystem list at that very moment, it would conclude that the filesystem got unmounted and not try to unmount it afterwards. Reviewed by: kib@ Tested by: pho@ MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8030	2016-11-02 09:43:19 +00:00
Conrad Meyer	8532d381a9	Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code. This can be handy in tracking down what code touched hung bios and bufs last. The full history is especially useful, but adds enough bloat that it shouldn't be enabled in release builds. Function names (or arbitrary string constants) are tracked in a fixed-size ring in bufs. Bios gain a pointer to the upper buf for tracking. SCSI CCBs gain a pointer to the upper bio for tracking. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8366	2016-10-31 23:09:52 +00:00
Mark Johnston	ac91917211	Fix WITNESS hints for pagequeue locks. MFC after: 1 week	2016-10-29 20:01:48 +00:00
Edward Tomasz Napierala	6eeff7a7b2	Fix getfsstat(2) handling of flags. The 'flags' argument is an enum, not a bitfield. For the intended usage - being passed either MNT_WAIT, or MNT_NOWAIT - this shouldn't introduce any changes in behaviour. Reviewed by: jhb@ MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8373	2016-10-29 12:38:30 +00:00
Konstantin Belousov	c39baa7480	Generalize UFS buffer pager to allow it serving other filesystems which also use buffer cache. Most important addition to the code is the handling of filesystems where the block size is less than the machine page size, which might require reading several buffers to validate single page. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:43:59 +00:00
Marcel Moolenaar	07f862a769	Include <stdarg.h> instead of <machine/stdarg.h> when compiled as part of libsbuf. The former is the standard header, and allows us to compile libsbuf on macOS/linux.	2016-10-24 18:03:04 +00:00
Konstantin Belousov	835c2787be	Handle broadcast NMIs. On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs reporting hardware errors are broadcast to all CPUs. When kernel is configured to enter kdb on NMI, the outcome is problematic, because each CPU tries to enter kdb. All CPUs are executing NMI handlers, which set the latches disabling the nested NMI delivery; this means that stop_cpus_hard(), used by kdb_enter() to stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work. One indication of this is the harmless but annoying diagnostic "timeout stopping cpus". Much more harmful behaviour is that because all CPUs try to enter kdb, and if ddb is used as debugger, all CPUs issue prompt on console and race for the input, not to mention the simultaneous use of the ddb shared state. Try to fix this by introducing a pseudo-lock for simultaneous attempts to handle NMIs. If one core happens to enter NMI trap handler, other cores see it and simulate reception of the IPI_STOP_HARD. Moreover, generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting for the acknowledgement, relying on the nmi handler on other cores suspending and then restarting the CPU. Since it is impossible to detect at runtime whether some stray NMI is broadcast or unicast, add a knob for administrator (really developer) to configure debugging NMI handling mode. The updated patch was debugged with the help from Andrey Gapon (avg) and discussed with him. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8249	2016-10-24 16:40:27 +00:00
Konstantin Belousov	55ee7a4c5f	In the fueword64(9) wrapper for architectures which do not yet implement native fueword64(9), use proper type for local where fuword64() result is stored. Note that fueword64() is unused in the tree. Submitted by: Chunhui He <hchunhui@mail.ustc.edu.cn> PR: 212520 MFC after: 1 week	2016-10-23 11:23:17 +00:00
Conrad Meyer	8798ef0679	ddb(4): Add sleepchains to "show allchains" Reported by: markj Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8320	2016-10-22 18:02:20 +00:00
Hiren Panchasara	9d71a3975e	Rework r306337. In sendit(), if mp->msg_control is present, then in sockargs() we are allocating mbuf to store mp->msg_control. Later in kern_sendit(), call to getsock_cap(), will check validity of file pointer passed, if this fails EBADF is returned but mbuf allocated in sockargs() is not freed. Made code changes to free the same. Since freeing control mbuf in sendit() after checking (control != NULL) may lead to double freeing of control mbuf in sendit(), we can free control mbuf in kern_sendit() if there are any errors in the routine. Submitted by: Lohith Bellad <lohith.bellad@me.com> Reviewed by: glebius MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D8152	2016-10-21 18:27:30 +00:00
Mariusz Zaborski	4b83a77606	capsicum: perform copyout without the filedesc lock held in sys_cap_ioctls_get Reviewed by: pjd	2016-10-21 16:12:23 +00:00
Mateusz Guzik	bb697a20d7	cache: fix up a corner case in r307650 If no negative entry is found on the last list, the ncp pointer will be left uninitialized and a non-null value will make the function assume an entry was found. Fix the problem by initializing to NULL on entry. Reported by: glebius	2016-10-20 19:55:50 +00:00
Kevin Lo	61f481fb7e	Remove register keyword. Reviewed by: kib	2016-10-20 01:21:10 +00:00
Kevin Lo	7c68685366	Remove a sentence about putting initialization in init_proc.c or kern_proc.c and useless comment. Reviewed by: kib	2016-10-20 01:19:37 +00:00
Sean Bruno	026204b4c6	Resolve whitespace diff to NextBSD. Check to see that the taskqueue thread count requires us to actually iterate over the thread count to bind to cpus. Submitted by: mmacy@nextbsd.org	2016-10-19 21:01:24 +00:00
Mateusz Guzik	53dc58f2dc	Mark a bunch of mpsafe sysctls as such. This gives me a sysctl Giant-free buildworld.	2016-10-19 19:42:01 +00:00
Mateusz Guzik	a45a1a25b8	cache: split negative entry LRU into multiple lists This splits the ncneg_mtx lock while preserving the hit ratio at least during buildworld. Create N dedicated lists for new negative entries. Entries with at least one hit get promoted to the hot list, where they get requeued every M hits. Shrinking demotes one hot entry and performs a round-robin shrinking of regular lists. Reviewed by: kib	2016-10-19 18:29:52 +00:00
Sean Bruno	abf38392c6	Assert that we're assigning a non-null taskqueue. ref: `535865d02c` Fix cpu assignment by assuring stride is non-zero, assert that all tasks have a valid taskqueue. ref: `db39817623` Start cpu assignment from zero. ref: `d99d39b6b6` Submitted by: mmacy@nextbsd.org	2016-10-18 14:00:26 +00:00
Sean Bruno	12d1b8c9f3	Ensure that tasks with a specific cpu set prior to smp starting get re-attached to a thread running on that cpu. ref: `fcc20e306b` Submitted by: mmacy@nextbsd.org	2016-10-18 13:55:34 +00:00
Sean Bruno	dc35f36560	Tell gtask to what we've been bound. ref: `54414984cf` Submitted by: mmacy@nextbsd.org	2016-10-18 13:16:27 +00:00
Ed Maste	9e62195361	makesyscalls.sh: remove trailing space on the "created from" line In r10905 and r10906 makesyscalls was modified to avoid emitting a literal $Id$ string in the generated file, with: gsub("[$]Id: ", "", $0) gsub(" [$]", "", $0) Then r11294 added some functionality and also tried to address the $Id$ problem in a different way, by removing every $: sed -e 's/\$//g ... This rendered the gsub ineffective. The gsub was later updated to track the $Id$ -> $FreeBSD$ switch, even though it did not do anything. Revert the addition of the s/\$//g, and update the gsub to keep the resulting format the same. Discussed with: bde MFC after: 1 week Sponsored by: The FreeBSD Foundation	2016-10-17 13:52:24 +00:00
Hans Petter Selasky	d3bf5efc1f	Fix device delete child function. When detaching device trees parent devices must be detached prior to detaching their children. This is because parent devices can have pointers to the child devices in their softcs which are not invalidated by device_delete_child(). This can cause use after free issues and panic(). Device drivers implementing trees must ensure their detach function detaches or deletes all children before returning. While at it remove now redundant device_detach() calls before device_delete_child() and device_delete_children(), mostly in the USB controller drivers. Tested by: Jan Henrik Sylvester <me@janh.de> Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D8070 MFC after: 2 weeks	2016-10-17 10:20:38 +00:00
Konstantin Belousov	5975e53d40	Fix a race in vm_page_busy_sleep(9). Suppose that we have an exclusively busy page, and a thread which can accept a shared-busy page. In this case, typical code waiting for the page xbusy state to pass is again: VM_OBJECT_WLOCK(object); ... if (vm_page_xbusied(m)) { vm_page_lock(m); VM_OBJECT_WUNLOCK(object); <---1 vm_page_busy_sleep(p, "vmopax"); goto again; } Suppose that the xbusy state owner locked the object, unbusied the page and unlocked the object after we are at the line [1], but before we executed the load of the busy_lock word in vm_page_busy_sleep(). If it happens that there are still no waiters recorded for the busy state, the xbusy owner did not acquire the page lock, so it proceeded. Moreover, suppose that some other thread happens to share-busy the page after xbusy state was relinquished but before the m->busy_lock is read in vm_page_busy_sleep(). Again, that thread only needs vm_object lock to proceed. Then, vm_page_busy_sleep() reads busy_lock value equal to the VPB_SHARERS_WORD(1). In this case, all tests in vm_page_busy_sleep(9) pass and we are going to sleep, despite the page being share-busied. Update check for m->busy_lock == VPB_UNBUSIED in vm_page_busy_sleep(9) to also accept shared-busy state if we only wait for the xbusy state to pass. Merge sequential if()s with the same 'then' clause in vm_page_busy_sleep(). Note that the current code does not share-busy pages from parallel threads; the only way to have more than one sbusy owner right now is to recurse. Reported and tested by: pho (previous version) Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8196	2016-10-13 14:41:05 +00:00
Conrad Meyer	d9ce8a41ea	kern_linker: Handle module-loading failures in preloaded .ko files The runtime kernel loader, linker_load_file, unloads kernel files that failed to load all of their modules. For consistency, treat preloaded (loader.conf loaded) kernel files in the same way. Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8200	2016-10-13 02:06:23 +00:00
Ed Maste	2a059700b6	Use correct size type in do_setopt_accept_filter Submitted by: ecturt@gmail.com	2016-10-12 00:56:49 +00:00
Oleksandr Tymoshenko	609b0fe966	INTRNG - fix MSI/MSIX release path Use isrc in attached MSI data structure instead of using map's isrc directly. map's isrc is set to NULL on IRQ deactivation which happens prior to pci_release_msi so MSI_RELEASE_MSI receives array of NULLs Reviewed by: mmel Differential Revision: https://reviews.freebsd.org/D8206	2016-10-11 17:00:29 +00:00
Sean Bruno	1ee17b070d	Fix bug where malloc(.., M_NOWAIT) return value is not checked. Change to M_WAITOK and move outside the mutex. Submitted by: shurd Reviewed by: mmacy@nextbsd.org MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7649	2016-10-11 14:08:53 +00:00
Mateusz Guzik	45571f8886	vfs: assert empty tmp free list on unmount	2016-10-08 13:38:05 +00:00
Mateusz Guzik	c6c44ff7eb	vfs: clear the tmp free list flag before taking the free vnode list lock Safe access is already guaranteed because of the mnt_listmx lock.	2016-10-08 13:36:59 +00:00
Konstantin Belousov	f71d08566c	Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation	2016-10-07 11:38:28 +00:00
Bryan Drewery	32641585a9	vrefl: Assert that the interlock is held. Sponsored by: Dell EMC Isilon MFC after: 2 weeks	2016-10-06 18:10:19 +00:00
Bryan Drewery	5a22c9582c	Add vrecyclel() to vrecycle() a vnode with the interlock already held. Obtained from: OneFS Sponsored by: Dell EMC Isilon MFC after: 2 weeks	2016-10-06 18:09:22 +00:00
Conrad Meyer	f43292ecf4	vfs_bio: Remove a leading space (style) Introduced in r282085. Sponsored by: Dell EMC Isilon	2016-10-05 23:42:02 +00:00
Bryan Drewery	0617f64ec6	Correct some comments after r294299. Sponsored by: Dell EMC Isilon	2016-10-04 21:44:20 +00:00
Ed Maste	65eea7ede6	ANSIfy inflate.c Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D8143	2016-10-04 17:57:30 +00:00
Konstantin Belousov	5420f76b59	Style. Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-10-04 15:23:03 +00:00
Mateusz Guzik	4876636eb7	cache: ignore purgevfs requests for filesystems with few vnodes purgevfs is purely optional and induces lock contention in workloads which frequently mount and unmount filesystems. In particular, poudriere will do this for filesystems with 4 vnodes or less. Full cache scan is clearly wasteful. Since there is no explicit counter for namecache entries, the number of vnodes used by the target fs is checked. The default limit is the number of bucket locks. Reviewed by: kib	2016-10-03 00:02:32 +00:00
Mateusz Guzik	5bb81f9b2d	vfs: batch free vnodes in per-mnt lists Previously free vnodes would always be directly returned to the global LRU list. With this change up to mnt_free_list_batch vnodes are collected first. syncer runs always return the batch regardless of its size. While vnodes on per-mnt lists are not counted as free, they can be returned in case of vnode shortage. Reviewed by: kib Tested by: pho	2016-09-30 17:27:17 +00:00
Mateusz Guzik	8660b707ff	vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib	2016-09-30 17:11:03 +00:00
Gleb Smirnoff	7ed6b78b92	Provide kern.maxphys sysctl, which returns MAXPHYS. Naming matches NetBSD.	2016-09-29 23:07:28 +00:00
Allan Jude	0176ca2ed5	Allow reading the following sysctl MIBs in capability mode: kern.hostname, kern.domainname, and kern.hostuuid This allows sandboxed applications to read these sysctls Submitted by: cem (original version) Reviewed by: cem, jonathan, rwatson (original version) Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D8015	2016-09-29 16:29:49 +00:00
Hans Petter Selasky	99eca1b2b3	While draining a timeout task prevent the taskqueue_enqueue_timeout() function from restarting the timer. Commonly taskqueue_enqueue_timeout() is called from within the task function itself without any checks for teardown. Then it can happen the timer stays active after the return of taskqueue_drain_timeout(), because the timeout and task is drained separately. This patch factors out the teardown flag into the timeout task itself, allowing existing code to stay as-is instead of applying a teardown flag to each and every of the timeout task consumers. Add assert to taskqueue_drain_timeout() which prevents parallel execution on the same timeout task. Update manual page documenting the return value of taskqueue_enqueue_timeout(). Differential Revision: https://reviews.freebsd.org/D8012 Reviewed by: kib, trasz MFC after: 1 week	2016-09-29 10:38:20 +00:00
Hiren Panchasara	7c9a4d09d6	Revert r306337. dhw@ reported a panic which seems related to this and bde@ has raised some issues.	2016-09-26 15:45:30 +00:00
Eric van Gyzen	310ab671b8	Make no assertions about mutex state when the scheduler is stopped. This changes the assert path to match the lock and unlock paths. MFC after: 1 week Sponsored by: Dell EMC	2016-09-26 15:30:30 +00:00
Hiren Panchasara	41bb1a25a9	In sendit(), if mp->msg_control is present, then in sockargs() we are allocating mbuf to store mp->msg_control. Later in kern_sendit(), call to getsock_cap(), will check validity of file pointer passed, if this fails EBADF is returned but mbuf allocated in sockargs() is not freed. Fix this possible leak. Submitted by: Lohith Bellad <lohith.bellad@me.com> Reviewed by: adrian MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D7910	2016-09-26 10:13:58 +00:00
Julian Elischer	1c8260b61d	Give the user a clue as to which process hit maxfiles. MFC after: 1 week Sponsored by: Panzura	2016-09-24 22:56:13 +00:00
Konstantin Belousov	939457e3e0	Add the foundation copyrights to procctl kernel sources. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-23 12:32:20 +00:00
Mariusz Zaborski	ad5e83dd3c	fd: fix up fget_cap If the kernel is not compiled with the CAPABILITIES kernel option, fget_unlocked doesn't return the sequence number, so fd_modify will always report modification; in that case we get an infinite loop. Reported by: br Reviewed by: mjg Tested by: br, def	2016-09-23 08:13:46 +00:00
Mateusz Guzik	deffc4a026	fd: fix up fgetvp_rights after r306184 fget_cap_locked returns a referenced file, but the fgetvp_rights does not need it. Instead, due to the filedesc lock being held, it can ref the vnode after the file was looked up. Fix up fget_cap_locked to be consistent with other _locked helpers and not ref the file. This plugs a leak introduced in r306184. Pointy hat to: mjg, oshogbo	2016-09-23 06:51:46 +00:00
Mateusz Guzik	1d2541fd1a	cache: get rid of the global lock Add a table of vnode locks and use them along with bucketlocks to provide concurrent modification support. The approach taken is to preserve the current behaviour of the namecache and just lock all relevant parts before any changes are made. Lookups still require the relevant bucket to be locked. Discussed with: kib Tested by: pho	2016-09-23 04:45:11 +00:00
Gleb Smirnoff	a2d8f9d2fc	Fix regression from r297400, which truncates headers in case of low socket buffer and put a small optimization for low socket buffer case: - Do not hack uio_resid, and let m_uiotombuf() properly take care of it. This fixes truncation of headers at low buffer. - If headers ate all the space, jump right to the end of the cycle, to avoid doing single page I/O and allocating zero length mbuf. - Clear hdr_uio only if space is positive, which indicates that all uio was copied in. Reviewed by: pluknet, jtl, emax, rrs, lstewart, emax, gallatin, scottl	2016-09-22 20:34:44 +00:00
Ruslan Bukin	30f3bfe58e	Adjust the sopt_val pointer on bigendian systems (e.g. MIPS64EB). sooptcopyin() checks if size of data provided by user is <= than we can accept, else it strips down the size. On bigendian platforms we have to move pointer as well so we copy the actual data. Reviewed by: gnn Sponsored by: DARPA, AFRL Sponsored by: HEIF5 Differential Revision: https://reviews.freebsd.org/D7980	2016-09-22 12:41:53 +00:00
Mariusz Zaborski	6490bc6529	fd: simplify fgetvp_rights by using fget_cap_locked Reviewed by: mjg	2016-09-22 11:54:20 +00:00
Mariusz Zaborski	85b0f9de11	capsicum: propagate rights on accept(2) The descriptor returned by accept(2) should inherit capability rights from the listening socket. PR: 201052 Reviewed by: emaste, jonathan Discussed with: many Differential Revision: https://reviews.freebsd.org/D7724	2016-09-22 09:58:46 +00:00
Mark Johnston	bdaf6d6913	Regenerate syscall provider argument strings.	2016-09-22 04:50:03 +00:00
Mark Johnston	5a4dfc8d83	Annotate syscall provider pointer arguments with the "userland" keyword. This causes dtrace to automatically copyin arguments from userland, so one no longer has to explicitly use the copyin() action to do so. Moreover, copyin() on userland addresses is a no-op, so existing scripts should be unaffected by this change. Discussed with: rstone MFC after: 2 weeks	2016-09-22 04:49:31 +00:00
Konstantin Belousov	851194715d	Make resettodr_lock accessible outside subr_rtc.c. Protect CLOCK_GETTIME() with the lock. Now all time-related accesses to the CMOS for RTC should be under the lock. This is needed to allow upcoming EFI Runtime Services support to provide required execution environment for the firmware calls. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-21 10:15:08 +00:00
Konstantin Belousov	643f6f47fd	Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap. Both can be used to cause processes in capability mode to receive SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from syscalls. Idea by: emaste Reviewed by: oshogbo (previous version), emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7965	2016-09-21 08:23:33 +00:00
Edward Tomasz Napierala	e313b4dd95	Fix bug introduced with r302388, which could cause processes accessing automounted shares to hang with "vfs_busy" wchan. (As a workaround one can run 'automount -u' from cron.) Reviewed by: kib@ MFC after: 1 month	2016-09-21 05:44:13 +00:00
Sepherosa Ziehau	a5ec35dfee	Fix LINT building. Sponsored by: Microsoft	2016-09-18 07:37:00 +00:00
Ed Maste	69a2875821	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
Kevin Lo	c3bef61e58	Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878	2016-09-15 07:41:48 +00:00
Mariusz Zaborski	6e70b4f058	fd: add fget_cap and fget_cap_locked primitives They can be used to obtain capabilities along with a referenced fp. Reviewed by: mjg@	2016-09-12 22:46:19 +00:00
John Baldwin	71499f6a2d	Make device_quiet() an attachment property. In particular, reset the DF_QUIET flag when detaching from a device so that a driver that marks a device quiet doesn't dictate policy for a different driver that may claim the device in the future. Reviewed by: rpokala, wblock MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7803	2016-09-12 18:06:42 +00:00
Mateusz Guzik	a27815330c	cache: improve scalability by introducing bucket locks An array of bucket locks is added. All modifications still require the global cache_lock to be held for writing. However, most readers only need the relevant bucket lock and in effect can run concurrently to the writer as long as they use a different lock. See the added comment for more details. This is an intermediate step towards removal of the global lock. Reviewed by: kib Tested by: pho	2016-09-10 16:29:53 +00:00
Konstantin Belousov	2e4fd101fa	Fix build	2016-09-10 09:00:12 +00:00
Jilles Tjoelker	d30e66e53a	wait: Do not copyout uninitialized status/rusage/wrusage. If wait4() or wait6() return 0 because of WNOHANG, the status, rusage and wrusage information should not be returned. PR: 212048 Reported by: Casey Lucas MFC after: 2 weeks	2016-09-09 21:58:48 +00:00
Mateusz Guzik	a0d45f0fc8	locks: add backoff for spin mutexes and thread lock Reviewed by: jhb	2016-09-09 19:13:02 +00:00
Ed Maste	82b3cec52b	ANSIfy uipc_syscalls.c Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7839	2016-09-09 17:40:26 +00:00
Ed Maste	e62264e2dd	Update capabilities.conf comment getdtablesize is per-process state, not global state	2016-09-08 14:04:04 +00:00
Kevin Lo	cee4a05669	In m_devget(), if the data fits in a packet header mbuf, check the amount of data is less than or equal to MHLEN instead of MLEN when placing initial small packet header at end of mbuf. Reviewed by: glebius MFC after: 3 days	2016-09-08 01:02:53 +00:00
Brooks Davis	ed6d876b19	Modernize the initialization of sigproptbl. Use C99 designators to set the value of each slot and the nitems macro to check for valid entries. In the process, switch to indexing by signal number rather than signal-1 for improved clarity. Obtained from: CheriBSD (a6053c5abf03a5f53bbfcdd3a26429383f67e09f) Sponsored by: DARPA, AFRL Reviewed by: kib	2016-09-06 22:03:53 +00:00
Mateusz Guzik	5b7d9ae2fd	cv: do a lockless check for no waiters in cv_signal and cv_broadcastpri For some consumers, such as zfs, there are no waiters the vast majority of the time Reviewed by: jhb MFC after: 1 week	2016-09-06 17:16:59 +00:00
Mateusz Guzik	591df14528	cache: defer freeing entries until after the global lock is dropped This also defers vdrop for held vnodes. Glanced at by: kib	2016-09-04 16:52:14 +00:00
Mateusz Guzik	31977b420a	cache: manage negative entry list with a dedicated lock Since negative entries are managed with an LRU list, a hit requires a modification. Currently the code tries to upgrade the global lock if needed and is forced to retry the lookup if it fails. Provide a dedicated lock for use when the cache is only shared-locked. Reviewed by: kib MFC after: 1 week	2016-09-04 08:58:35 +00:00
Mateusz Guzik	b9042ae1bf	cache: put all negative entry management code into dedicated functions Reviewed by: kib MFC after: 1 week	2016-09-04 08:55:15 +00:00
Mark Johnston	3da0f3c9ae	Micro-optimize sleepq_signal(). Lift a comparison out of the loop that finds the highest-priority thread on the queue. MFC after: 1 week	2016-09-04 00:29:48 +00:00
Brooks Davis	fd50a70770	Merge from CheriBSD: Rename sigprop-table constants to SIGPROP_ from SA_ to reduce the impression of a namespace collision. Submitted by: rwatson Reviewed by: jhb, kib (slightly different versions) Obtained from: CheriBSD (814ec5771cb1cb53deba317c561de62a91ae7684) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7616	2016-09-02 18:22:56 +00:00
Ed Maste	dd38731e09	allow kern.proc.nfds sysctl in capability mode Reviewed by: allanjude MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7733	2016-09-01 02:51:50 +00:00
Patrick Kelsey	da2ded6575	_taskqueue_start_threads() now fails if it doesn't actually start any threads. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D7701	2016-09-01 02:05:46 +00:00
Mark Johnston	99ab95db4d	Rename unp_dispose_so() to unp_dispose(). It implements the dom_dispose method for local socket domain, so its name should match the method name.	2016-08-31 21:48:22 +00:00
Ed Maste	bce38b9f35	Regenerate after r305140, getdtablesize in capability mode Sponsored by: The FreeBSD Foundation	2016-08-31 18:37:51 +00:00
Ed Maste	ca380195ab	Allow getdtablesize in capability mode getdtablesize is "trivial global state" and is similar to getrlimit(RLIMIT_NOFILE), so should be permitted in capability mode. Reviewed by: oshogbo MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7719	2016-08-31 18:33:15 +00:00
Allan Jude	61bd7ae0ec	Eliminate unnecessary loop in _cap_check() Calling cap_rights_contains() several times with the same inputs is not going to produce a different output. The variable being iterated, i, is never used inside the for loop. The loop is actually done in cap_rights_contains() Submitted by: Ryan Moeller <ryan@freqlabs.com> Reviewed by: oshogbo, ed MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7369	2016-08-31 17:52:11 +00:00
Nathan Whitehorn	09c697016b	Back out misfired extra file in r305108.	2016-08-31 04:03:55 +00:00
Nathan Whitehorn	c9a124dc9a	Refix operation on sparse CPU mappings as in r302372, temporarily broken by r304716. PR: kern/210106 MFC after: 2 days	2016-08-31 04:02:52 +00:00
Mateusz Guzik	4cbafea09c	fd: add fdeget_locked and use in kern_descrip	2016-08-30 21:53:22 +00:00
Bryan Drewery	533f3e1026	Reduce duplicated logic for !SMP Sponsored by: EMC / Isilon Storage Division	2016-08-30 19:26:07 +00:00
John Baldwin	e05ec081fe	Implement 'devctl clear driver' to undo a previous 'devctl set driver'. Add a new 'clear driver' command for devctl along with the accompanying ioctl and devctl_clear_driver() library routine to reset a device to use a wildcard devclass instead of a fixed devclass. This can be used to undo a previous 'set driver' command. After the device's name has been reset to permit wildcard names, it is reprobed so that it can attach to newly-available (to it) device drivers. MFC after: 1 month Sponsored by: Chelsio Communications	2016-08-29 22:48:36 +00:00
Mateusz Guzik	11d3ad2eab	vfs: provide a common exit point in namei for error cases This shortens the function, adds the SDT_PROBE use for error cases and consistently unrefs rootdir last. Reviewed by: kib MFC after: 2 weeks	2016-08-27 22:43:41 +00:00
Konstantin Belousov	9ce60e28fd	Consistently delimit each vnode description block with two blank lines. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-08-27 18:12:42 +00:00
Konstantin Belousov	0f2d97838d	In both do_rw_wrlock() and do_rw_rdlock() after r304808, do not obliterate possible error from sleep with errors from umtxq_check_susp(), when looping to clear URWLOCK_{READ,WRITE}_WAITERS. Noted and reviewed by: vangyzen Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-25 19:15:02 +00:00

15377 Commits