freebsd-skq

Author	SHA1	Message	Date
Colin Percival	d5d7606c0c	Use the TSLOG framework to record entry/exit timestamps for DELAY and _vprintf; these functions are called in many places and can contribute meaningfully to the total time spent booting.	2017-12-31 09:24:41 +00:00
Colin Percival	49a4e3b4b4	Instrument thread creations for the benefit of the TSLOG framework. This assists in tracking time spent while the boot is being "held" waiting for something to happen.	2017-12-31 09:24:11 +00:00
Colin Percival	8b8a7c43a9	Instrument "boot holds" for the benefit of the TSLOG framework. These are places where the "main thread" of the booting kernel (either the thread which later becomes swapper or the thread which later becomes init) has to stop and wait for action to take place in another thread before continuing. There are currently three such holds: 1. The intr_config_hooks SYSINIT waits for hooks registered via the config_intrhook_establish function; this allows (typically) devices which need interrupts enabled to complete their initialization to do so before root is mounted. 2. The g_waitidle function waits for the GEOM event queue to be empty; this ensures that all of the disks which have been attached have been tasted before we attempt to mount root. 3. The vfs_mountroot_wait function (in addition to calling g_waitidle) waits for holds registered via root_mount_hold; among other things, this is used by the USB subsystem to ensure that we don't fail to mount root if it's located on a USB disk which takes a while to probe.	2017-12-31 09:23:52 +00:00
Colin Percival	a21a2da599	Teach makeobjops.awk to accept PROLOG and EPILOG blocks before METHOD and STATICMETHOD declarations; that code will be inserted into the dispatch function before and after the method call. Use this functionality and the TSLOG framework to record DEVICE_ATTACH and DEVICE_PROBE entry/exit timestamps.	2017-12-31 09:23:19 +00:00
Colin Percival	6032e08810	Use the TSLOG framework to record entry/exit timestamps for machine independent functions with important roles in the early boot process: mi_startup (with the "exit" recorded when it becomes swapper), start_init (with the "exit" recorded when the thread is about to "return" into the newly created init process), vfs_mountroot, and vfs_mountroot_wait.	2017-12-31 09:22:31 +00:00
Colin Percival	e31e71991a	Code for recording timestamps of events, especially function entries/exits. This is a very primitive system, intended for use in measuring performance during the early system boot, before more sophisticated tools like DTrace or infrastructure like kernel memory allocation and mutexes are available. Because this code records pointers to strings rather than copying strings (in order to keep the memory usage more manageable), if a kernel module is unloaded after logging an event, Bad Things can happen. Users are advised to not do that. Since cycle counts from the early kernel boot are used as an initial entropy source, publishing this information to userland could result in inadequate entropy being kept private to the kernel RNG. Users are advised to not enable this on systems with untrusted users. Discussed on: freebsd-current	2017-12-31 09:21:01 +00:00
Pedro F. Giffuni	0879ca728a	sysv_{ipc|shm}: update the NetBSD VCS tags to match nearer our files. Both files originated in NetBSD: sysv_ipc.c CVS 1.9: Most of their changes don't apply to us as we already have similar changes. This is a better reference for future merges. sysv_shm.c CVS 1.39: Most of their changes don't apply to our code but interestingly this revision merged our changes and is a better point for reference. Move the VCS tags to the position recommended in our committers guide (section 8). No functional change.	2017-12-31 03:34:00 +00:00
Mateusz Guzik	efa9f177f5	locks: adjust loop limit check when waiting for readers The check was for the exact value, but since the counter started being incremented by the number of readers it could have jumped over.	2017-12-31 02:31:01 +00:00
Mateusz Guzik	cde25ed4cd	sx: fix up non-smp compilation after r327397	2017-12-31 01:59:56 +00:00
Mateusz Guzik	28f1a9e3ff	locks: re-check the reason to go to sleep after locking sleepq/turnstile In both rw and sx locks we always go to sleep if the lock owner is not running. We do spin for some time if the lock is read-locked. However, if we decide to go to sleep due to the lock owner being off cpu and after sleepq/turnstile gets acquired the lock is read-locked, we should fall back to the aforementioned wait.	2017-12-31 00:47:04 +00:00
Mateusz Guzik	fb10612355	sx: read the SX_NOADAPTIVE flag and Giant ownership only once These used to be read multiple times when waiting for the lock to become free, which had the potential to issue completely avoidable traffic.	2017-12-31 00:37:50 +00:00
Mateusz Guzik	15140a8ade	mtx: deduplicate indefinite wait check in spinlocks and thread lock	2017-12-31 00:34:29 +00:00
Mateusz Guzik	1f4d28c7ea	mtx: pre-read the lock value in thread_lock_flags_ Since this function is effectively slow path, if we get here the lock is most likely already taken in which case it is cheaper to not blindly attempt the atomic op. While here move hwpmc probe out of the loop to match other primitives.	2017-12-31 00:33:28 +00:00
Mateusz Guzik	80c39f6c37	rwlock: tidy up __rw_runlock_hard similarly to r325921	2017-12-31 00:31:14 +00:00
Konstantin Belousov	baaa79699a	Make kern_proc_vmmap_resident() externally accessible, and move the vmmap_skip_res_cnt control check inside it. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13595	2017-12-28 13:16:32 +00:00
Eitan Adler	caa7e52f3f	kernel: Fix several typos and minor errors - duplicate words - typos - references to old versions of FreeBSD Reviewed by: imp, benno	2017-12-27 03:23:21 +00:00
Alexander Kabaev	151ba7933a	Do a pass removing some write-only variables from the kernel. This reduces noise when the kernel is compiled by newer GCC versions, such as the one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
Alexander Kabaev	6d41588b6b	Reverse the check to allocate the buffer if cached pointer is NULL. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D13596	2017-12-23 17:55:19 +00:00
Alexander Kabaev	4daa09f343	Remove dead store to local variable.	2017-12-23 16:49:57 +00:00
Bruce Evans	da9fba5447	Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension. restart_cpus() worked well enough by accident. Before this set of fixes, resume_cpus() used the same cpuset (started_cpus, meaning CPUs directed to restart) as restart_cpus(). resume_cpus() waited for the wrong cpuset (stopped_cpus) to become empty, but since mixtures of stopped and suspended CPUs are not close to working, stopped_cpus must be empty when resuming so the wait is null -- restart_cpus just allows the other CPUs to restart and returns without waiting. Fix resume_cpus() to wait on a non-wrong cpuset for the ACPI case, and add further kludges to try to keep it working for the XEN case. It was only used for XEN. It waited on suspended_cpus. This works for XEN. However, for ACPI, resuming is a 2-step process. ACPI has already woken up the other CPUs and removed them from suspended_cpus. This fix records the move by putting them in a new cpuset resuming_cpus. Waiting on suspended_cpus would give the same null wait as waiting on stopped_cpus. Wait on resuming_cpus instead. Add a cpuset toresume_cpus to map the CPUs being told to resume to keep this separate from the cpuset started_cpus for mapping the CPUs being told to restart. Mixtures of stopped and suspended/resuming CPUs are still far from working. Describe new and some old cpusets in comments. Add further kludges to cpususpend_handler() to try to avoid breaking it for XEN. XEN doesn't use resumectx(), so it doesn't use the second return path for savectx(), and it goes from the suspended state directly to the restarted state, while ACPI resume goes through the resuming state. Enter the resuming state early for all cases so that resume_cpus can test for being in this state and not have to worry about the intermediate !suspended state for ACPI only. Reviewed by: kib	2017-12-21 09:17:48 +00:00
John Baldwin	b501cc5da6	Rework pathconf handling for FIFOs. On the one hand, FIFOs should respect other variables not supported by the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.). These values are fs-specific and must come from a fs-specific method. On the other hand, filesystems that support FIFOs are required to support _PC_PIPE_BUF on directory vnodes that can contain FIFOs. Given this latter requirement, once the fs-specific VOP_PATHCONF method supports _PC_PIPE_BUF for directories, it is also suitable for FIFOs permitting a single VOP_PATHCONF method to be used for both FIFOs and non-FIFOs. To that end, retire all of the FIFO-specific pathconf methods from filesystems and change FIFO-specific vnode operation switches to use the existing fs-specific VOP_PATHCONF method. For fifofs, set its VOP_PATHCONF to VOP_PANIC since it should no longer be used. While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that only filesystems supporting FIFOs will report a value. In addition, only report a valid _PC_PIPE_BUF for directories and FIFOs. Discussed with: bde Reviewed by: kib (part of a larger patch) MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D12572	2017-12-19 22:39:05 +00:00
John Baldwin	599afe53a8	Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf(). Having all filesystems fall through to default values isn't always correct and these values can vary for different filesystem implementations. Most of these changes just use the existing default values with a few exceptions: - Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact permissions check this claims for chown(). - Use NANDFS_NAME_LEN for NAME_MAX for nandfs. - Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to indicate hard links aren't supported. Requested by: bde (though perhaps not this exact implementation) Reviewed by: kib (earlier version) MFC after: 1 month Sponsored by: Chelsio Communications	2017-12-19 19:51:36 +00:00
John Baldwin	dd688800e1	Add a custom VOP_PATHCONF method for fdescfs. The method handles NAME_MAX and LINK_MAX explicitly. For all other pathconf variables, the method passes the request down to the underlying file descriptor. This requires splitting a kern_fpathconf() syscallsubr routine out of sys_fpathconf(). Also, to avoid lock order reversals with vnode locks, the fdescfs vnode is unlocked around the call to kern_fpathconf(), but with the usecount of the vnode bumped. MFC after: 1 month Sponsored by: Chelsio Communications	2017-12-19 18:20:38 +00:00
Konstantin Belousov	6f697994fd	Use atomic_load(9) to read ppsinfo sequence numbers. In this case volatile qualifiers ensure that a compiler does not optimize the accesses out. Reviewed by: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13534	2017-12-19 10:05:45 +00:00
Pedro F. Giffuni	62cf53fdac	SPDX: some uses of the RSA-MD license.	2017-12-13 16:30:39 +00:00
Fedor Uporov	4ba058c0cf	Fix kernel build if MAC is not defined. Reported by: Ravi Pokala, Andrew Turner Approved by: pfg (mentor) MFC after: 1 week	2017-12-13 16:14:38 +00:00
Fedor Uporov	61b214f338	Move buffer size checks outside of the vnode locks. Reviewed by: kib, cem, pfg (mentor) Approved by: pfg (mentor) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D13405	2017-12-12 20:15:57 +00:00
Bruce Evans	fb3cc1c37d	Move instantiation of msgbufp from 9 MD files to subr_prf.c. This variable should be pure MI except possibly for reading it in MD dump routines. Its initialization was pure MD in 4.4BSD, but FreeBSD changed this in r36441 in 1998. There were many imperfections in r36441. This commit fixes only a small one, to simplify fixing the others 1 arch at a time. (r47678 added support for special/early/multiple message buffer initialization which I want in a more general form, but this was too fragile to use because hacking on the msgbufp global corrupted it, and was only used for 5 hours in -current...)	2017-12-07 07:55:38 +00:00
Mark Johnston	e1703ef5ae	Plug a name cache lock leak. Reviewed by: mjg MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-12-01 22:51:02 +00:00
Konstantin Belousov	36bce27be9	Destroy seltd st_mtx and st_wait in seltdfini(). A correct destruction is important for WITNESS(4) and LOCK_PROFILING(9). Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> MFC after: 1 week	2017-12-01 11:18:19 +00:00
Pedro F. Giffuni	64de3fdd58	SPDX: use the Beerware identifier.	2017-11-30 20:33:45 +00:00
Hans Petter Selasky	1408b84a26	The sched_add() function is not only used when the thread is initially started, but also by the turnstiles to mark a thread as runnable for all locks, for instance sleepqueues do: setrunnable()->sched_wakeup()->sched_add() In r326218 code was added to allow booting from non-zero CPU numbers by setting the ts_cpu field inside the ULE scheduler's sched_add() function. This had an undesired side-effect that prior sched_pin() and sched_bind() calls got disregarded. This patch fixes the initialization of the ts_cpu field for the ULE scheduler to only happen once when the initial thread is constructed during system init. Forking will then later on ensure that a valid ts_cpu value gets copied to all children. Reviewed by: jhb, kib Discussed with: nwhitehorn MFC after: 1 month Differential revision: https://reviews.freebsd.org/D13298 Sponsored by: Mellanox Technologies	2017-11-29 23:28:40 +00:00
Alexey Dokuchaev	2c9ec07528	Fix several noticed style issues. Reviewed by: bde Approved by: bapt	2017-11-29 12:49:22 +00:00
Jeff Roberson	2e47807c21	Eliminate kmem_arena and kmem_object in preparation for further NUMA commits. The arena argument to kmem_*() is now only used in an assert. A follow-up commit will remove the argument altogether before we freeze the API for the next release. This replaces the hard limit on kmem size with a soft limit imposed by UMA. When the soft limit is exceeded we periodically wake up the UMA reclaim thread to attempt to shrink KVA. On 32bit architectures this should behave much more gracefully as we exhaust KVA. On 64bit the limits are likely never hit. Reviewed by: markj, kib (some objections) Discussed with: alc Tested by: pho Sponsored by: Netflix / Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13187	2017-11-28 23:40:54 +00:00
Brooks Davis	5cd667e65f	Disable vim syntax highlighting. Vim's default pick doesn't understand that ';' is a comment character and the result looks horrible. Reviewed by: emaste	2017-11-28 18:23:17 +00:00
Edward Tomasz Napierala	212ff84f4a	Make kdb_reenter() silent when explicitly called from db_error(). This removes the useless backtrace on various ddb(4) user errors. Reviewed by: jhb@ Obtained from: CheriBSD MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D13212	2017-11-28 12:53:55 +00:00
Nathan Whitehorn	51de47e3f8	Remove assertion that a CPU be present before returning a PCPU for it. It is up to the caller to check for a NULL return value. The assert was meant to catch buggy code that did not check the return value. Some code, however, was smart and used the return value to see if a CPU existed, which this broke. Requested by: jhb@	2017-11-28 05:39:48 +00:00
Pedro F. Giffuni	8a36da99de	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 15:20:12 +00:00
Mateusz Guzik	e57b2b1830	rw: fix runlock_hard when new readers show up When waiters/writer spinner flags are set no new readers can show up unless they already have a different rw lock read locked. The change in r326195 failed to take that into account - in presence of new readers it would spin until they all drain, which would lead to trouble if e.g. they go off cpu and can't get scheduled because of this thread. Reported by: pho	2017-11-26 21:10:47 +00:00
Nathan Whitehorn	efe67753cc	Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs are numbered densely from there to n_cpus. MFC after: 1 month	2017-11-25 23:41:05 +00:00
Mateusz Guzik	2c50bafef5	Add the missing lockstat check for thread lock.	2017-11-25 20:49:27 +00:00
Mateusz Guzik	5ba6facfcd	rwlock: fix up compilation of the previous change committed wrong version of the patch	2017-11-25 20:25:45 +00:00
Mateusz Guzik	c1e1a7ec30	rwlock: add __rw_try_{r,w}lock_int	2017-11-25 20:22:51 +00:00
Mateusz Guzik	cec1747322	sx: change sunlock to wake waiters up if it locked sleepq sleepq is only locked if the curthread is the last reader. By the time the lock gets acquired new ones could have arrived. The previous code would unlock and loop back. This results in spurious relocking of sleepq. This is a step towards xadd-based unlock routine.	2017-11-25 20:13:50 +00:00
Mateusz Guzik	93118b62f9	locks: retry turnstile/sleepq loops on failed cmpset In order to go to sleep threads set waiter flags, but that can spuriously fail e.g. when a new reader arrives. Instead of unlocking everything and looping back, re-evaluate the new state while still holding the lock necessary to go to sleep.	2017-11-25 20:10:33 +00:00
Mateusz Guzik	2e106e0427	rwlock: stop re-reading the owner when going to sleep	2017-11-25 20:08:11 +00:00
John Baldwin	ffb6607984	Decode kevent structures logged via ktrace(2) in kdump. - Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of structures. The structure name in the record payload is preceded by a size_t containing the size of the individual structures. Use this to replace the previous code that dumped the kevent arrays for kevent(). kdump is now able to decode the kevent structures rather than dumping their contents via a hexdump. One change from before is that the 'changes' and 'events' arrays are not marked with separate 'read' and 'write' annotations in kdump output. Instead, the first array is the 'changes' array, and the second array (only present if kevent doesn't fail with an error) is the 'events' array. For kevent(), empty arrays are denoted by an entry with an array containing zero entries rather than no record. - Move kevent decoding tables from truss to libsysdecode. This adds three new functions to decode members of struct kevent: sysdecode_kevent_filter, sysdecode_kevent_flags, and sysdecode_kevent_fflags. kdump uses these helper functions to pretty-print kevent fields. - Move structure definitions for freebsd11 and freebsd32 kevent structures to <sys/event.h> so that they can be shared with userland. The 32-bit structures are only exposed if _WANT_KEVENT32 is defined. The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is defined. The 32-bit freebsd11 structure requires both. - Decode freebsd11 kevent structures in truss for the compat11.kevent() system call. - Log 32-bit kevent structures via ktrace for 32-bit compat kevent() system calls. - While here, constify the 'void *data' argument to ktrstruct(). Reviewed by: kib (earlier version) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12470	2017-11-25 04:49:12 +00:00
Mark Johnston	dbe4541db2	Have lockstat:::sx-release fire only after the lock state has changed. MFC after: 1 week	2017-11-24 19:04:31 +00:00
Mark Johnston	26d94f99af	Add a missing lockstat:::sx-downgrade probe. We were returning without firing the probe when the lock had no shared waiters. MFC after: 1 week	2017-11-24 19:02:06 +00:00
Ed Schouten	814629dd64	Don't let cpu_set_syscall_retval() clobber exec_setregs(). Upon successful completion, the execve() system call invokes exec_setregs() to initialize the registers of the initial thread of the newly executed process. What is weird is that when execve() returns, it still goes through the normal system call return path, clobbering the registers with the system call's return value (td->td_retval). Though this doesn't seem to be problematic for x86 most of the time (as the value of eax/rax doesn't matter upon startup), this can be pretty frustrating for architectures where function argument and return registers overlap (e.g., ARM). On these systems, exec_setregs() also needs to initialize td_retval. Even worse are architectures where cpu_set_syscall_retval() sets registers to values not derived from td_retval. On these architectures, there is no way cpu_set_syscall_retval() can set registers to the way it wants them to be upon the start of execution. To get rid of this madness, let sys_execve() return EJUSTRETURN. This will cause cpu_set_syscall_retval() to leave registers intact. This makes process execution easier to understand. It also eliminates the difference between execution of the initial process and successive ones. The initial call to sys_execve() is not performed through a system call context. Reviewed by: kib, jhibbits Differential Revision: https://reviews.freebsd.org/D13180	2017-11-24 07:35:08 +00:00
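
Commit efa9f177f5 in the log above describes a loop limit that was tested for an exact value and could be "jumped over" once the counter started advancing by the number of readers. The following is a minimal user-space sketch of that general pitfall, not the kernel code itself: the function names, the `limit` and `max_polls` parameters, and the return convention are all hypothetical and chosen only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical illustration only; this is not the FreeBSD lock code.
 * A counter advances by `readers` on every poll. Each function returns
 * the number of polls performed before its limit check fires, or -1 if
 * the check never fires within `max_polls` polls.
 */

/* Exact-match check: skipped whenever the counter steps over `limit`. */
static int
polls_until_limit_exact(unsigned readers, unsigned limit, int max_polls)
{
	unsigned i = 0;

	assert(readers > 0);
	for (int n = 1; n <= max_polls; n++) {
		i += readers;
		if (i == limit)
			return (n);
	}
	return (-1);
}

/* Threshold check: fires as soon as the counter reaches or passes `limit`. */
static int
polls_until_limit_ge(unsigned readers, unsigned limit, int max_polls)
{
	unsigned i = 0;

	assert(readers > 0);
	for (int n = 1; n <= max_polls; n++) {
		i += readers;
		if (i >= limit)
			return (n);
	}
	return (-1);
}
```

With a step of 1 the two checks behave identically. With a step of 3 and a limit of 10 the counter takes the values 3, 6, 9, 12, so the exact-match variant never fires while the threshold variant fires on the fourth poll.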

15772 Commits