freebsd-skq

Author	SHA1	Message	Date
mjg	048a894ebc	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
mjg	dafe5ecb83	mtx: eliminate recursion support from thread lock Now that it is not used after schedlock changes got merged. Note the unlock routine temporarily still checks for it on account of just using regular spin unlock. This is a prelude towards a general clean up.	2019-12-16 00:04:33 +00:00
jeff	506c867c6e	schedlock 4/4 Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778	2019-12-15 21:26:50 +00:00
jeff	d71b815393	schedlock 3/4 Eliminate lock recursion from turnstiles. This was simply used to avoid tracking the top-level turnstile lock. Explicitly check for it before picking up and dropping locks. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22746	2019-12-15 21:19:41 +00:00
jeff	7431380c94	schedlock 2/4 Do all sleepqueue post-processing in sleepq_remove_thread() so that we do not require the thread lock after a context switch. Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D22745	2019-12-15 21:18:07 +00:00
ian	80d5f0620c	Rewrite arm kernel stack unwind code to work when unwinding through modules. The arm kernel stack unwinder has apparently never been able to unwind when the path of execution leads through a kernel module. There was code that tried to handle modules by looking for the unwind data in them, but it did so by trying to find symbols which have never existed in arm kernel modules. That caused the unwind code to panic, and because part of panic handling calls into the unwind code, that just created a recursion loop. Locating the unwind data in a loaded module requires accessing the Elf section headers to find the SHT_ARM_EXIDX section. For preloaded modules those headers are present in a metadata blob. For dynamically loaded modules, the headers are present only while the loading is in progress; the memory is freed once the module is ready to use. For that reason, there is new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the unwind info while the headers are loaded. The values are saved into new fields in the linker_file structure which are also conditional on __arm__. In arm/unwind.c there is new code to locally cache the per-module info needed to find the unwind tables. The local cache is crafted for lockless read access, because the unwind code often needs to run in context where sleeping is not allowed. A large comment block describes the local cache list, so I won't repeat it all here.	2019-12-15 21:16:35 +00:00
jeff	bf925a1e49	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626	2019-12-15 21:11:15 +00:00
jeff	671ed4e43f	Fix a mistake in r355765. We need to activate the page if it is not yet on a pagequeue. Reported by: pho	2019-12-15 06:26:47 +00:00
jeff	011da14d39	Add a deferred free mechanism for freeing swap space that does not require an exclusive object lock. Previously swap space was freed on a best effort basis when a page that had valid swap was dirtied, thus invalidating the swap copy. This may be done inconsistently and requires the object lock which is not always convenient. Instead, track when swap space is present. The first dirty is responsible for deleting space or setting PGA_SWAP_FREE which will trigger background scans to free the swap space. Simplify the locking in vm_fault_dirty() now that we can reliably identify the first dirty. Discussed with: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22654	2019-12-15 03:15:06 +00:00
jeff	ed81eeddcf	Handle pagein clustering in vm_page_grab_valid() so that it can be used by exec_map_first_page(). This will also enable pagein clustering for other interested consumers (tmpfs, md, etc). Discussed with: alc Approved by: kib Differential Revision: https://reviews.freebsd.org/D22731	2019-12-15 02:00:32 +00:00
dougm	e85455eaae	Simplify the processing of a leaf mask to find big-enough ranges of set bits, by storing and modifying the complement of the original leaf mask, and by avoiding some unnecessary intermediate variables in computing the shift amounts. The logic is similar to what has recently been committed to sys/sys/bitstring.h. Compute better hint updates for the case when the cursor starts in mid-leaf, and eliminates some otherwise viable solutions. Assume the worst case, that all the eliminated offsets could have been solutions, and you can still compute a better hint than we use now. Eliminate some unnecessary conditional control flow. Approved by: alc Tested by: pho Differential Revision: https://reviews.freebsd.org/D22666	2019-12-14 19:44:42 +00:00
mjg	778235e28f	Remove the useless return value from proc_set_cred	2019-12-14 00:43:17 +00:00
jhb	9f5deb0c9b	Remove the deprecated timeout(9) interface. All in-tree consumers have been converted to callout(9). Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22602	2019-12-13 21:03:12 +00:00
imp	3e4227506d	Create new wrapper function: bus_delayed_attach_children() Delay the attachment of children, when requested, until after interrupts are running. This is often needed to allow children to run transactions on i2c or spi busses. It's a common enough idiom that it will be useful to have its own wrapper. Reviewed by: ian Differential Revision: https://reviews.freebsd.org/D21465	2019-12-13 19:39:33 +00:00
jhb	a366bebd40	Use callout(9) instead of deprecated timeout(9) for fail points. Allocate the callout structure on-demand from fail_point_use_timeout_path() since most fail points do not use timeouts. Reviewed by: markj (earlier version), cem Differential Revision: https://reviews.freebsd.org/D22599	2019-12-13 19:26:04 +00:00
trasz	afefb77b29	Add kern_kill() and use it in Linuxulator. It's just a cleanup, no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22645	2019-12-13 18:44:02 +00:00
trasz	787237cc3f	Add kern_getsid() and use it in Linuxulator; no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22647	2019-12-13 18:39:36 +00:00
rlibby	dbf795e374	bitset: rename confusing macro NAND to ANDNOT s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too. The actual implementation is "and not" (or "but not"), i.e. A but not B. Fortunately this does appear to be what all existing callers want. Don't supply a NAND (not (A and B)) operation at this time. Discussed with: jeff Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22791	2019-12-13 09:32:16 +00:00
cem	d335535e7a	kern/subr_unit: Rip srandomdev, random(3) out of dead code The simulation cannot be reproduced, so the value of using a deterministic PRNG like random(3) is dubious. The number of repetitions used in the sample isn't a problem for the Chacha implementation of arc4random we have today. (Also, no one actually runs this code; it was provided as an example of the work the author did validating the implementation. It's not even test code.)	2019-12-13 04:48:20 +00:00
rmacklem	902c2ec05a	r355677 requires that vop_stdioctl() be global so it can be called from NFS. r355677 modified the NFS client so that it does lseek(SEEK_DATA/SEEK_HOLE) for NFSv4.2, but calls vop_stdioctl() otherwise. As such, vop_stdioctl() needs to be a global function. Missed during the code merge for r355677.	2019-12-13 00:14:12 +00:00
trasz	0a02818a79	Add kern_sync(9), and make kernel code call it instead of going via sys_sync(2). Minor cleanup, no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19366	2019-12-12 18:45:31 +00:00
markj	3d88440770	Rename tdq_ipipending and clear it in sched_switch(). This fixes a regression after r355311. Specifically, sched_preempt() may trigger a context switch by calling thread_lock(), since thread_lock() calls critical_exit() in its slow path and the interrupted thread may have already been marked for preemption. This would happen before tdq_ipipending is cleared, blocking further preemption IPIs. The CPU can be left in this state indefinitely if the interrupted thread migrates. Rename tdq_ipipending to tdq_owepreempt. Any switch satisfies a remote preemption request, so clear tdq_owepreempt in sched_switch() instead of sched_preempt() to avoid subtle problems of the sort described above. Reviewed by: jeff, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22758	2019-12-12 02:43:24 +00:00
mjg	656673caeb	vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockmgr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665	2019-12-11 23:11:21 +00:00
mjg	9c60b86beb	fd: static-ize and devolatile openfiles Almost all access is using atomics. The only read is sysctl which should use a whole-int-at-a-time friendly read internally.	2019-12-11 23:09:12 +00:00
avg	e6a7e77046	add a sanity check to the system call registration code A system call number should be at least reserved. We do not expect an attempt to register a fixed number system call when nothing at all is known about it. MFC after: 3 weeks Sponsored by: Panzura	2019-12-11 15:52:29 +00:00
jhb	4c2c5f1a8a	Add a callout_func_t typedef for functions used with callout_*(). This typedef is the same as timeout_t except that it is in the callout namespace and header. Use this typedef in various places of the callout implementation that were either using the raw type or timeout_t. While here, add <sys/callout.h> to the manpage. Reviewed by: kib, imp MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D22751	2019-12-10 21:58:30 +00:00
mjg	c023a15140	vfs: refactor vhold and vdrop No functional changes.	2019-12-10 00:08:05 +00:00
jhb	4459aedbdc	Copy out aux args after the argument and environment vectors. Partially revert r354741 and r354754 and go back to allocating a fixed-size chunk of stack space for the auxiliary vector. Keep sv_copyout_auxargs but change it to accept the address at the end of the environment vector as an input stack address and no longer allocate room on the stack. It is now called at the end of copyout_strings after the argv and environment vectors have been copied out. This should fix a regression in r354754 that broke the stack alignment for newer Linux amd64 binaries (and probably broke Linux arm64 as well). Reviewed by: kib Tested on: amd64 (native, linux64 (only linux-base-c7), and i386) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22695	2019-12-09 19:17:28 +00:00
mjg	bcfa67ab8b	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
mjg	4b9989aca8	vfs: clean up vputx a little 1. replace hand-rolled macros for operation type with enum 2. unlock the vnode in vput itself, there is no need to branch on it. existence of VPUTX_VPUT remains significant in that the inactive variant adds LK_NOWAIT to locking request. 3. remove the useless v_usecount assertion. few lines above the checks if v_usecount > 0 and leaves. should the value be negative, refcount would fail. 4. the CTR return vnode %p to the freelist is incorrect as vdrop may find the vnode with holdcnt > 1. if the like should exist, it should be moved there 5. no need to error = 0 for everyone Reviewed by: kib, jeff (previous version) Differential Revision: https://reviews.freebsd.org/D22718	2019-12-08 21:13:07 +00:00
mjg	872f296f3c	vfs: factor out vnode destruction out of vdrop Sponsored by: The FreeBSD Foundation	2019-12-08 21:11:25 +00:00
jeff	389afb1898	Handle multiple clock interrupts simultaneously in sched_clock(). Reviewed by: kib, markj, mav Differential Revision: https://reviews.freebsd.org/D22625	2019-12-08 01:17:38 +00:00
kib	5f45f7a6f5	Only return EPERM from kill(-pid) when no process was signalled. As mandated by POSIX. Also clarify the kill(2) manpage. While there, restructure the code in killpg1() to use a helper which keeps the overall state of the process list iteration in the killpg1_ctx structure, later used to infer the error returned. Reported by: amdmi3 Reviewed by: jilles Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D22621	2019-12-07 18:07:49 +00:00
mjg	0a3ea4b564	vfs: clean up delmntque similarly to vdrop r355414	2019-12-07 12:56:24 +00:00
mjg	818ef82e15	vfs: catch vn_printf up with reality - add the missing VV_VMSIZEVNLOCK and VV_READLINK flags - add decoding v_mflag While here sort flags.	2019-12-07 12:55:58 +00:00
brooks	dfa2e15cbe	sysent: Reduce duplication and improve readability. Use the power of variables to avoid spelling out source and generated files too many times. The previous Makefiles were hard to read, hard to edit, and badly formatted. Reviewed by: kevans, emaste Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D22714	2019-12-06 23:59:23 +00:00
mav	30af7c4d0c	Make devstat_end_transaction_bio() count BIO_ORDERED. MFC after: 2 weeks	2019-12-06 18:39:05 +00:00
bz	b42975a154	Improve EPOCH_TRACE Two changes to EPOCH_TRACE: (1) add a sysctl to suppress the backtrace from epoch_trace_report(). Sometimes the log line for the recursion is enough and the backtrace massively spams the console. (2) In order to be able to go without the backtrace do not only print where the previous occurrence happened, but also where the current one happens. That way we have file:line information for both and can look at them without the need for getting line numbers from backtrace and a debugging tool. Reviewed by: glebius Sponsored by: Netflix (originally) Differential Revision: https://reviews.freebsd.org/D22641	2019-12-06 16:34:04 +00:00
mjg	b72734b537	sx: check for SX_LOCK_SHARED | SX_LOCK_WRITE_SPINNER when exclusive-locking First, this removes a spurious difference compared to rw locks. More importantly though this avoids a trip through sleepq code if the lock happens to be caught in this state.	2019-12-05 13:43:44 +00:00
mjg	3e04f4b855	vfs: remove 'active' variable from _vdrop No functional changes.	2019-12-05 13:40:10 +00:00
mav	0e1fa50f0d	Mark some more hot global variables with __read_mostly. MFC after: 1 week	2019-12-04 21:26:03 +00:00
rlibby	b95a02bb86	mbuf zones: take out the trash The mbuf zones were explicitly specifying the uma trash procedures on zcreate, conditionally on INVARIANTS, because that used to be necessary in order to get use-after-free checking for uma zones with non-empty constructors or destructors. After r355137 uma automatically invokes the trash constructor and destructor as long as no init and fini are specified. This now allows the mbuf zones to pass their constructors and destructors without needing to add on the uma trash procedures conditionally. Reviewed by: cem, jhb, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22583	2019-12-04 18:21:29 +00:00
jhb	0d8d23a6a3	Use uintptr_t instead of register_t * for the stack base. - Use ustringp for the location of the argv and environment strings and allow destp to travel further down the stack for the stackgap and auxv regions. - Update the Linux copyout_strings variants to move destp down the stack as was done for the native ABIs in r263349. - Stop allocating a space for a stack gap in the Linux ABIs. This used to hold translated system call arguments, but hasn't been used since r159992. Reviewed by: kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22501	2019-12-03 23:17:54 +00:00
mckusick	d137f58263	Currently the breadn_flags() and getblkx() interfaces are passed the vnode, logical block number, and size of data block that is being requested. They then use the VOP_BMAP function to calculate the mapping from logical block number to physical block number from which to access the data. This change expands the interface to also pass the physical block number in cases where the VOP_BMAP function may no longer work, for example when a file is being truncated. No functional change. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2019-12-03 23:07:09 +00:00
jeff	5f3e7444d9	Use a precise bit count for the slab free items in UMA. This significantly shrinks embedded slab structures. Reviewed by: markj, rlibby (prior version) Differential Revision: https://reviews.freebsd.org/D22584	2019-12-02 22:44:34 +00:00
jeff	18bccfabd0	Fix the last few cases that grab without busy or valid. The grab functions must return the page in some held state for consistency elsewhere. Reviewed by: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22610	2019-12-02 22:38:25 +00:00
jeff	e7288d9732	Initialize the idle thread's lock sooner so it's not evaluated on every fork exit and we can rely on it elsewhere. Reviewed by: mav, kib, jhb, markj Differential Revision: https://reviews.freebsd.org/D22624	2019-12-02 22:35:45 +00:00
mjg	080ffac31b	lockmgr: remove more remnants of adaptive spinning Sponsored by: The FreeBSD Foundation	2019-12-01 00:35:08 +00:00
kevans	4e48e813a9	tty: implement TIOCNOTTY Generally, it's preferred that an application fork/setsid if it doesn't want to keep its controlling TTY, but it could be that a debugger is trying to steal it instead -- so it would hook in, drop the controlling TTY, then do some magic to set things up again. In this case, TIOCNOTTY is quite handy and still respected by at least OpenBSD, NetBSD, and Linux as far as I can tell. I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as long as it doesn't impose a major burden. Reviewed by: bcr (manpages), kib Differential Revision: https://reviews.freebsd.org/D22572	2019-11-30 20:10:50 +00:00
mjg	c3fab6f99b	smp: cast the read in quiesce_all_critical through void * Fixes compilation on some 32-bit arm platforms. Sponsored by: The FreeBSD Foundation	2019-11-30 19:33:02 +00:00
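
The bitset rename in dbf795e374 rests on the difference between "and not" (A but not B) and a true NAND (not (A and B)). A minimal sketch of that difference follows, assuming a single 64-bit word in place of a real bitset; the names mask_andnot, mask_nand and mask_selfcheck are hypothetical and are not the BIT_ANDNOT, CPU or DOMAINSET macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* "and not": keep the bits of a that are not set in b (A but not B). */
static inline uint64_t
mask_andnot(uint64_t a, uint64_t b)
{
	return (a & ~b);
}

/* True NAND: not (A and B). The commit deliberately does not supply this. */
static inline uint64_t
mask_nand(uint64_t a, uint64_t b)
{
	return (~(a & b));
}

/* Hypothetical self-check showing that the two operations differ. */
static void
mask_selfcheck(void)
{
	/* 0xF0 with the bits of 0x30 cleared leaves 0xC0. */
	assert(mask_andnot(0xF0, 0x30) == 0xC0);
	/* NAND inverts the intersection (0x30) across the whole word. */
	assert(mask_nand(0xF0, 0x30) == ~(uint64_t)0x30);
	assert(mask_andnot(0xF0, 0x30) != mask_nand(0xF0, 0x30));
}
```

Calling mask_selfcheck() from a small test program exercises both helpers; the old macro name suggested the second operation while implementing the first.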

17034 Commits