freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	a9fd669b4a	subr_turnstile: Extract some common code to a helper. Code walks the list of contested turnstiles to calculate the priority to unlend. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-05-16 13:17:57 +00:00
Konstantin Belousov	0ddfdc60f8	imgact_elf.c: Add comment explaining the malloc/VOP_UNLOCK() dance from r347148. Requested by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-05-16 13:03:54 +00:00
Andrey V. Elsukov	2317067c31	Remove bpf interface lock, it no longer exists.	2019-05-14 10:21:28 +00:00
Conrad Meyer	e199792d23	Revert r346292 (permit_nonrandom_stackcookies) We have a better, more comprehensive knob for this now: kern.random.initial_seeding.bypass_before_seeding=1. Requested by: delphij Sponsored by: Dell EMC Isilon	2019-05-13 23:37:44 +00:00
Mateusz Guzik	8ba6c1391b	cache: fix a brainfart in r347505 If bumping the counter goes over the limit we have to decrement it back. Previous code would only bump the counter after adding the entry (thus allowing the cache to go over the limit). Sponsored by: The FreeBSD Foundation	2019-05-12 07:56:01 +00:00
Mateusz Guzik	5bf50787e6	cache: bump numcache on entry, while here fix lnumcache type Sponsored by: The FreeBSD Foundation	2019-05-12 06:59:22 +00:00
Mateusz Guzik	63ad3b65b0	cache: push sdt probes in cache_zap_locked to code doing the work Avoids branching to check which probe to evaluate. Very same check was being done later to do the actual work. Sponsored by: The FreeBSD Foundation	2019-05-12 06:39:30 +00:00
Doug Moore	87ae0686a2	A new parameter to blist_alloc specifies an upper bound on the size of the allocation request, so that the blocks allocated are from the next set of free blocks big enough to satisfy the minimum requirements of the request, and the number of blocks allocated is as many as possible, up to the specified maximum. The implementation of swp_pager_getswapspace uses this parameter to ask for a number of blocks between the new halved request size and the previous failed request size. Thus a request for 32 blocks may fail, but instead of getting only 16 blocks, the caller asks for 16 to 31 next, and might get 19 or 27, which is closer to what they originally wanted. I expect this to lead to bigger block allocations and less block fragmentation, at least in some cases. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20001	2019-05-11 16:15:13 +00:00
Doug Moore	535192530c	When bitpos can't be implemented with an inline ffs* instruction, change the binary search so that it does not depend on a single bit only being set in the bitmask. Use bitpos more generally, and avoid some clearing of bits to accommodate its current behavior. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20237	2019-05-11 09:09:10 +00:00
Doug Moore	0cb36fc9c2	Revert r347469. Approved by: kib (mentor)	2019-05-11 02:13:52 +00:00
Doug Moore	12cd7ded68	Don't use _Generic, as many systems don't know about it. Go back to a lo-tech switch statement. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20235	2019-05-10 23:12:37 +00:00
Doug Moore	4ab18ea23a	When bitpos can't be implemented with an inline ffs* instruction, change the binary search so that it does not depend on a single bit only being set in the bitmask. Use bitpos more generally, and avoid some clearing of bits to accommodate its current behavior. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20232	2019-05-10 22:49:01 +00:00
Doug Moore	d4808c4403	Add a (q)uit option to the subr_blist test program. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20234	2019-05-10 22:02:29 +00:00
Doug Moore	09b380a1ff	Replace the expression "-mask & ~mask" with a function call that does the same thing, but is commented so that it might be better understood. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20231	2019-05-10 19:55:29 +00:00
Doug Moore	d1c34a6b76	blist_next_leaf_alloc walks over all the meta-nodes between one leaf and the next one, and if blocks are allocated from the next leaf, it walks back toward where it started, as long as there are interleaving meta-nodes to be updated on account of the last free blocks under those meta-nodes being allocated. Only if the walk goes all the way back to the starting point must we calculate the position of the meta-node that is the least-common parent of one leaf and the next, and update a bit in that meta-node to indicate the allocation of its last free block. There's no need to start calculating the position of that least-common parent until the walk back reaches the original starting point, and there's no need for a calculation that updates 'radius' to tell us when we've walked back to the beginning, since comparing scan to next suffices for that. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20229	2019-05-10 18:25:06 +00:00
Doug Moore	b1f59c92d8	Replace panic() with KASSERT() and provide more useful information when failure happens. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20226	2019-05-10 18:22:40 +00:00
Andrew Gallatin	4e255d7479	Bind TCP HPTS (pacer) threads to NUMA domains Bind the TCP pacer threads to NUMA domains and build per-domain pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA domain information to match an inpcb with a pacer thread on the same domain. The motivation for this is to keep the TCP connection local to a NUMA domain as much as possible. Thanks to jhb for pre-reviewing an earlier version of the patch. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20134	2019-05-10 13:41:19 +00:00
Mateusz Guzik	ac97da9ad8	Reduce umtx-related work on exec and exit - there is no need to take the process lock to iterate the thread list after single-threading is enforced - typically there are no mutexes to clean up (testable without taking the global umtx lock) - typically there is no need to adjust the priority (testable without taking thread lock) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20160	2019-05-08 16:30:38 +00:00
Ed Maste	0e26cd440f	make sysent after r347228 Regenerate to add @generated tag in generated files.	2019-05-07 18:10:21 +00:00
Conrad Meyer	7d7db5298d	device_printf: Use sbuf for more coherent prints on SMP device_printf does multiple calls to printf, allowing other console messages to be inserted between the device name and the rest of the message. This change uses sbuf to compose the two into a single buffer, and prints it all at once. It exposes an sbuf drain function (drain-to-printf) for common use. Update documentation to match; some unit tests included. Submitted by: jmg Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D16690	2019-05-07 17:47:20 +00:00
Ed Maste	5350e15d0d	makesyscalls: use @generated tag in generated files Multiple tools use @generated to identify generated files (for example, in a review Phabricator will by default hide diffs in generated files). Use the @generated tag in makesyscalls.sh as we've done for other generated files. Reviewed by: cem MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20183	2019-05-07 16:17:33 +00:00
Mark Johnston	7d43b5c98e	Simplify the test against maxproc in fork1(). Previously nprocs_new would be tested against maxprocs twice when nprocs_new < maxprocs - 10. Eliminate the unnecessary comparison. Submitted by: Wuyang Chung <wuyang.chung1@gmail.com> GitHub PR: https://github.com/freebsd/freebsd/pull/397 MFC after: 1 week	2019-05-07 15:03:26 +00:00
Doug Moore	27d172bb12	The intention of the blist cursor is for the search for free blocks to resume where the last search left off. Suppose that there are no free blocks of size 32, but plenty of size 16. If we repeatedly request size 32 blocks, fail, and retry with size 16 blocks, then the failures all reset the cursor to the beginning of memory, making the 16 block allocation use a first fit, rather than next fit, strategy. This change has blist_alloc make a copy of the cursor for its own decision making, and only updates the real blist cursor after a successful allocation, making those 16 block searches behave like next-fit searches. Approved by: markj (mentor) Differential Revision: https://reviews.freebsd.org/D20177	2019-05-06 22:12:15 +00:00
Conrad Meyer	6b6e2954dd	List-ify kernel dump device configuration Allow users to specify multiple dump configurations in a prioritized list. This enables fallback to secondary device(s) if primary dump fails. E.g., one might configure a preference for netdump, but fallback to disk dump as a second choice if netdump is unavailable. This change does not list-ify netdump configuration, which is tracked separately from ordinary disk dumps internally; only one netdump configuration can be made at a time, for now. It also does not implement IPv6 netdump. savecore(8) is already capable of scanning and iterating multiple devices from /etc/fstab or passed on the command line. This change doesn't update the rc or loader variables 'dumpdev' in any way; it can still be set to configure a single dump device, and rc.d/savecore still uses it as a single device. Only dumpon(8) is updated to be able to configure the more complicated configurations for now. As part of revving the ABI, unify netdump and disk dump configuration ioctl / structure, and leave room for ipv6 netdump as a future possibility. Backwards-compatibility ioctls are added to smooth ABI transition, especially for developers who may not keep kernel and userspace perfectly synced. Reviewed by: markj, scottl (earlier version) Relnotes: maybe Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19996	2019-05-06 18:24:07 +00:00
Konstantin Belousov	78022527bb	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal. The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode's vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuits the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:20:43 +00:00
Konstantin Belousov	2d6b8546b7	imgact_elf: do not relock the text vnode if possible. We unlock the vnode around malloc(M_WAITOK), to make it possible for pagedaemon to flush vnode pages for us. Instead of doing it unconditionally, first try M_NOWAIT allocation, which typically succeeds. Only on failure, unlock the vnode and retry with M_WAITOK. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:04:01 +00:00
Conrad Meyer	665919aaaf	x86: Implement MWAIT support for stopping a CPU IPI_STOP is used after panic or when ddb is entered manually. MONITOR/ MWAIT allows CPUs that support the feature to sleep in a low power way instead of spinning. Something similar is already used at idle. It is perhaps especially useful in oversubscribed VM environments, and is safe to use even if the panic/ddb thread is not the BSP. (Except in the presence of MWAIT errata, which are detected automatically on platforms with known wakeup problems.) It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0 (off). This commit also introduces the tunable "machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has known errata, but may be set to "1" in loader.conf to signal that mwait wakeup is broken on CPUs FreeBSD does not yet know about. Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't help bhyve hypervisors running FreeBSD guests. Submitted by: Anton Rang <rang AT acm.org> (earlier version) Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20135	2019-05-04 20:34:26 +00:00
Mateusz Guzik	5408c6db4e	sysv: get rid of fork/exit hooks if the code is compiled in Sponsored by: The FreeBSD Foundation	2019-05-04 19:05:30 +00:00
Mateusz Guzik	37d2b1f3e5	Annotate nprocs with __exclusive_cache_line Sponsored by: The FreeBSD Foundation	2019-05-04 19:04:17 +00:00
Mark Johnston	bc79b41c40	Disallow excessively small times of day in clock_settime(2). Reported by: syzkaller Reviewed by: cem, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20151	2019-05-03 21:26:44 +00:00
Mark Johnston	8e7130a8a7	Stop checking TD_IDLETHREAD() in buffer cache routines. These predicates are vestigial and cannot be true today. For example, idle threads are not allowed to acquire locks. Also cache curthread in breada(). No functional change intended. Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20066	2019-04-29 13:23:32 +00:00
Alan Somers	f841e638fb	[skip ci] fix typo in comment from r59840 MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-04-26 15:00:59 +00:00
Ed Maste	5803d72f7e	make sysent after r346273 (readlinkat arg correction) PR: 197915 Reminded by: dchagin	2019-04-26 12:55:52 +00:00
John Baldwin	83bf5ec367	Remove p_code from struct proc. Contrary to the comments, it was never used by core dumps or debuggers. Instead, it used to hold the signal code of a pending signal, but that was replaced by the 'ksi_code' member of ksiginfo_t when signal information was reworked in 7.0. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D20047	2019-04-25 18:42:07 +00:00
Andrew Gallatin	50575ce11c	Track TCP connection's NUMA domain in the inpcb Drivers can now pass up numa domain information via the mbuf numa domain field. This information is then used by TCP syncache_socket() to associate that information with the inpcb. The domain information is then fed back into transmitted mbufs in ip{6}_output(). This mechanism is nearly identical to what is done to track RSS hash values in the inp_flowid. Follow on changes will use this information for lacp egress port selection, binding TCP pacers to the appropriate NUMA domain, etc. Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20028	2019-04-25 15:37:28 +00:00
Warner Losh	2834e42cda	When parsing command line stuff, treat tabs and spaces the same. When creating complex config files, people like to use tabs to offset sections. Treat them the same as spaces for delimiters.	2019-04-18 22:52:12 +00:00
Conrad Meyer	ba57dad4b0	stack_protector: Add tunable to bypass random cookies This is a stopgap measure to unbreak installer/VM/embedded boot issues introduced in (or at least exposed by) r346250. Add the new tunable, "security.stack_protect.permit_nonrandom_cookies," in order to continue boot with insecure non-random stack cookies if the random device is unavailable. For now, enable it by default. This is NOT safe. It will be disabled by default in a future revision. There is follow-on work planned to use fast random sources (e.g., RDRAND on x86 and DARN on Power) to seed when the early entropy file cannot be provided, for whatever reason. Please see D19928. Some better hacks may be used to make the non-random __stack_chk_guard slightly less predictable (from delphij@ and mjg@); those suggestions are left for a future revision. I think it may also be plausible to move stack guard initialization far later in the boot process; potentially it could be moved all the way to just before userspace is started. Reported by: many Reviewed by: delphij, emaste, imp (all w/ caveat: this is a stopgap fix) Security: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19927	2019-04-16 18:47:20 +00:00
Ed Maste	e8ee7d9035	correct readlinkat(2) return type r176215 corrected readlink(2)'s return type and the type of the last argument. readlinkat(2) was introduced in r177788 after being developed as part of Google Summer of Code 2007; it appears to have inherited the wrong return type. Man pages and header files were already ssize_t; update syscalls.master to match. PR: 197915 Submitted by: Henning Petersen <henning.petersen@t-online.de> MFC after: 2 weeks	2019-04-16 13:26:31 +00:00
Conrad Meyer	13774e8228	random(4): Block read_random(9) on initial seeding read_random() is/was used, mostly without error checking, in a lot of very sensitive places in the kernel -- including seeding the widely used arc4random(9). Most uses, especially arc4random(9), should block until the device is seeded rather than proceeding with a bogus or empty seed. I did not spy any obvious kernel consumers where blocking would be inappropriate (in the sense that lack of entropy would be ok -- I did not investigate locking angle thoroughly). In many instances, arc4random_buf(9) or that family of APIs would be more appropriate anyway; that work was done in r345865. A minor cleanup was made to the implementation of the READ_RANDOM function: instead of using a variable-length array on the stack to temporarily store all full random blocks sufficient to satisfy the requested 'len', only store a single block on the stack. This has some benefit in terms of reducing stack usage, reducing memcpy overhead and reducing devrandom output leakage via the stack. Additionally, the stack block is now safely zeroed if it was used. One caveat of this change is that the kern.arandom sysctl no longer returns zero bytes immediately if the random device is not seeded. This means that FreeBSD-specific userspace applications which attempted to handle an unseeded random device may be broken by this change. If such behavior is needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK option. On any typical FreeBSD system, entropy is persisted on read/write media and used to seed the random device very early in boot, and blocking is never a problem. This change primarily impacts the behavior of /dev/random on embedded systems with read-only media that do not configure "nodevice random". We toggle the default from 'charge on blindly with no entropy' to 'block indefinitely.' This default is safer, but may cause frustration. Embedded system designers using FreeBSD have several options. 
The most obvious is to plan to have a small writable NVRAM or NAND to persist entropy, like larger systems. Early entropy can be fed from any loader, or by writing directly to /dev/random during boot. Some embedded SoCs now provide a fast hardware entropy source; this would also work for quickly seeding Fortuna. A 3rd option would be creating an embedded-specific, more simplistic random module, like that designed by DJB in [1] (this design still requires a small rewritable media for forward secrecy). Finally, the least preferred option might be "nodevice random", although I plan to remove this in a subsequent revision. To help developers emulate the behavior of these embedded systems on ordinary workstations, the tunable kern.random.block_seeded_status was added. When set to 1, it blocks the random device. I attempted to document this change in random.4 and random.9 and ran into a bunch of out-of-date or irrelevant or inaccurate content and ended up rototilling those documents more than I intended to. Sorry. I think they're in a better state now. PR: 230875 Reviewed by: delphij, markm (earlier version) Approved by: secteam(delphij), devrandom(markm) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D19744	2019-04-15 18:40:36 +00:00
Rick Macklem	eeb1f3ed51	Fix the NFSv4 client to safely find processes. r340744 broke the NFSv4 client, because it replaced pfind_locked() with a call to pfind(), since pfind() acquires the sx lock for the pid hash and the NFSv4 already holds a mutex when it does the call. The patch fixes the problem by recreating a pfind_any_locked() and adding the functions pidhash_slockall() and pidhash_sunlockall to acquire/release all of the pid hash locks. These functions are then used by the NFSv4 client instead of acquiring the allproc_lock and calling pfind(). Reviewed by: kib, mjg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19887	2019-04-15 01:27:15 +00:00
Edward Tomasz Napierala	91ff2d4883	Remove unneeded conditionals for sv_ functions - all the ABIs (apart from null_sysvec) define them, so the 'else' branch is never taken. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19889	2019-04-12 14:18:16 +00:00
Edward Tomasz Napierala	4033ecc915	Use shared vnode locks for the ELF interpreter. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19874	2019-04-11 11:21:45 +00:00
Alan Somers	691d4ab6f0	fix cache_lookup's documentation cache_lookup's documentation got dislocated by r324378. Relocate and expand it. Reviewed by: jhb, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-04-10 13:02:33 +00:00
Edward Tomasz Napierala	b65ca345ef	Improve vnode lock assertions. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-04-10 10:21:14 +00:00
Konstantin Belousov	ae90941431	Add vn_fsync_buf(). Provide a convenience function to avoid the hack with filling fake struct vop_fsync_args and then calling vop_stdfsync(). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-04-09 20:20:04 +00:00
Edward Tomasz Napierala	9bcd7482b2	Factor out section loading into a separate function. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19846	2019-04-09 15:24:38 +00:00
Edward Tomasz Napierala	9274fb3599	Refactor ELF interpreter loading into a separate function. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19741	2019-04-08 14:31:07 +00:00
Mariusz Zaborski	de0b14f2db	In the unlinkat syscall, the operation is performed on the directory descriptor, not the file descriptor. The file descriptor is used only for verification so do not expect any additional capabilities on it. Reported by: antoine Tested by: antoine Discussed with: kib, emaste, bapt Sponsored by: Fudo Security	2019-04-08 14:23:52 +00:00
Mark Johnston	128c9bc05b	Set the p_oppid field of orphans when exiting. Such processes will be reparented to the reaper when the current parent is done with them (i.e., ptrace detached), so p_oppid must be updated accordingly. Add a regression test to exercise this code path. Previously it would not be possible to reap an orphan with a stale oppid. Reviewed by: kib, mjg Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19825	2019-04-07 14:26:14 +00:00
Conrad Meyer	d1139b5286	kern/subr_pctrie: Fix mismatched signedness in assertion comparison 'tos' is an index into an array and never holds a negative value. Correct its signedness to match PCTRIE_LIMIT, which it is compared to in assertions. No functional change (kills a warning).	2019-04-06 21:56:24 +00:00


16595 Commits