freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	61322a0a8a	Mark some more hot global variables with __read_mostly. MFC after: 1 week	2019-12-04 21:26:03 +00:00
Ryan Libby	30be9685a3	mbuf zones: take out the trash The mbuf zones were explicitly specifying the uma trash procedures on zcreate, conditionally on INVARIANTS, because that used to be necessary in order to get use-after-free checking for uma zones with non-empty constructors or destructors. After r355137 uma automatically invokes the trash constructor and destructor as long as no init and fini are specified. This now allows the mbuf zones to pass their constructors and destructors without needing to add on the uma trash procedures conditionally. Reviewed by: cem, jhb, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22583	2019-12-04 18:21:29 +00:00
John Baldwin	31174518d2	Use uintptr_t instead of register_t * for the stack base. - Use ustringp for the location of the argv and environment strings and allow destp to travel further down the stack for the stackgap and auxv regions. - Update the Linux copyout_strings variants to move destp down the stack as was done for the native ABIs in r263349. - Stop allocating a space for a stack gap in the Linux ABIs. This used to hold translated system call arguments, but hasn't been used since r159992. Reviewed by: kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22501	2019-12-03 23:17:54 +00:00
Kirk McKusick	d00066a5f9	Currently the breadn_flags() and getblkx() interfaces are passed the vnode, logical block number, and size of data block that is being requested. They then use the VOP_BMAP function to calculate the mapping from logical block number to physical block number from which to access the data. This change expands the interface to also pass the physical block number in cases where the VOP_BMAP function may no longer work, for example when a file is being truncated. No functional change. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix	2019-12-03 23:07:09 +00:00
Jeff Roberson	9b78b1f433	Use a precise bit count for the slab free items in UMA. This significantly shrinks embedded slab structures. Reviewed by: markj, rlibby (prior version) Differential Revision: https://reviews.freebsd.org/D22584	2019-12-02 22:44:34 +00:00
Jeff Roberson	4504268a1b	Fix the last few cases that grab without busy or valid. The grab functions must return the page in some held state for consistency elsewhere. Reviewed by: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22610	2019-12-02 22:38:25 +00:00
Jeff Roberson	e15046952d	Initialize the idle thread's lock sooner so it's not evaluated on every fork exit and we can rely on it elsewhere. Reviewed by: mav, kib, jhb, markj Differential Revision: https://reviews.freebsd.org/D22624	2019-12-02 22:35:45 +00:00
Mateusz Guzik	5fe188b1e8	lockmgr: remove more remnants of adaptive spinning Sponsored by: The FreeBSD Foundation	2019-12-01 00:35:08 +00:00
Kyle Evans	1b50b999f9	tty: implement TIOCNOTTY Generally, it's preferred that an application fork/setsid if it doesn't want to keep its controlling TTY, but it could be that a debugger is trying to steal it instead -- so it would hook in, drop the controlling TTY, then do some magic to set things up again. In this case, TIOCNOTTY is quite handy and still respected by at least OpenBSD, NetBSD, and Linux as far as I can tell. I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as long as it doesn't impose a major burden. Reviewed by: bcr (manpages), kib Differential Revision: https://reviews.freebsd.org/D22572	2019-11-30 20:10:50 +00:00
Mateusz Guzik	e0a1a1e6cb	smp: cast the read in quiesce_all_critical through void * Fixes compilation on some 32-bit arm platforms. Sponsored by: The FreeBSD Foundation	2019-11-30 19:33:02 +00:00
Mateusz Guzik	3ac2ac2e08	lockprof: use IPI-injected fences to fix hangs on stat dump and reset The previously used quiesce_all_cpus walks all CPUs and waits until curthread can run on them. Even on contemporary machines this becomes a significant problem under load when it can literally take minutes for the operation to complete. With the patch the stall is normally less than 1 second. Reviewed by: kib, jeff (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21740	2019-11-30 17:24:42 +00:00
Mateusz Guzik	5032fe17a2	Add a way to inject fences using IPIs A variant of this facility was already used by rmlocks where IPIs would enforce ordering. This allows to elide fences where they are rarely needed and the cost of IPI (should it be necessary) is cheaper. Reviewed by: kib, jeff (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21740	2019-11-30 17:22:10 +00:00
Mateusz Guzik	a02cab334c	devfs: introduce a per-dev lock to protect ->si_devsw This allows bumping threadcount without taking the global devmtx lock. In particular this eliminates contention on said lock while using bhyve with multiple vms. Reviewed by: kib Tested by: markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22548	2019-11-30 16:46:19 +00:00
Kyle Evans	9e387c3da2	tty_rel_gone: add locking assertion We already assert the lock is held later during tty_rel_free(), but it is arguably good form to clarify locking expectations here as well at the top-level that other drivers use.	2019-11-29 14:46:13 +00:00
Konstantin Belousov	fdc6b10d44	Add a VN_OPEN_INVFS flag. vn_open_cred() assumes that it is called from the top-level of a VFS syscall. Writers must call bwillwrite() before locking any VFS resource to wait for cleanup of dirty buffers. ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which results in wait for unrelated buffers while owning ZFS vnode lock (and ZFS does not use buffer cache). VN_OPEN_INVFS allows caller to skip bwillwrite. Note that ZFS is still incorrect there, because it starts write on an mp and locks a vnode while holding another vnode lock. Reported by: Willem Jan Withagen <wjw@digiware.nl> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-29 14:02:32 +00:00
Ryan Libby	815db2f6f8	ktls_session zone: don't need to specify uma trash The use of the uma trash procedures is automatic, there's no need to pass them explicitly here. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22582	2019-11-29 06:25:03 +00:00
Kyle Evans	cf29433090	tty_pts: don't rely on tty header pollution for sys/mutex.h tty_pts.c relies on sys/tty.h for sys/mutex.h. Include it directly instead of relying on this pollution to ease the diff for anyone that wants to try converting the tty lock to anything other than a mutex.	2019-11-29 03:56:01 +00:00
Jeff Roberson	6d6a03d7a8	Handle large mallocs by going directly to kmem. Taking a detour through UMA does not provide any additional value. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22563	2019-11-29 03:14:10 +00:00
Jeff Roberson	b476ae7f52	Fix DEBUG_REDZONE build after r355169	2019-11-28 08:56:14 +00:00
Hans Petter Selasky	c2a8682ae8	Factor out check for mounted root file system. Differential Revision: https://reviews.freebsd.org/D22571 PR: 241639 MFC after: 1 week Sponsored by: Mellanox Technologies	2019-11-28 08:47:36 +00:00
Jeff Roberson	584061b480	Garbage collect the mostly unused us_keg field. Use appropriately named union members in vm_page.h to store the zone and slab. Remove some nearby dead code. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22564	2019-11-28 07:49:25 +00:00
Konstantin Belousov	ef401a8558	Requested and tested by: kevans Reviewed by: kevans (previous version), markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22546	2019-11-27 20:33:53 +00:00
Ryan Libby	59fb4a95c7	witness: sleepable rm locks are not sleepable in read mode There are two classes of rm lock, one "sleepable" and one not. But even a "sleepable" rm lock is only sleepable in write mode, and is non-sleepable when taken in read mode. Warn about sleepable rm locks in read mode as non-sleepable locks. Do this by defining a new lock operation flag, LOP_NOSLEEP, to indicate that a lock is non-sleepable despite what the LO_SLEEPABLE flag would indicate, and defining a new witness lock instance flag, LI_SLEEPABLE, to track the product of LO_SLEEPABLE and LOP_NOSLEEP on the lock instance. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22527	2019-11-27 01:54:39 +00:00
Mateusz Guzik	588e69e2fd	cache: stop reusing .. entries on enter It almost never happens in practice anyway. With this eliminated ->nc_vp cannot change vnodes, removing an obstacle on the road to lockless lookup.	2019-11-27 01:21:42 +00:00
Mateusz Guzik	2ac930e32c	cache: fix numcache accounting on entry . entries are never created and .. can reuse existing entries, meaning the early count bump is both spurious and leading to overcounting in certain cases.	2019-11-27 01:20:55 +00:00
Mateusz Guzik	36afce39ae	cache: hide "doingcache" behind DEBUG_CACHE	2019-11-27 01:20:21 +00:00
Hans Petter Selasky	aa4612d133	Fix panic when loading kernel modules before root file system is mounted. Make sure the rootvnode is always NULL checked. Differential Revision: https://reviews.freebsd.org/D22545 PR: 241639 MFC after: 1 week Sponsored by: Mellanox Technologies	2019-11-26 12:20:44 +00:00
Mariusz Zaborski	8e49361164	procdesc: allow to collect status through wait(2) if process is traced A debugger like truss(1) depends on the wait(2) syscall. This syscall waits for ALL children. When it is waiting for ALL children, the children created by process descriptors are not returned. This behavior was introduced because we want to implement libraries which may pdfork(2). The behavior of process descriptors breaks truss(1) because it will not be able to collect the status of processes with process descriptors. To address this problem the status is returned to the parent when the child is traced. While the process is traced the debugger is the new parent. In case the original parent and debugger are the same process it means the debugger explicitly used pdfork() to create the child. In that case the debugger should be using kqueue()/pdwait() instead of wait(). Add test case to verify that. The test case was implemented by markj@. Reviewed by: kib, markj Discussed with: jhb MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D20362	2019-11-25 18:33:21 +00:00
Ryan Libby	43cefe8b19	sysctl sysctls: wire old buf before output with sysctl lock Several sysctl sysctls output to a user buffer while holding a non-sleepable lock that protects the sysctl topology. They need to wire the output buffer, or else they may try to sleep on a page fault. Reviewed by: cem, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22528	2019-11-25 07:38:27 +00:00
Konstantin Belousov	b631c36f0d	Record part of the owner struct thread pointer into busy_lock. Record as many bits from curthread into busy_lock as fit. Low bits of the struct thread * representation are zero due to struct and zone alignment, and they leave space for busy flags (perhaps except statically allocated thread0). Upper bits are not very interesting for assert, and in most practical situations the recorded value should allow manually identifying the owner with certainty. Assert that unbusy is performed by the owner, except in a few places where unbusy is done in an io completion handler. For this case, add _unchecked variants of asserts and unbusy primitives. Reviewed by: markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D22298	2019-11-24 19:12:23 +00:00
Warner Losh	a921c2003f	Add a warning about Giant Locked devices Add a warning when a device registers with devfs and requests D_NEEDGIANT. The warning says the device will go away before 13.0. This is needed to flush out the devices in the tree that are still Giant locked. This warning, or some variant of it, should have gone into the tree a long time ago... The intention is to require all devices be converted to not use automatic giant in this way, or remove any such devices that remain that we don't have the hardware to test a conversion of. kbd so far is the only device that can't leave the tree, yet needs something sensible done to avoid the auto giant lock (even if it is just doing the wrapping itself). There may be others added to this list... Any discussions of this topic will take place on arch@.	2019-11-23 23:57:26 +00:00
Conrad Meyer	7993a104a1	Add explicit SI_SUB_EPOCH Add explicit SI_SUB_EPOCH, after SI_SUB_TASKQ and before SI_SUB_SMP (EARLY_AP_STARTUP). Rename existing "SI_SUB_TASKQ + 1" to SI_SUB_EPOCH. epoch(9) consumers cannot epoch_alloc() before SI_SUB_EPOCH:SI_ORDER_SECOND, but likely should allocate before SI_SUB_SMP. Prior to this change, consumers (well, epoch itself, and net/if.c) just open-coded the SI_SUB_TASKQ + 1 order to match epoch.c, but this was fragile. Reviewed by: mmacy Differential Revision: https://reviews.freebsd.org/D22503	2019-11-22 23:23:40 +00:00
Gleb Smirnoff	329377f44b	cc_ktr_event_name is used only with KTR	2019-11-21 23:55:43 +00:00
Alexander Motin	130fffa2a3	Add variant of root_mount_hold() without allocation. It allows to use this KPI in non-sleepable contexts. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-11-21 21:59:35 +00:00
Andrew Turner	a27ac4644a	Disable KCSAN within a panic. The kernel is single threaded at this point and the panic is more important. Sponsored by: DARPA, AFRL	2019-11-21 13:59:01 +00:00
Andrew Turner	68cad68149	Add kcsan_md_unsupported from NetBSD. It's used to ignore virtual addresses that may have a different physical address depending on the CPU. Sponsored by: DARPA, AFRL	2019-11-21 13:22:23 +00:00
Andrew Turner	bba0065f0d	Fix the bus_space functions with KCSAN on arm64. Arm64 doesn't define the bus_space_set_multi_stream and bus_space_set_region_stream functions. Don't try to define them there. Sponsored by: DARPA, AFRL	2019-11-21 13:12:58 +00:00
Andrew Turner	849aef496d	Port the NetBSD KCSAN runtime to FreeBSD. Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in the FreeBSD kernel. It is a useful tool for finding data races between threads executing on different CPUs. This can be enabled by enabling KCSAN in the kernel config, or by using the GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the latter needs a compiler change to allow -fsanitize=thread that KCSAN uses. Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D22315	2019-11-21 11:22:08 +00:00
Andrew Turner	0cb5357037	Import the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime. KCSAN is a tool to find concurrent memory access that may race each other. After a determined number of memory accesses a cell is created, this describes the current access. It will then delay for a short period to allow other CPUs a chance to race. If another CPU performs a memory access to an overlapping region during this delay the race is reported. This is a straight import of the NetBSD code, it will be adapted to FreeBSD in a future commit. Sponsored by: DARPA, AFRL	2019-11-20 14:37:48 +00:00
Mateusz Guzik	d578a4256e	cache: minor stat cleanup Remove duplicated stats and move numcachehv from debug to vfs.cache.	2019-11-20 12:08:32 +00:00
Mateusz Guzik	d957f3a4f0	vfs: perform a more racy check in vfs_notify_upper Locking mp does not buy anything in terms of correctness and only contributes to contention.	2019-11-20 12:07:54 +00:00
Mateusz Guzik	1fccb43c39	vfs: change si_usecount management to count used vnodes Currently si_usecount is effectively a sum of usecounts from all associated vnodes. This is maintained by special-casing for VCHR every time usecount is modified. Apart from complicating the code a little bit, it has a scalability impact since it forces a read from a cacheline shared with said count. There are no consumers of the feature in the ports tree. In head there are only 2: revoke and devfs_close. Both can get away with a weaker requirement than the exact usecount, namely just the count of active vnodes. Changing the meaning to the latter means we only need to modify it on 0<->1 transitions, avoiding the check plenty of times (and entirely in something like vrefact). Reviewed by: kib, jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22202	2019-11-20 12:05:59 +00:00
Jeff Roberson	639676877b	Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates redundant complicated checks and additional locking required only for anonymous memory. Introduce vm_object_allocate_anon() to create these objects. DEFAULT and SWAP objects now have the correct settings for non-anonymous consumers and so individual consumers need not modify the default flags to create super-pages and avoid ONEMAPPING/NOSPLIT. Reviewed by: alc, dougm, kib, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D22119	2019-11-19 23:19:43 +00:00
Kyle Evans	4cc12fb848	sysent: regenerate after r354835 The lua-based makesyscalls produces slightly different output than its makesyscalls.sh predecessor, all whitespace differences more closely matching the source syscalls.master.	2019-11-18 23:31:12 +00:00
Kyle Evans	f22a592111	Convert in-tree sysent targets to use new makesyscalls.lua flua is bootstrapped as part of the build for those on older versions/revisions that don't yet have flua installed. Once upgraded past r354833, "make sysent" will again naturally work as expected. Reviewed by: brooks Differential Revision: https://reviews.freebsd.org/D21894	2019-11-18 23:28:23 +00:00
John Baldwin	03b0d68c72	Check for errors from copyout() and suword*() in sv_copyout_args/strings. Reviewed by: brooks, kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22401	2019-11-18 20:07:43 +00:00
David Bright	2d5603fe65	Jail and capability mode for shm_rename; add audit support for shm_rename Co-mingling two things here: * Addressing some feedback from Konstantin and Kyle re: jail, capability mode, and a few other things * Adding audit support as promised. The audit support change includes a partial refresh of OpenBSM from upstream, where the change to add shm_rename has already been accepted. Matthew doesn't plan to work on refreshing anything else to support audit for those new event types. Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: kib Relnotes: Yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22083	2019-11-18 13:31:16 +00:00
Konstantin Belousov	01a2b5679b	kern_exec: p_osrel and p_fctl0 were obliterated by failed execve(2) attempt. Zeroing of them is needed so that an image activator can update the values as appropriate (or not set at all). Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22379	2019-11-17 14:52:45 +00:00
Scott Long	de890ea465	Create a new sysctl subtree, machdep.mitigations. Its purpose is to organize knobs and indicators for code that mitigates functional and security issues in the architecture/platform. Controls for regular operational policy should still go into places such as security, hw, kern, etc. The machdep root node is inherently architecture dependent, but mitigations tend to be architecture dependent as well. Some cases like Spectre do cross architectural boundaries, but the mitigation code for them tends to be architecture dependent anyway, and multiple architectures won't be active in the same image of the kernel. Many mitigation knobs already exist in the system, and they will be moved with compat naming in the future. Going forward, mitigations should collect in machdep.mitigations. Reviewed by: imp, brooks, rwatson, emaste, jhb Sponsored by: Intel	2019-11-15 23:27:17 +00:00
John Baldwin	e353233118	Add a sv_copyout_auxargs() hook in sysentvec. Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv instead of doing it in the sv_fixup hook. In particular, this new hook allows the stack space to be allocated at the same time the auxv values are copied out to userland. This allows us to avoid wasting space for unused auxv entries as well as not having to recalculate where the auxv vector is by walking back up over the argv and environment vectors. Reviewed by: brooks, emaste Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64 Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22355	2019-11-15 18:42:13 +00:00
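
The si_usecount change in 1fccb43c39 (count active vnodes rather than sum their usecounts) can be pictured with a small userland sketch. This is not the kernel code: the toy_dev and toy_vnode types and the toy_vref(), toy_vrele(), and toy_demo() functions are hypothetical names invented for illustration, and the sketch is single-threaded, so it leaves out the atomics and locking a real implementation would need. It only shows the idea from the commit message that the per-device count is modified on 0<->1 transitions of a vnode's own usecount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's device and vnode structures. */
struct toy_dev {
	int active_vnodes;	/* plays the role of si_usecount after the change */
};

struct toy_vnode {
	int usecount;		/* per-vnode use count */
	struct toy_dev *dev;	/* non-NULL only for device-backed vnodes */
};

/* Take a reference; touch the device only on the 0 -> 1 transition. */
void
toy_vref(struct toy_vnode *vp)
{
	if (vp->usecount++ == 0 && vp->dev != NULL)
		vp->dev->active_vnodes++;
}

/* Drop a reference; touch the device only on the 1 -> 0 transition. */
void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	if (--vp->usecount == 0 && vp->dev != NULL)
		vp->dev->active_vnodes--;
}

/* Walk through a small scenario and return the final device count. */
int
toy_demo(void)
{
	struct toy_dev dev = { 0 };
	struct toy_vnode a = { 0, &dev };
	struct toy_vnode b = { 0, &dev };

	toy_vref(&a);	/* a: 0 -> 1, device count becomes 1 */
	toy_vref(&a);	/* a: 1 -> 2, device untouched */
	toy_vref(&b);	/* b: 0 -> 1, device count becomes 2 */
	assert(dev.active_vnodes == 2);

	toy_vrele(&a);	/* a: 2 -> 1, device untouched */
	toy_vrele(&a);	/* a: 1 -> 0, device count becomes 1 */
	toy_vrele(&b);	/* b: 1 -> 0, device count becomes 0 */
	return (dev.active_vnodes);
}
```

In this sketch a second reference on an already-active vnode never reads or writes the shared device structure, which is the scalability point the commit message makes about avoiding the shared cacheline.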

16980 Commits