freebsd-skq

Author	SHA1	Message	Date
Mateusz Guzik	07c6e2f4ab	vfs: unlazy before dooming the vnode With this change having the listmtx lock held postpones dooming the vnode. Use this fact to simplify iteration over the lazy list. It also allows filters to safely access ->v_data. Reviewed by: kib (early version) Differential Revision: https://reviews.freebsd.org/D23397	2020-01-30 02:12:52 +00:00
Gleb Smirnoff	79674264df	Fix text format definition for kern.maxvnodes, vfs.wantfreevnodes. This is a regression from r356642, r356645.	2020-01-30 00:18:00 +00:00
Conrad Meyer	07a65f9d38	hwpstate_intel(4): Silence/fix Coverity reports These were all introduced in the initial import of hwpstate_intel(4). Reported by: Coverity CIDs: 1413161, 1413164, 1413165, 1413167 X-MFC-With: r357002	2020-01-29 03:15:34 +00:00
Warner Losh	42ec4f05a3	Make mqueue objects work across a fork again. In r110908 (2003) alfred added DFLAG_PASSABLE to tag those types of FD that can be passed via unix pipes, but mqueuefs didn't exist yet. Later, in r152825 (2005) davidxu neglected to include DFLAG_PASSABLE since people don't normally pass these things via unix sockets (it's a FreeBSD implementation detail that it's a file descriptor, nobody noticed). Then r223866 (2011) by jonathan used the new flag in fdcopy, which fork uses. Due to that, mqueuefs actually broke mqueue objects being propagated by fork. No mention of mqueuefs was made in r223866, so I think it was an unintended consequence. Fix this by tagging mqueuefs as passable as well. They were prior to alfred's change (and it's clear there's no intent in his change to change this behavior), and POSIX requires this to be the case as well. PR: 243103 Reviewed by: kib@, jiles@ Differential Revision: https://reviews.freebsd.org/D23038	2020-01-27 22:36:54 +00:00
John Baldwin	425e5f9dcf	Revert accidental change from r357146.	2020-01-26 14:23:27 +00:00
John Baldwin	c73222d0e6	Fix some misleading indentation warnings reported by recent clang. These should not be any functional change. While the change in emul10kx-pcm.c looks like a real bug fix (as opposed to inconsistent whitespace), the extra statements were not harmful. Reviewed by: kib Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D23363	2020-01-26 14:20:57 +00:00
Mateusz Guzik	1513f80391	vfs: do an unlocked check before iterating the lazy list For most filesystems it is expected to be empty most of the time.	2020-01-26 07:06:18 +00:00
Mateusz Guzik	cd0e46c66b	vfs: remove vop loop from vop_sigdefer All ops are guaranteed to be present since r357131.	2020-01-26 07:05:06 +00:00
Mateusz Guzik	6d69e665dd	vfs: fix freevnodes count update race against preemption vdbatch_process leaves the critical section too early, openign a time window where another thread can get scheduled and modify vd->freevnodes. Once it the preempted thread gets back it overrides the value with 0. Just move critical_exit to the end of the function.	2020-01-26 00:40:27 +00:00
Mateusz Guzik	dc9a1cb60b	vfs: predict vn_lock failure as unlikely in vget	2020-01-26 00:34:57 +00:00
Jason A. Harmening	a9aa06f7b1	Implement cycle-detecting garbage collector for AF_UNIX sockets The existing AF_UNIX socket garbage collector destroys any socket which may potentially be in a cycle, as indicated by its file reference count being equal to its enqueue count. However, this can produce false positives for in-flight sockets which aren't part of a cycle but are part of one or more SCM_RIGHTS mssages and which have been closed on the sending side. If the garbage collector happens to run at exactly the wrong time, destruction of these sockets will render them unusable on the receiving side, such that no previously-written data may be read. This change rewrites the garbage collector to precisely detect cycles: 1. The existing check of msgcount==f_count is still used to determine whether the socket is potentially in a cycle. 2. The socket is now placed on a local "dead list", which is used to reduce iteration time (and therefore contention on the global unp_link_rwlock). 3. The first pass through the dead list removes each potentially-dead socket's outgoing references from the graph of potentially-dead sockets, using a gc-specific copy of the original reference count. 4. The second series of passes through the dead list removes from the list any socket whose remaining gc refcount is non-zero, as this indicates the socket is actually accessible outside of any possible cycle. Iteration is repeated until no further sockets are removed from the dead list. 5. Sockets remaining in the dead list are destroyed as before. PR: 227285 Submitted by: jan.kokemueller@gmail.com (prior version) Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23142	2020-01-25 08:57:26 +00:00
Mark Johnston	a89c2c8c34	Revert r357050. It seems to have introduced a couple of regressions. Reported by: cy, pho	2020-01-24 14:58:02 +00:00
Edward Tomasz Napierala	b3fb13eb55	Add kern_unmount() and use in Linuxulator. No functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22646	2020-01-24 11:57:55 +00:00
Mateusz Guzik	28eb39a5ab	vfs: allow v_usecount to transition 0->1 without the interlock There is nothing to do but to bump the count even during said transition. There are 2 places which can do it: - vget only does this after locking the vnode, meaning there is no change in contract versus inactive or reclamantion - vref only ever did it with the interlock held which did not protect against either (that is, it would always succeed) VCHR vnodes retain special casing due to the need to maintain dev use count. Reviewed by: jeff, kib Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D23185	2020-01-24 07:47:44 +00:00
Mateusz Guzik	d93762b94d	vfs: stop handling VI_OWEINACT in vget vget is almost always called with LK_SHARED, meaning the flag (if present) is almost guaranteed to get cleared. Stop handling it in the first place and instead let the thread which wanted to do inactive handle the bumepd usecount. Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D23184	2020-01-24 07:45:59 +00:00
Mateusz Guzik	74c4b7cc60	vfs: stop unlocking the vnode upfront in vput Doing so runs into races with filesystems which make half-constructed vnodes visible to other users, while depending on the chain vput -> vinactive -> vrecycle to be executed without dropping the vnode lock. Impediments for making this work got cleared up (notably vop_unlock_post now does not do anything and lockmgr stops touching the lock after the final write). Stacked filesystems keep vhold/vdrop across unlock, which arguably can now be eliminated. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23344	2020-01-24 07:44:25 +00:00
Mateusz Guzik	c00115f108	lockmgr: don't touch the lock past unlock This evens it up with other locking primitives. Note lock profiling still touches the lock, which again is in line with the rest. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23343	2020-01-24 07:42:57 +00:00
Mark Johnston	1bfca40c57	Set td_oncpu before dropping the thread lock during a switch. After r355784 we no longer hold a thread's thread lock when switching it out. Preserve the previous synchronization protocol for td_oncpu by setting it together with td_state, before dropping the thread lock during a switch. Reported and tested by: pho Reviewed by: kib Discussed with: jeff Differential Revision: https://reviews.freebsd.org/D23270	2020-01-23 16:24:51 +00:00
Jeff Roberson	91e31c3c08	Consistently use busy and vm_page_valid() rather than touching page bits directly. This improves API compliance, asserts, etc. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23283	2020-01-23 04:54:49 +00:00
Jeff Roberson	1eb13fce84	Block the thread lock in sched_throw() and use cpu_switch() to unblock it. The introduction of lockless switch in r355784 created a race to re-use the exiting thread that was only possible to hit on a hypervisor. Reported/Tested by: rlibby Discussed with: rlibby, jhb	2020-01-23 03:36:50 +00:00
Gleb Smirnoff	ad3980121b	DEVICE_POLLING is an alternative to network interrupts and also needs to enter epoch. Assert that in the netisr_poll() and do the work for the idle poll routine.	2020-01-23 01:30:50 +00:00
Gleb Smirnoff	511d1afb6b	Enter the network epoch for interrupt handlers of INTR_TYPE_NET. Provide tunable to limit how many times handlers may be executed without reentering epoch. Differential Revision: https://reviews.freebsd.org/D23242	2020-01-23 01:24:47 +00:00
Gleb Smirnoff	c4eb66309f	Add ie_hflags to struct intr_event, which accumulates flags from all handlers on this event. For now handle only IH_ENTROPY in that manner.	2020-01-23 01:20:59 +00:00
Conrad Meyer	4577cf3744	cpufreq(4): Add support for Intel Speed Shift Intel Speed Shift is Intel's technology to control frequency in hardware, with hints from software. Let's get a working version of this in the tree and we can refine it from here. Submitted by: bwidawsk, scottph Reviewed by: bcr (manpages), myself Discussed with: jhb, kib (earlier versions) With feedback from: Greg V, gallatin, freebsdnewbie AT freenet.de Relnotes: yes Differential Revision: https://reviews.freebsd.org/D18028	2020-01-22 23:28:42 +00:00
Hans Petter Selasky	1f69a50940	Make sure the VNET is properly set when calling tcp_drop() from the ktls taskqueue callback function. A valid VNET is needed when updating statistics. panic() tcp_state_change() tcp_drop() ktls_reset_send_tag() taskqueue_run_locked() taskqueue_thread_loop() Sponsored by: Mellanox Technologies	2020-01-21 11:43:25 +00:00
Mateusz Guzik	6403455301	cache: revert r352613 now that vhold does not take locks	2020-01-20 19:52:23 +00:00
Mateusz Guzik	8bba93c7e0	cache: make numcachehv use counter(9) on all archs Requested by: kib	2020-01-20 14:42:11 +00:00
Jeff Roberson	d6e13f3b4d	Don't hold the object lock while calling getpages. The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23033	2020-01-19 23:47:32 +00:00
Mateusz Guzik	a9099e5b10	vfs: switch vop_stdunlock to call lockmgr_unlock Since the flags argument is now alawys 0 the new call provides the same behavior.	2020-01-19 21:41:34 +00:00
Jeff Roberson	811d05fcb7	Provide an API for interlocked refcount sleeps. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22908	2020-01-19 18:18:17 +00:00
Mateusz Guzik	28479aaae2	vfs: allow v_holdcnt to transition 0->1 without the interlock Since r356672 ("vfs: rework vnode list management") there is nothing to do apart from altering freevnodes count, but this much can be safely done based on the result of atomic_fetchadd. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23186	2020-01-19 17:47:04 +00:00
Mateusz Guzik	059cb4843b	cache: counter_u64_add_protected -> counter_u64_add Fixes booting on RISC-V where it does happen to not be equivalent. Reported by: lwhsu	2020-01-19 17:05:26 +00:00
Mateusz Guzik	1399033590	cache: convert numcachehv to counter(9) on 64-bit platforms	2020-01-19 05:37:27 +00:00
Mateusz Guzik	512fa9a4e0	vfs: plug a conditional assigment of lo_name in getnewvnode It only matters for witness. No functional changes.	2020-01-19 05:36:45 +00:00
Kyle Evans	05d7dd739c	sysent targets: further cleanup and deduplication r355473 vastly improved the readability and cleanliness of these Makefiles. Every single one of them follows the same pattern and duplicates the exact same logic. Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and include a common sysent.mk to handle the rest. This makes it less tedious to make sweeping changes. Some default values are provided for GENERATED/SYSENT_*; almost all of these just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use effectively the same filenames with an arbitrary prefix. Most ABIs will be able to get away with just setting GENERATED_PREFIX and including ^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile is the notable exception, as it doesn't take a SYSENT_CONF and the generated files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits the pattern enough to use the common version. Reviewed by: brooks, imp Nice!: emaste Differential Revision: https://reviews.freebsd.org/D23197	2020-01-18 20:37:45 +00:00
Mateusz Guzik	2d0c620272	vfs: distribute freevnodes counter per-cpu It gets rolled up to the global when deferred requeueing is performed. A dedicated read routine makes sure to return a value only off by a certain amount. This soothes a global serialisation point for all 0<->1 hold count transitions. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23235	2020-01-18 01:29:02 +00:00
Mateusz Guzik	d3cc535474	vfs: provide F_ISUNIONSTACK as a kludge for libc Prior to introduction of this op libc's readdir would call fstatfs(2), in effect unnecessarily copying kilobytes of data just to check fs name and a mount flag. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D23162	2020-01-17 14:42:25 +00:00
Mateusz Guzik	1ad72b270c	vfs: shorten lock hold time in vdbatch_process	2020-01-17 14:39:00 +00:00
Gleb Smirnoff	66c6c556b6	Change argument order of epoch_call() to more natural, first function, then its argument. Reviewed by: imp, cem, jhb	2020-01-17 06:10:24 +00:00
Mateusz Guzik	66f67d5e5e	vfs: increment numvnodes without the vnode list lock unless under pressure The vnode list lock is only needed to reclaim free vnodes or kick the vnlru thread (or to block and not miss a wake up (but note the sleep has a timeout so this would not be a correctness issue)). Try to get away without the lock by just doing an atomic increment. The lock is contended e.g., during poudriere -j 104 where about half of all acquires come from vnode allocation code. Note the entire scheme needs a rewrite, the above just reduces it's SMP impact. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23140	2020-01-16 21:45:21 +00:00
Mateusz Guzik	b7f50b9ad1	vfs: refcator vnode allocation Semantics are almost identical. Some code is deduplicated and there are fewer memory accesses. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D23158	2020-01-16 21:43:13 +00:00
Mateusz Guzik	875cfc082d	vfs: reimplement vlrureclaim to actually use LRU Take advantage of global ordering introduced in r356672. Reviewed by: mckusick (previous version) Differential Revision: https://reviews.freebsd.org/D23067	2020-01-16 10:44:02 +00:00
Jeff Roberson	a81c400e75	Simplify VM and UMA startup by eliminating boot pages. Instead use careful ordering to allocate early pages in the same way boot pages were but only as needed. After the KVA allocator has started up we allocate the KVA that we consumed during boot. This also makes the boot pages freeable since they have vm_page structures allocated with the rest of memory. Parts of this patch were written and tested by markj. Reviewed by: glebius, markj Differential Revision: https://reviews.freebsd.org/D23102	2020-01-16 05:01:21 +00:00
Kirk McKusick	bbb1e07d65	Peter Holm reports that his test that does an umount(8) on an active mount point while numerous tests are running that are writing to files on that mount point cause the unmount(8) to hang forever. The unmount(8) system call is handled in the kernel by the dounmount() function. The cause of the hang is that prior to dounmount() calling VFS_UNMOUNT() it is calling VFS_SYNC(mp, MNT_WAIT). The MNT_WAIT flag indicates that VFS_SYNC() should not return until all the dirty buffers associated with the mount point have been written to disk. Because user processes are allowed to continue writing and can do so faster than the data can be written to disk, the call to VFS_SYNC() can never finish. Unlike VFS_SYNC(), the VFS_UNMOUNT() routine can suspend all processes when they request to do a write thus having a finite number of dirty buffers to write that cannot be expanded. There is no need to call VFS_SYNC() before calling VFS_UNMOUNT(), because VFS_UNMOUNT() needs to flush everything again anyway after suspending writes, to catch anything that was dirtied between the VFS_SYNC() and writes being suspended. The fix is to simply remove the unnecessary call to VFS_SYNC() from dounmount(). Reported by: Peter Holm Analysis by: Chuck Silvers Tested by: Peter Holm MFC after: 7 days Sponsored by: Netflix	2020-01-15 18:53:32 +00:00
Gleb Smirnoff	9074694339	Since this code uses if_ref()/if_rele() it must include if_var.h explicitly, not via header pollution.	2020-01-15 03:39:11 +00:00
Gleb Smirnoff	3264dcadc9	- Move global network epoch definition to epoch.h, as more different subsystems tend to need to know about it, and including if_var.h is huge header pollution for them. Polluting possible non-network users with single symbol seems much lesser evil. - Remove non-preemptible network epoch. Not used yet, and unlikely to get used in close future.	2020-01-15 03:34:21 +00:00
Mateusz Guzik	cda3176851	vfs: in vop_stdadd_writecount only vlazy vnodes on mounts using msync The only reason to vlazy there is to (overzealously) ensure all vnodes which need to be visited by msync scan can be found there. In particluar this is of no use zfs and tmpfs. While here depessimize the check.	2020-01-15 01:34:05 +00:00
Ryan Libby	51871224c0	malloc: remove assumptions about MINALLOCSIZE Remove assumptions about the minimum MINALLOCSIZE, in order to allow testing of smaller MINALLOCSIZE. A following patch will lower the MINALLOCSIZE, but not so much that the present patch is required for correctness at these sites. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon	2020-01-14 02:14:02 +00:00
Konstantin Belousov	fedab1b499	Code must not unlock a mutex while owning the thread lock. Reviewed by: hselasky, markj Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23150	2020-01-13 14:30:19 +00:00
Mateusz Guzik	0c236d3d52	vfs: per-cpu batched requeuing of free vnodes Constant requeuing adds significant lock contention in certain workloads. Lessen the problem by batching it. Per-cpu areas are locked in order to synchronize against UMA freeing memory. vnode's v_mflag is converted to short to prevent the struct from growing. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 122.38s user 1780.45s system 6242% cpu 30.480 total patched: 144.84s user 985.90s system 4856% cpu 23.282 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22998	2020-01-13 02:39:41 +00:00

1 2 3 4 5 ...

17135 Commits