freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	16dea83410	null_vput_pair(): release use reference on dvp earlier We might own the last use reference, and then vrele() at the end would need to take the dvp vnode lock to inactivate, which causes deadlock with vp. We cannot vrele() dvp from start since this might unlock ldvp. Handle it by holding the vnode and dropping use ref after lowerfs VOP_VPUT_PAIR() ended. This effectively requires unlock of the vp vnode after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true unconditionally. This opens more opportunities for vp to be reclaimed, if lvp is still alive we reinstantiate vp with null_nodeget(). Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	44691b33cc	vlrureclaim: only skip vnode with resident pages if it owns the pages Nullfs vnode which shares vm_object and pages with the lower vnode should not be exempt from the reclaim just because lower vnode cached a lot. Their reclamation is actually very cheap and should be preferred over real fs vnodes, but this change is already useful. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	0b3948e73b	softdep_unmount: assert that no dangling dependencies are left Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	7a8d4b4da6	FFS: assign fully initialized struct mount_softdeps to um_softdep Other threads observing the non-NULL um_softdep can assume that it is safe to use it. This is important for ro->rw remounts where change from read-only to read-write status cannot be made atomic. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	2af934cc15	Assert that um_softdep is NULL on free(ump), i.e. softdep_unmount() was called Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	f776c54cee	ffs_mount: when remounting ro->rw and sbupdate failed, cleanup softdeps Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	d7e5e37416	softdep_unmount: handle spurious wakeups Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	fabbc3d879	softdep_flush(): do not access ump after we acked FLUSH_EXIT and unlocked SU lock otherwise we might follow a pointer in the freed memory. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Konstantin Belousov	7c7a6681fa	ffs: clear MNT_SOFTDEP earlier when remounting rw to ro Suppose that we remount rw->ro and in parallel some reader tries to instantiate a vnode, e.g. during lookup. Suppose that softdep_unmount() already started, but we did not clear the MNT_SOFTDEP flag yet. Then ffs_vgetf() calls into softdep_load_inodeblock() which accessed destroyed hashes and freed memory. Set/clear fs_ronly simultaneously (WRT to files flush) with MNT_SOFTDEP. It might be reasonable to move the change of fs_ronly to under MNT_ILOCK, but no readers take it. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:07 +02:00
Konstantin Belousov	7f682bdcab	Rework MOUNTED/DOING SOFTDEP/SUJ macros Now MNT_SOFTDEP indicates that SU are active in any variant +-J, and SU+J is indicated by MNT_SOFTDEP | MNT_SUJ combination. The reason is that unmount will be able to easily hide SU from other operations by clearing MNT_SOFTDEP while keeping the record of the active journal. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:07 +02:00
Konstantin Belousov	81cdb19e04	ffs softdep: clear ump->um_softdep on softdep_unmount() Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:07 +02:00
Konstantin Belousov	a285d3edac	ffs_extern.h: Add comments for ffs_vgetf() flags Requested and reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:30:59 +02:00
Konstantin Belousov	fd97fa6463	Add FFSV_FORCEINODEDEP flag for ffs_vgetf() It will be used to allow SU flush code to sync the volume while external consumers see that SU is already disabled on the filesystem. Use it where ffs_vgetf() called by SU code to process dependencies. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:30:38 +02:00
Konstantin Belousov	25aac48d2c	simplify journal_mount: move the out label after success block This removes the need to check for error == 0. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:30:37 +02:00
Wei Hu	a491581f3f	Hyper-V: hn: Enable vSwitch RSC support in hn netvsc driver Receive Segment Coalescing (RSC) in the vSwitch is a feature available in Windows Server 2019 hosts and later. It reduces the per packet processing overhead by coalescing multiple TCP segments when possible. This happens mostly when TCP traffics are among different guests on same host. This patch adds netvsc driver support for this feature. The patch also updates NVS version to 6.1 as needed for RSC enablement. MFC after: 2 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D29075	2021-03-12 04:35:16 +00:00
Warner Losh	ba5de7e930	SPDX: Spell 4 clause BSD license correctly	2021-03-11 14:17:54 -07:00
Mark Johnston	2f1cfb7f63	gmirror: Pre-allocate the timeout event structure We can't call malloc(M_WAITOK) in a callout handler. Reviewed by: imp Reported by: pho Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29223	2021-03-11 15:45:15 -05:00
Warner Losh	8423f5d4c1	nvme: use config_intrhook_drain to avoid removable card races nvme drives are configured early in boot. However, a number of the configuration steps take a while, so we defer those to a config intrhook that runs before the root filesystem is mounted. At the same time, the PCI hot plug wakes up and tests the status of the card. It may decide that the card has gone away and deletes the child. As part of that process nvme_detach is called. If this call happens after the config_intrhook starts to run, but before it is finished, there's a race where we can tear down the device's soft state while the config_intrhook is still using it. Use the new config_intrhook_drain to disestablish the hook. Either it will be removed w/o running, or the routine will wait for it to finish. This closes the race and allows safe hotplug at any time, even very early in boot. Sponsored by: Netflix, Inc Reviewed by: jhb, mav Differential Revision: https://reviews.freebsd.org/D29006	2021-03-11 09:45:10 -07:00
Warner Losh	e52368365d	config_intrhook: provide config_intrhook_drain config_intrhook_drain will remove the hook from the list as config_intrhook_disestablish does if the hook hasn't been called. If it has, config_intrhook_drain will wait for the hook to be disestablished in the normal course (or expedited, it's up to the driver to decide how and when to call config_intrhook_disestablish). This is intended for removable devices that use config_intrhook and might be attached early in boot, but that may be removed before the kernel can call the config_intrhook or before it ends. To prevent all races, the detach routine will need to call config_intrhook_drain. Sponsored by: Netflix, Inc Reviewed by: jhb, mav, gde (in D29006 for man page) Differential Revision: https://reviews.freebsd.org/D29005	2021-03-11 09:45:10 -07:00
Edward Tomasz Napierala	dc0119c281	linsysfs: create /sys/bus/ and /sys/subsystem/ This looks like a no-op, but it prevents udevadm(8) from failing loudly, which in turn unbreaks installation of libfprint-2-2, which in Focal is a dependency for make-4.2.1-1.2. One might wonder why installing a build utility involves messing with device handling... Sponsored By: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29133	2021-03-11 15:50:51 +00:00
Mark Johnston	968079f253	vm_reserv: Fix list locking in vm_reserv_reclaim_contig() The per-domain partpop queue is locked by the combination of the per-domain lock and individual reservation mutexes. vm_reserv_reclaim_contig() scans the queue looking for partially populated reservations that can be reclaimed in order to satisfy the caller's allocation. During the scan, we drop the per-domain lock. At this point, the rvn pointer may be invalidated. Take care to load rvn after re-acquiring the per-domain lock. While here, simplify the condition used to check whether a reservation was dequeued while the per-domain lock was dropped. Reviewed by: alc, kib Reported by: gallatin MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29203	2021-03-11 10:35:35 -05:00
Warner Losh	1645a4ae64	usb: tiny formatting nit Format 300 baud like all the others here. No functional change.	2021-03-11 08:24:13 -07:00
Kristof Provost	913e7dc3e0	pf: Remove redundant kif != NULL checks pf_kkif_free() already checks for NULL, so we don't have to check before we call it. Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29195	2021-03-11 10:39:43 +01:00
Kristof Provost	5e9dae8e14	pf: Factor out pf_krule_free() Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29194	2021-03-11 10:39:43 +01:00
Oskar Holmund	17b14d8f77	usr.sbin/pwm/pwm add support for flags The pwm utility can't set the only flag defined (PWM_POLARITY_INVERTED), so this patch adds the option -I (capital letter i) to send it to the drivers. None of the existing PWM drivers have implemented support for flags. But soon:ish I will put up a review of a pwm driver using TI OMAP DMTimer. Differential Revision: https://reviews.freebsd.org/D29137 MFC after: 2 weeks	2021-03-11 09:57:56 +01:00
Oskar Holmund	7d4a5de84d	share/man/man9/pwmbus.9 fix types in arguments Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pwm/pwmbus.c. Reviewed By: rpokala Differential Revision: https://reviews.freebsd.org/D29139 MFC after: 3 days	2021-03-11 09:57:04 +01:00
Greg V	15565e0a21	kern.mk: fix -Wno-error style to fix build with Clang 12 Clang 12 no longer supports -Wno-error-..., only the -Wno-error=... style (which is already used everywhere else in the tree). Differential Revision: https://reviews.freebsd.org/D29157	2021-03-10 17:34:35 -05:00
Alexander V. Chernikov	b1d63265ac	Flush remaining routes from the routing table during VNET shutdown. Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET. PR: 253998 Reported by: rashey at superbox.pl MFC after: 3 days Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures is to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish. Test Plan: ``` set_skip:set_skip_group_lo -> passed [0.053s] tail -n 200 /var/log/messages | grep rtentry ``` Reviewers: #network, kp, bz Reviewed By: kp Subscribers: imp, ae Differential Revision: https://reviews.freebsd.org/D29116	2021-03-10 21:10:14 +00:00
John Baldwin	3fa034210c	ktls: Fix non-inplace TLS 1.3 encryption. Copy the iovec for the trailer from the proper place. This is the same fix for CBC encryption from `ff6a7e4ba6`. Reported by: gallatin Reviewed by: gallatin, markj Fixes: `49f6925ca` Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D29177	2021-03-10 11:07:40 -08:00
Alexander Motin	2cee045b4d	Move time math out of disabled interrupts sections. We don't need the result before next sleep time, so no reason to additionally increase interrupt latency. While there, remove extra PM ticks to microseconds conversion, making C2/C3 sleep times look 4 times smaller than really. The conversion is already done by AcpiGetTimerDuration(). Now I see reported sleep times up to 0.5s, just as expected for planned 2 wakeups per second. MFC after: 1 month	2021-03-10 13:52:51 -05:00
Olivier Houchard	c328f64d81	arm64: Fix COMPAT_FREEBSD32. The ENTRY() macro was modified by commit `28d945204e` to add an optional NOP instruction at the beginning of the function. It is of course an arm64 instruction, so unsuitable for the 32bits sigcode. So just use EENTRY() instead for aarch32_sigcode. This should fix receiving signals when running 32bits binaries on FreeBSD/arm64. MFC After: 1 week	2021-03-10 19:06:42 +01:00
Mitchell Horne	7e7f7beee7	ns8250: don't drop IER_TXRDY on bus_grab/ungrab It has been observed that some systems are often unable to resume from ddb after entering with debug.kdb.enter=1. Checking the status further shows the terminal is blocked waiting in tty_drain(), but it never makes progress in clearing the output queue, because sc->sc_txbusy is high. I noticed that when entering polling mode for the debugger, IER_TXRDY is set in the failure case. Since this bit is never tracked by the softc, it will not be restored by ns8250_bus_ungrab(). This creates a race in which a TX interrupt can be lost, creating the hang described above. Ensuring that this bit is restored is enough to prevent this, and resume from ddb as expected. The solution is to track this bit in the sc->ier field, for the same lifetime that TX interrupts are enabled. PR: 223917, 240122 Reviewed by: imp, manu Tested by: bz MFC after: 5 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29130	2021-03-10 11:04:42 -04:00
Alex Richardson	953a7d7c61	AArch64: Clear VFP state on execve() I noticed that many of the math-related tests were failing on AArch64. After a lot of debugging, I noticed that the floating point exception flags were not being reset when starting a new process. This change resets the VFP inside exec_setregs() to ensure no VFP register state is leaked from parent processes to children. This commit also moves the clearing of fpcr that was added in `65618fdda0` from fork() to execve() since that makes more sense: fork() can retain current register values, but execve() should result in a well-defined clean state. Reviewed By: andrew MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29060	2021-03-10 12:44:42 +00:00
Hans Petter Selasky	dfb33cb0ef	Allocating the LinuxKPI current structure from a software interrupt thread must be done using the M_NOWAIT flag after `1ae20f7c70` . MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-03-10 13:27:40 +01:00
Hans Petter Selasky	6eb60f5b7f	Use the word "LinuxKPI" instead of "Linux compatibility", to not confuse with user-space Linux compatibility support. No functional change. MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-03-10 12:35:16 +01:00
Hans Petter Selasky	d1cbe79089	Allocating the LinuxKPI current structure from an interrupt thread must be done using the M_NOWAIT flag after `1ae20f7c70` . MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-03-10 10:51:04 +01:00
Hans Petter Selasky	ebe5cf355d	Implement basic support for allocating memory from a specific numa node in the LinuxKPI. Differential Revision: https://reviews.freebsd.org/D29077 Reviewed by: markj@ and kib@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-03-09 21:01:47 +01:00
Kyle Evans	94dddbfd00	if_wg: export tx_bytes, rx_bytes, and last_handshake The names are self-explanatory; these are currently only used by the wg(8) tool, but they are handy data points to have. Reviewed by: grehan MFC after: 3 days Discussed with: decke Differential Revision: https://reviews.freebsd.org/D29143	2021-03-09 13:50:41 -06:00
Kyle Evans	0dd691b412	iflib: allow clone detach if not yet init If we hit an error during init, then we'll unwind our state and attempt to detach the device -- don't block it. This was discovered by creating a wg0 with missing parameters; said failure ended up leaving this orphaned device in place and ended up panicking the system upon enumeration of the dev.* sysctl space. Reviewed by: gallatin, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D29145	2021-03-09 13:49:13 -06:00
Kyle Evans	299f8977ce	if_wg: wg_input: remove a couple locals (NFC) We have no use for the udphdr or this hlen local, just spell out the addition inline. MFC after: 3 days Reviewed by: grehan, markj Differential Revision: https://reviews.freebsd.org/D29142	2021-03-09 13:49:13 -06:00
Jason A. Harmening	e4b8deb222	amd64 pmap: convert to counter(9), add PV and pagetable page counts This change converts most of the counters in the amd64 pmap from global atomics to scalable counter(9) counters. Per discussion with kib@, it also removes the handrolled per-CPU PCID save count as it isn't considered generally useful. The bulk of these counters remain guarded by PV_STATS, as it seems unlikely that they will be useful outside of very specific debugging scenarios. However, this change does add two new counters that are available without PV_STATS. pt_page_count and pv_page_count track the number of active page table pages and physical-to-virtual list pages, respectively. These will be useful in evaluating the memory footprint of pmap structures under various workloads, which will help to guide future changes in this area. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28923	2021-03-09 09:27:10 -08:00
Leandro Lupori	043577b721	ofwfb: fix boot on LE Some framebuffer properties obtained from the device tree were not being properly converted to host endian. Replace OF_getprop calls by OF_getencprop where needed to fix this. This fixes boot on PowerPC64 LE, when using ofwfb as the system console. Reviewed by: bdragon Sponsored by: Eldorado Research Institute (eldorado.org.br) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27475	2021-03-09 13:29:24 -03:00
Kyle Evans	b3dac3913d	ifconfig: allow displaying/setting persistent-keepalive The kernel-side already accepted a persistent-keepalive-interval, so just add a verb to ifconfig(8) for it and start exporting it so that ifconfig(8) can view it. PR: 253790 MFC after: 3 days Discussed with: decke	2021-03-09 05:16:42 -06:00
Kyle Evans	1ae20f7c70	kern: malloc: fix panic on M_WAITOK during THREAD_NO_SLEEPING() Simple condition flip; we wanted to panic here after epoch_trace_list(). Reviewed by: glebius, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D29125	2021-03-09 05:16:39 -06:00
Kyle Evans	e80e371d79	if_wg: avoid sleeping under the net epoch No sleeping allowed here, so avoid it. Collect the subset of data we want inside of the epoch, as we'll need extra allocations when we add items to the nvlist. Reviewed by: grehan (earlier version), markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D29124	2021-03-09 05:16:30 -06:00
Kyle Evans	bae59285f9	if_wg: return to m_defrag() of incoming mbuf, sans leak This partially reverts `df55485085` but still fixes the leak. It was overlooked (sigh) that some packets will exceed MHLEN and cannot be physically contiguous without clustering, but we don't actually need it to be. m_defrag() should pull up enough for any of the headers that we do need to be accessible. Fixes: `df55485085` Pointy hat; kevans	2021-03-09 04:52:22 -06:00
Alexander Motin	075e4807df	Do not read timer extra time when MWAIT is used. When we enter C2+ state via memory read, it may take chipset some time to stop CPU. Extra register read covers that time. But MWAIT makes CPU stop immediately, so we don't need to waste time after wakeup with interrupts still disabled, increasing latency. On my system it reduces ping localhost latency, waking up all CPUs once a second, from 277us to 242us. MFC after: 1 month	2021-03-08 18:43:47 -05:00
Alexander Motin	455219675d	Change mwait_bm_avoidance use to match Linux. Even though the information is very limited, it seems the intent of this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3, not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was never needed, and which register not really doing anything for years. It wasted lots of CPU time on congested global ACPI hardware lock when many CPU cores were trying to enter/exit deep C-states same time. On idle 80-core system it pushed ping localhost latency up to 20ms, since badport_bandlim() via counter_ratecheck() wakes up all CPUs same time once a second just to synchronously reset the counters. Now enabling C-states increases the latency from 0.1 to just 0.25ms. Discussed with: kib MFC after: 1 month	2021-03-08 18:27:36 -05:00
Warner Losh	6ffdaa5f2d	Move back the isa non-PNP driver deadline to FreeBSD 14.	2021-03-08 16:00:23 -07:00
Warner Losh	88a5591203	config_intrhook: Move from TAILQ to STAILQ and padding config_intrhook doesn't need to be a two-pointer TAILQ. We rarely add/delete from this and so those need not be optimized. Instead, use the one-pointer STAILQ plus a uintptr_t to be used as a flags word. This will allow these changes to be MFC'd to 12 and 13 to fix a race in removable devices. Feedback from: jhb Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D29004	2021-03-08 15:59:00 -07:00

136502 Commits