257488 Commits

Author SHA1 Message Date
Warner Losh
8423f5d4c1 nvme: use config_intrhook_drain to avoid removable card races
nvme drives are configured early in boot. However, a number of the configuration
steps takes which take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.

Sponsored by:		Netflix, Inc
Reviewed by:		jhb, mav
Differential Revision:	https://reviews.freebsd.org/D29006
2021-03-11 09:45:10 -07:00
Warner Losh
e52368365d config_intrhook: provide config_intrhook_drain
config_intrhook_drain will remove the hook from the list as
config_intrhook_disestablish does if the hook hasn't been called.  If it has,
config_intrhook_drain will wait for the hook to be disestablished in the normal
course (or expedited, it's up to the driver to decide how and when
to call config_intrhook_disestablish).

This is intended for removable devices that use config_intrhook and might be
attached early in boot, but that may be removed before the kernel can call the
config_intrhook or before it ends. To prevent all races, the detach routine will
need to call config_intrhook_train.

Sponsored by:		Netflix, Inc
Reviewed by:		jhb, mav, gde (in D29006 for man page)
Differential Revision:	https://reviews.freebsd.org/D29005
2021-03-11 09:45:10 -07:00
Edward Tomasz Napierala
dc0119c281 linsysfs: create /sys/bus/ and /sys/subsystem/
This looks like a no-op, but it prevents udevadm(8) with failing
loudly, which in turn unbreaks installation of libfprint-2-2, which
in Focal is a dependency for make-4.2.1-1.2.

One might wonder why installing a build utility involves messing
with device handling...

Sponsored By:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29133
2021-03-11 15:50:51 +00:00
Mark Johnston
968079f253 vm_reserv: Fix list locking in vm_reserv_reclaim_contig()
The per-domain partpop queue is locked by the combination of the
per-domain lock and individual reservation mutexes.
vm_reserv_reclaim_contig() scans the queue looking for partially
populated reservations that can be reclaimed in order to satisfy the
caller's allocation.

During the scan, we drop the per-domain lock.  At this point, the rvn
pointer may be invalidated.  Take care to load rvn after re-acquiring
the per-domain lock.

While here, simplify the condition used to check whether a reservation
was dequeued while the per-domain lock was dropped.

Reviewed by:	alc, kib
Reported by:	gallatin
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29203
2021-03-11 10:35:35 -05:00
Warner Losh
1645a4ae64 usb: tiny formatting nit
Format 300 baud like all the others here. No functional change.
2021-03-11 08:24:13 -07:00
Kristof Provost
913e7dc3e0 pf: Remove redundant kif != NULL checks
pf_kkif_free() already checks for NULL, so we don't have to check before
we call it.

Reviewed by:	melifaro@
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29195
2021-03-11 10:39:43 +01:00
Kristof Provost
5e9dae8e14 pf: Factor out pf_krule_free()
Reviewed by:	melifaro@
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29194
2021-03-11 10:39:43 +01:00
Oskar Holmund
17b14d8f77 usr.sbin/pwm/pwm add support for flags
The pwm utility cant set the only flag defined (PWM_POLARITY_INVERTED) so this
patch add the option -I (capital letter i) to send it to the drivers.

None of existing PWM driver have implemented support for flags.
But soon:ish I will put up an review of a pwm driver using TI OMAP DMTimer.

Differential Revision: https://reviews.freebsd.org/D29137
MFC after:   2 weeks
2021-03-11 09:57:56 +01:00
Oskar Holmund
7d4a5de84d share/man/man9/pwmbus.9 fix types in arguments
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.

Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after:   3 days
2021-03-11 09:57:04 +01:00
Greg V
15565e0a21 kern.mk: fix -Wno-error style to fix build with Clang 12
Clang 12 no longer supports -Wno-error-..., only the -Wno-error=...
style (which is already used everywhere else in the tree).

Differential Revision:	https://reviews.freebsd.org/D29157
2021-03-10 17:34:35 -05:00
Alexander V. Chernikov
b1d63265ac Flush remaining routes from the routing table during VNET shutdown.
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
 VNET.

PR:	253998
Reported by:	rashey at superbox.pl
MFC after:	3 days

Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.

Test Plan:
```
set_skip:set_skip_group_lo  ->  passed  [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```

Reviewers: #network, kp, bz

Reviewed By: kp

Subscribers: imp, ae

Differential Revision: https://reviews.freebsd.org/D29116
2021-03-10 21:10:14 +00:00
John Baldwin
3fa034210c ktls: Fix non-inplace TLS 1.3 encryption.
Copy the iovec for the trailer from the proper place.  This is the same
fix for CBC encryption from ff6a7e4ba6bf.

Reported by:	gallatin
Reviewed by:	gallatin, markj
Fixes:		49f6925ca
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29177
2021-03-10 11:07:40 -08:00
Alexander Motin
2cee045b4d Move time math out of disabled interrupts sections.
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.

While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really.  The conversion
is already done by AcpiGetTimerDuration().  Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.

MFC after:	1 month
2021-03-10 13:52:51 -05:00
Olivier Houchard
c328f64d81 arm64: Fix COMPAT_FREEBSD32.
The ENTRY() macro was modified by commit
28d945204ea1014d7de6906af8470ed8b3311335 to add an optional NOP instruction
at the beginning of the function. It is of course an arm64 instruction, so
unsuitable for the 32bits sigcode. So just use EENTRY() instead for
aarch32_sigcode. This should fix receiving signals when running 32bits
binaries on FreeBSD/arm64.

MFC After: 1 week
2021-03-10 19:06:42 +01:00
Dag-Erling Smørgrav
409388cfac Fix post-start check when unbound.conf has moved.
Reported by:	phk@
MFC after:	1 week
2021-03-10 15:53:25 +00:00
Dag-Erling Smørgrav
e5f02c140b Fix local-unbound setup for some IPv6 deployments.
PR:		250984
MFC after:	1 week
2021-03-10 15:53:22 +00:00
Mitchell Horne
7e7f7beee7 ns8250: don't drop IER_TXRDY on bus_grab/ungrab
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Checking the status further
shows the terminal is blocked waiting in tty_drain(), but it never makes
progress in clearing the output queue, because sc->sc_txbusy is high.

I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this, and resume
from ddb as expected.

The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.

PR:		223917, 240122
Reviewed by:	imp, manu
Tested by:	bz
MFC after:	5 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29130
2021-03-10 11:04:42 -04:00
Alex Richardson
953a7d7c61 Arch64: Clear VFP state on execve()
I noticed that many of the math-related tests were failing on AArch64.
After a lot of debugging, I noticed that the floating point exception flags
were not being reset when starting a new process. This change resets the
VFP inside exec_setregs() to ensure no VFP register state is leaked from
parent processes to children.

This commit also moves the clearing of fpcr that was added in 65618fdda0f27
from fork() to execve() since that makes more sense: fork() can retain
current register values, but execve() should result in a well-defined
clean state.

Reviewed By:	andrew
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D29060
2021-03-10 12:44:42 +00:00
Hans Petter Selasky
dfb33cb0ef Allocating the LinuxKPI current structure from a software interrupt thread
must be done using the M_NOWAIT flag after 1ae20f7c70ea .

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-10 13:27:40 +01:00
Hans Petter Selasky
6eb60f5b7f Use the word "LinuxKPI" instead of "Linux compatibility", to not confuse with
user-space Linux compatibility support. No functional change.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-10 12:35:16 +01:00
Hans Petter Selasky
d1cbe79089 Allocating the LinuxKPI current structure from an interrupt thread must be
done using the M_NOWAIT flag after 1ae20f7c70ea .

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-10 10:51:04 +01:00
Kyle Evans
ce53f92e6c wg(4): note the persistent-keepalive ifconfig(8) option
MFC after:	3 days
Fixes:	b3dac3913dc9
2021-03-09 14:21:35 -06:00
Hans Petter Selasky
ebe5cf355d Implement basic support for allocating memory from a specific numa node
in the LinuxKPI.

Differential Revision:	https://reviews.freebsd.org/D29077
Reviewed by:	markj@ and kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-03-09 21:01:47 +01:00
Kyle Evans
94dddbfd00 if_wg: export tx_bytes, rx_bytes, and last_handshake
The names are self-explanatory; these are currently only used by the
wg(8) tool, but they are handy data points to have.

Reviewed by:	grehan
MFC after:	3 days
Discussed with:	decke
Differential Revision:	https://reviews.freebsd.org/D29143
2021-03-09 13:50:41 -06:00
Kyle Evans
0dd691b412 iflib: allow clone detach if not yet init
If we hit an error during init, then we'll unwind our state and attempt
to detach the device -- don't block it.

This was discovered by creating a wg0 with missing parameters; said
failure ended up leaving this orphaned device in place and ended up
panicking the system upon enumeration of the dev.* sysctl space.

Reviewed by:	gallatin, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29145
2021-03-09 13:49:13 -06:00
Kyle Evans
299f8977ce if_wg: wg_input: remove a couple locals (NFC)
We have no use for the udphdr or this hlen local, just spell out the
addition inline.

MFC after:	3 days
Reviewed by:	grehan, markj
Differential Revision:	https://reviews.freebsd.org/D29142
2021-03-09 13:49:13 -06:00
Jason A. Harmening
e4b8deb222 amd64 pmap: convert to counter(9), add PV and pagetable page counts
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters.  Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.

The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios.  However, this change does add two new counters that
are available without PV_STATS.  pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively.  These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28923
2021-03-09 09:27:10 -08:00
Leandro Lupori
043577b721 ofwfb: fix boot on LE
Some framebuffer properties obtained from the device tree were not being
properly converted to host endian.
Replace OF_getprop calls by OF_getencprop where needed to fix this.

This fixes boot on PowerPC64 LE, when using ofwfb as the system console.

Reviewed by:    bdragon
Sponsored by:   Eldorado Research Institute (eldorado.org.br)
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D27475
2021-03-09 13:29:24 -03:00
Baptiste Daroussin
f61831d2e8 Revert "rc: implement parallel boot"
This is not ready yet for prime time

This reverts commit 763db58932874bb47fc6f9322ab81cc947f80991.
This reverts commit f1ab799927c8e93e8f58e5039f287a2ca45675ec.
This reverts commit 6e822e99570fdf4c564be04840a054bccc070222.
This reverts commit 77e1ccbee3ed6c837929e4e232fd07f95bfc8294.
2021-03-09 14:26:07 +01:00
Kyle Evans
b3dac3913d ifconfig: allow displaying/setting persistent-keepalive
The kernel-side already accepted a persistent-keepalive-interval, so
just add a verb to ifconfig(8) for it and start exporting it so that
ifconfig(8) can view it.

PR:		253790
MFC after:	3 days
Discussed with:	decke
2021-03-09 05:16:42 -06:00
Kyle Evans
172a8241c9 ifconfig: wg: stop requiring peer endpoints
The way that wireguard is designed does not actually require all peers
to have endpoints. In an architecture that might mimic a traditional
VPN server <-> client, the wg interface on a server would have a number
of peers without set endpoints -- the expectation is that the "clients"
will connect to the "server" peer, which will authenticate the
connection as a known peer and learn the endpoint from there.

MFC after:	3 days
Discussed with:	decke, grehan (independently)
2021-03-09 05:16:42 -06:00
Kyle Evans
1ae20f7c70 kern: malloc: fix panic on M_WAITOK during THREAD_NO_SLEEPING()
Simple condition flip; we wanted to panic here after epoch_trace_list().

Reviewed by:	glebius, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29125
2021-03-09 05:16:39 -06:00
Kyle Evans
e80e371d79 if_wg: avoid sleeping under the net epoch
No sleeping allowed here, so avoid it.  Collect the subset of data we
want inside of the epoch, as we'll need extra allocations when we add
items to the nvlist.

Reviewed by:	grehan (earlier version), markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29124
2021-03-09 05:16:30 -06:00
Kyle Evans
bae59285f9 if_wg: return to m_defrag() of incoming mbuf, sans leak
This partially reverts df55485085 but still fixes the leak. It was
overlooked (sigh) that some packets will exceed MHLEN and cannot be
physically contiguous without clustering, but we don't actually need
it to be. m_defrag() should pull up enough for any of the headers that
we do need to be accessible.

Fixes:	df55485085
Pointy hat;	kevans
2021-03-09 04:52:22 -06:00
Rick Macklem
09673fc0f3 mountd(8): generate a syslog message when the "V4:" line is missing
Daniel reported that NFSv4 mounts were not working despite having
set "nfsv4_server_enable=YES" in /etc/rc.conf.  Mountd was logging a
message that there was no /etc/exports file.
He noted that creating a /etc/exports file with a "V4:" line in it
was needed make NFSv4 mounts work.
At least one "V4:" line in one of the exports(5) file(s) is needed to
make NFSv4 mounts work. This patch fixes mountd.c so that it logs a
message indicting that there is no "V4:" line in any exports(5)
file when NFSv4 mounts are enabled.
To avoid this message being generated erroneously, /etc/rc.d/mountd
is updated to make sure vfs.nfsd.server_max_nfsvers is properly set
before mountd(8) is started.

Reported by:	debdrup
PR:	253901
MFC after:	2 weeks
2021-03-08 16:08:02 -08:00
Alexander Motin
075e4807df Do not read timer extra time when MWAIT is used.
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU.  Extra register read covers that time.  But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.

On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.

MFC after:	1 month
2021-03-08 18:43:47 -05:00
Alexander Motin
455219675d Change mwait_bm_avoidance use to match Linux.
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.

On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.

Discussed with:	kib
MFC after:	1 month
2021-03-08 18:27:36 -05:00
Warner Losh
6ffdaa5f2d Move back the isa non-PNP driver deadline to FreeBSD 14. 2021-03-08 16:00:23 -07:00
Warner Losh
88a5591203 config_intrhook: Move from TAILQ to STAILQ and padding
config_intrhook doesn't need to be a two-pointer TAILQ. We rarely add/delete
from this and so those need not be optimized. Instaed, use the one-pointer
STAILQ plus a uintptr_t to be used as a flags word. This will allow these
changes to be MFC'd to 12 and 13 to fix a race in removable devices.

Feedback from: jhb
Reviewed by: mav
Differential Revision:	https://reviews.freebsd.org/D29004
2021-03-08 15:59:00 -07:00
Alexander V. Chernikov
7634919e15 Fix 'in6_purgeaddr: err=65, destination address delete failed' message.
P2P ifa may require 2 routes: one is the loopback route, another is
 the "prefix" route towards its destination.

Current code marks loopback routes existence with IFA_RTSELF and
 "prefix" p2p routes with IFA_ROUTE.

For historic reasons, we fill in ifa_dstaddr for loopback interfaces.
To avoid installing the same route twice, we preemptively set
 IFA_RTSELF when adding "prefix" route for loopback.
However, the teardown part doesn't have this hack, so we try to
 remove the same route twice.

Fix this by checking if ifa_dstaddr is different from the ifa_addr
 and moving this logic into a separate function.

Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D29121
MFC after:	3 days
2021-03-08 21:28:35 +00:00
Jessica Clarke
f2f8405cf6 if_vtbe: Add missing includes to fix build
PR:		254137
Reported by:	Mina Galić <me@igalic.co>
MFC after:	3 days
Fixes:		f8bc74e2f4a5 ("tap: add support for virtio-net offloads")
2021-03-08 20:48:48 +00:00
Kyle Evans
8cc15b0dfc x86: tsc: deprioritize TSC on VirtualBox
Misbehavior has been observed with TSC under VirtualBox, where threads
doing small sleeps (~1 second) may miss their wake up and hang around
in a sleep state indefinitely.  Switching back to ACPI-fast decidedly
fixes it, so stop using TSC on VirtualBox at least for the time being.

This partially reverts 84eaf2ccc6aa, applying it only to VirtualBox and
increasing the quality to 0. Negative qualities can never be chosen and
cannot be chosen with the tunable recently added. If we do not have a
timecounter with a higher quality than 0, then TSC does at least leave
the system mostly usable.

PR:		253087
Reviewed by:	emaste, kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29132
2021-03-08 14:43:06 -06:00
John Baldwin
ef74bfc6fe Add ObsoleteFiles.inc entries for various OCF headers removed in 13.
MFC after:	3 days
2021-03-08 11:17:21 -08:00
John Baldwin
c5a365623f Correct the name of the structure used for TCP socket options.
The structure was renamed while refactoring Netflix's KTLS changes for
upstreaming, but the original name remained in tcp.4 and was
subsequently copied to ktls.4.

PR:		254141
Reported by:	asomers
MFC after:	3 days
2021-03-08 10:46:40 -08:00
Mark Johnston
1bb7c45c53 wg: Fix a mismerge
df55485085 fixed a leak that I had initially fixed in a11009dccb.

Fixes:	a11009dccb
2021-03-08 12:56:14 -05:00
Mark Johnston
ffe3def903 iflib: Make if_shared_ctx_t a pointer to const
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface.  This is especially important for
software-only drivers like wg.

DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.

Reviewed by:	gallatin, erj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29102
2021-03-08 12:39:06 -05:00
Mark Johnston
435c7cfb24 Rename _cscan_atomic.h and _cscan_bus.h to atomic_san.h and bus_san.h
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29103
2021-03-08 12:39:06 -05:00
Mark Johnston
0e72eb4602 ath_hal: Stop printing messages during boot
ath_hal is compiled into the kernel by default and so always prints a
message to dmesg even when the system has no ath hardware.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-08 12:39:06 -05:00
Mark Johnston
7995dae9d3 posix timers: Improve the overrun calculation
timer_settime(2) may be used to configure a timeout in the past.  If
the timer is also periodic, we also try to compute the number of timer
overruns that occurred between the initial timeout and the time at which
the timer fired.  This is done in a loop which iterates once per period
between the initial timeout and now.  If the period is small and the
initial timeout was a long time ago, this loop can take forever to run,
so the system is effectively DOSed.

Replace the loop with a more direct calculation of
(now - initial timeout) / period to compute the number of overruns.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29093
2021-03-08 12:39:06 -05:00
Mark Johnston
60d12ef952 posix timers: Sprinkle some style fixes
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-08 12:39:06 -05:00