Commit Graph

266921 Commits

Author SHA1 Message Date
Konstantin Belousov
747a6b7ace cloudabi and linux ABIs: do not call umtx_thread_cleanup() from thr_exit syscall
These ABIs do not use umtx at all, so there is nothing to clean.
Cloudabi references to umtx keys do not require any cleanups anyway.

Requested by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:14 +03:00
Konstantin Belousov
28a66fc3da Do not call FreeBSD-ABI specific code for all ABIs
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI.  Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.

Requested by:	dchagin
Reviewed by:	dchagin, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
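
The per-ABI hook idea in the commit above can be sketched in user space as a table of optional function pointers. Everything below (struct names, fields other than the sv_onexit name, the counter) is hypothetical and only illustrates the dispatch pattern; it is not the kernel's sysentvec definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-process state. */
struct proc_sketch {
	int	robust_cleanups;	/* times ABI-specific cleanup ran */
};

/* Hypothetical, heavily reduced ABI vector: one optional exit hook. */
struct sysentvec_sketch {
	const char	*sv_name;
	void		(*sv_onexit)(struct proc_sketch *);	/* NULL: no hook */
};

/* Cleanup that only makes sense for the native ABI. */
static void
native_onexit(struct proc_sketch *p)
{
	p->robust_cleanups++;
}

static const struct sysentvec_sketch native_sv = { "native", native_onexit };
static const struct sysentvec_sketch foreign_sv = { "foreign", NULL };

/* Generic exit path: calls the hook only if the ABI registered one. */
static void
proc_exit_sketch(struct proc_sketch *p, const struct sysentvec_sketch *sv)
{
	assert(p != NULL && sv != NULL);
	if (sv->sv_onexit != NULL)
		sv->sv_onexit(p);
}
```

With this shape the generic path never names ABI-specific functions; an ABI that needs no cleanup simply leaves the pointer NULL.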
Konstantin Belousov
55976ce11a Move sv_onexit() sysentvec hook slightly later
after itimers are stopped.  This makes it more usable for e.g. native FreeBSD
ABI sysentvecs.

Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Konstantin Belousov
71ab344524 Add sv_onexec_old() sysent hook for exec event
Unlike sv_onexec(), it is called from the old (pre-exec) sysentvec structure.
The old vmspace for the process is still intact during the call.

Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Mateusz Guzik
edcf1054d3 cxgb: use m_gethdr_raw
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-07 11:05:46 +00:00
Mateusz Guzik
a56888534d iflib: use m_gethdr_raw
Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31081
2021-07-07 11:05:46 +00:00
Mateusz Guzik
c2c34ee540 mbuf: add m_get_raw and m_gethdr_raw
The intent is to eliminate the MT_NOINIT flag and consequently a branch
from the constructor.

Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31080
2021-07-07 11:05:46 +00:00
Mateusz Guzik
0a718a6e6e mbuf: replace all direct uma_zfree(zone_mbuf) calls with m_free_raw
Reviewed by:	donner
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31082
2021-07-07 11:05:46 +00:00
Alexander Motin
bdd11cbb90 FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE
It makes no sense to set it below PAGE_SIZE, since it increases all
overheads and makes returning memory to OS problematic.  It makes no
sense to set it above PAGE_SIZE, since such allocations and especially
frees are too expensive and cause KVA fragmentation to benefit from
fewer chunks.  After that it makes no sense to keep more complicated
math here.

What may make sense though is just a tunable border between linear and
scatter ABDs, previously also controlled by this tunable.  Retain that
functionality by taking abd_scatter_min_size tunable from Linux, just
with different default value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12328
2021-07-06 17:39:23 -07:00
Alexander Motin
97752ba22a Move gethrtime() calls out of vdev queue lock
This dramatically reduces the lock contention on systems with slower
(non-TSC) timecounters.  With TSC the difference is minimal, but since
this lock is pretty congested, any improvement counts.  Plus I don't
see any reason to do it under the lock other than the latency of the
lock itself, which this change actually reduces.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12281
2021-07-06 14:38:00 -07:00
Andrew Turner
a7b05eb16c Sync the arm64 special registers with the Armv8.5 XML
Add the missing macros and decode all the fields as described in the
Arm Architecture System Registers XML corresponding to Armv8.5.

Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30983
2021-07-06 20:46:55 +00:00
Edward Tomasz Napierala
6f147a0734 cam: enable kern.cam.ada.enable_uma_ccbs by default
This makes the ada(4) driver use UMA for its CCBs.  While its
da(4) counterpart needs some more testing, this one seems to be
safe now.

Please let me know via email if you notice any suspicious kernel
messages.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30567
2021-07-07 09:40:34 +01:00
Bjoern A. Zeeb
da2f833f7a MMCCAM: fix a panic after cam_sim_alloc_dev() removal in sdhci.c
During the removal of cam_sim_alloc_dev() in
aeb04e88f5 for sdhci.c and the
follow-up build-fix in a72af82e31
slot->dev and slot->bus got mixed up for MMCCAM;  slot->dev is
only used in the !MMCCAM case so is uninitialised here leading to
a panic;  switch back to slot->bus to return to the status quo.

Reviewed by:	imp (ack on arm@)
X-Differential Revision:	https://reviews.freebsd.org/D30857
2021-07-07 00:37:45 +00:00
Justin Gottula
6e4e3c3ab6 Udev rules: remove zvol compat symlinks (without the leading zvol/)
This is a potentially arguable change, because it removes some
compatibility cruft that certain systems or people may have come to rely
on (either a very long time ago, or unwisely in recent times).

On the other hand, it's been literally over a decade since OpenZFS
switched to the strategy of using opaque numbered /dev/zd* device nodes,
with the canonical zvol access path being a directory tree of symlinks
created by udev rules inside /dev/zvol/*. (See #102.) Even at the time,
the /dev/* scheme was labeled as being for "compatibility".

This commit removes the second tree of symlinks located directly at
/dev/*, under the assumption that anybody with any sense has been using
the intended /dev/zvol/* path for a very very long time now.

(The more I think about this, the more I anticipate that some large
fraction of people will have been blissfully unaware that the intention
has been for them to use the /dev/zvol/* tree all along, and they will
have come to rely upon the /dev/* tree simply because it's been there
this whole time despite being a compat thing.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12303
2021-07-06 13:41:17 -07:00
Randall Stewart
e834f9a44a tcp: Address goodput and TLP edge cases.
There are several cases where we make a goodput measurement and we are running
out of data when we decide to make the measurement. In reality we should not make
such a measurement if there is no chance we can have "enough" data. There are also
some corner-case TLPs that end up not registering as a TLP like they should; we
fix this by pushing the doing_tlp setup to the actual timeout that knows it did
a TLP. This makes it so we always have the appropriate flag on the sendmap
indicating a TLP being done as well as count correctly so we make no more
than two TLPs.

In addressing the goodput, let's also add a "quality" metric that can be viewed via
blackbox logs so that a casual observer does not have to figure out how good
of a measurement it is. This is needed due to the fact that we may still make
a measurement that is of a poorer quality as we run out of data but still have
a minimal amount of data to make a measurement.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31076
2021-07-06 15:26:37 -04:00
Mateusz Guzik
2a69eb8c87 cxgb: switch bare zone_mbuf use to m_free_raw
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-06 19:05:11 +00:00
Alexander Motin
d0732fa819 Add ocs_gendump.c to the build, missed in 29e2dbd42c.
2021-07-06 15:03:06 -04:00
Ryan Moeller
53b438b242 zfsd: Check for error from zpool_vdev_online
Onlining a vdev can fail. Log the error if it does.

Reviewed by:	mav, asomers
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D30882
2021-07-06 16:00:18 +00:00
Ram Kishore Vegesna
29e2dbd42c ocs_fc: Add gendump and dump_to_host ioctl command support.
Support to generate firmware dump.

Approved by: mav(mentor)
2021-07-06 21:08:11 +05:30
Julien Grall
2b2c460d7b etc/ttys: add xen console
Xen VMs get a simulated serial device meant for use as a console.  Often
an xterm or other advanced terminal is used, so use xterm as the type.

Depending on configuration, FreeBSD on Xen for amd64 may instead use an
emulated serial port, but the virtual console may also be available.

Submitted by:	Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by:	imp (slightly earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29873
2021-07-06 11:53:10 -03:00
Elliott Mitchell
c76616f496 etc/ttys: merge ttys file down to single file
The tty lists were already pretty similar and there hadn't been any real
need for them to remain distinct for some time. As such, merge to a
single file.

The RISC-V console is preserved. For systems where it doesn't exist, its
presence in /etc/ttys is harmless. The uncommented version of the
ttyv8/XDM line from ttys.amd64 was the one chosen.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30256
2021-07-06 11:53:10 -03:00
Andrew Gallatin
28d0a740dd ktls: auto-disable ifnet (inline hw) kTLS
Ifnet (inline) hw kTLS NICs typically keep state within
a TLS record, so that when transmitting in-order,
they can continue encryption on each segment sent without
DMA'ing extra state from the host.

This breaks down when transmits are out of order (eg,
TCP retransmits).  In this case, the NIC must re-DMA
the entire TLS record up to and including the segment
being retransmitted.  This means that when re-transmitting
the last 1448 byte segment of a TLS record, the NIC will
have to re-DMA the entire 16KB TLS record. This can lead
to the NIC running out of PCIe bus bandwidth well before
it saturates the network link if a lot of TCP connections have
a high retransmit rate.

This change introduces a new sysctl (kern.ipc.tls.ifnet_max_rexmit_pct),
where TCP connections with higher retransmit rate will be
switched to SW kTLS so as to conserve PCIe bandwidth.

Reviewed by:	hselasky, markj, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30908
2021-07-06 10:28:32 -04:00
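
A percentage threshold like the one described above could be evaluated along the following lines. This is a hypothetical sketch: the function name, the byte counters and the way they are sampled are assumptions, and the commit message does not say how the kernel measures the retransmit rate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical policy check, not the kernel's code: report whether the
 * retransmitted share of a connection's traffic exceeds max_rexmit_pct
 * percent.
 */
static bool
rexmit_rate_exceeded(uint64_t bytes_sent, uint64_t bytes_rexmit,
    unsigned int max_rexmit_pct)
{
	assert(max_rexmit_pct <= 100);
	if (bytes_sent == 0)
		return (false);
	/*
	 * Cross-multiply to stay in integer arithmetic.  This sketch
	 * ignores overflow for extremely large counters.
	 */
	return (bytes_rexmit * 100 > bytes_sent * max_rexmit_pct);
}
```

A caller would switch the connection to software kTLS once the check returns true.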
Ed Maste
c9144ec14d Skip netgraph tests when WITHOUT_NETGRAPH is set
PR:		256986
Reported by:	John Marshall
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-07-06 09:45:34 -04:00
Alex Richardson
2eefc1d926 Fix building rescue/rescue when sanitizers are enabled
We have to ensure that we don't link any instrumented object files
into rescue as it is a static executable and static binaries can't
use the sanitizer runtime.

Reviewed By:	imp
Differential Revision: https://reviews.freebsd.org/D31044
2021-07-06 12:18:30 +01:00
Alex Richardson
c78f449d85 usr.bin/diff: fix UBSan error in readhash
UBSan complains about the `sum = sum * 127 + chrtran(t);` line below since
that can overflow an `int`. Use `unsigned int` instead to ensure that
overflow is well-defined.

Reviewed By:	imp
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D31075
2021-07-06 12:16:40 +01:00
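
The readhash fix above relies on unsigned arithmetic wrapping instead of overflowing. A minimal sketch of that pattern follows; chrtran_sketch() is a hypothetical identity stand-in for diff's chrtran(), and the function is not the actual readhash().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for chrtran(): identity mapping. */
static unsigned int
chrtran_sketch(unsigned char c)
{
	return (c);
}

/*
 * Hash a line with an unsigned accumulator.  Unsigned arithmetic wraps
 * modulo 2^N (N being the width of unsigned int), so sum * 127 + c is
 * well-defined even when the mathematical result does not fit, unlike
 * the same expression on a signed int.
 */
static unsigned int
line_hash_sketch(const char *line, size_t len)
{
	unsigned int sum = 1;

	assert(line != NULL);
	for (size_t i = 0; i < len; i++)
		sum = sum * 127 + chrtran_sketch((unsigned char)line[i]);
	return (sum);
}
```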
Alex Richardson
31914882fc Import Arm Optimized Routines v21.02
This is the new replacement for the existing cortex-strings code which will
be replaced in a follow-up commit.
We should also be able to use some of the math functions to allow the
tests to pass on AArch64 (and other architectures) instead of just x86.
We might also be able to reuse some of the tests for the kyua testsuite.

Imported using
```
curl -L e823e3abf5 | tar --strip-components=1 -xvzf -
git add .
```

Differential Revision: https://reviews.freebsd.org/D29035
git-subtree-dir: contrib/arm-optimized-routines
git-subtree-mainline: e34c713b0e
git-subtree-split: f9f37c002a
2021-07-06 11:05:34 +01:00
Alex Richardson
e34c713b0e rtld/tests: Avoid function name conflict with libc opendir()
The conflict prevents these tests from being compiled with ASAN, since the
ASAN interceptors also define opendir(), matching the libc function.

Reviewed By:	oshogbo, kib, markj
Differential Revision: https://reviews.freebsd.org/D31038
2021-07-06 10:51:57 +01:00
Alex Richardson
4d552825ec usr.bin/login: send errors to console if syslog isn't running
I was debugging why login(1) wasn't working as expected on a minimal
MFS_ROOT disk image. This image doesn't have syslogd running so the
warnings were lost and I had to use GDB to find out why login(1) was
failing (missing PAM libraries) instead of being able to see it in
the console output.

MFC after:	1 week
Reviewed By:	pfg
Differential Revision: https://reviews.freebsd.org/D30892
2021-07-06 10:51:16 +01:00
Alex Richardson
d053fb22f6 usr.bin/sort: Avoid UBSan errors
UBSan complains about out-of-bounds accesses for zero-length arrays. To
avoid this we can use flexible array members. However, the C standard does
not allow for structures that only contain flexible array members, so we
move the length parameters into that structure too.

Split out from D28233.

Reviewed By:	markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D31009
2021-07-06 10:51:05 +01:00
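
The flexible-array-member layout described above can be illustrated with a small, hypothetical type (not the structures used by sort(1)). Because a structure must have at least one named member besides the flexible array, the element count is stored alongside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example type. */
struct int_list {
	size_t	count;		/* number of valid entries in items[] */
	int	items[];	/* flexible array member (C99) */
};

/* Allocate a zero-filled list with room for count items. */
static struct int_list *
int_list_alloc(size_t count)
{
	struct int_list *l;

	l = malloc(sizeof(*l) + count * sizeof(l->items[0]));
	if (l == NULL)
		return (NULL);
	l->count = count;
	for (size_t i = 0; i < count; i++)
		l->items[i] = 0;
	return (l);
}

/* Bounds-checked read; the stored count makes the check possible. */
static int
int_list_get(const struct int_list *l, size_t idx)
{
	assert(l != NULL && idx < l->count);
	return (l->items[idx]);
}
```

Unlike a zero-length array declared as `int items[0]`, indexing `items[]` within the allocated size is defined behaviour, which is why the sanitizer no longer reports it.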
Edward Tomasz Napierala
a081a943a0 cam: drop unused 'saved_ccb' field from softcs
No functional changes.  Do not MFC this, it changes kernel ABI.

Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30698
2021-07-06 10:04:38 +01:00
Edward Tomasz Napierala
13aa56fcd5 cam(4): preserve alloc_flags when copying CCBs
Before UMA CCBs, all CCBs were of the same size, and could
be trivially copied using bcopy(9).  Now we have to preserve
alloc_flags, otherwise we might end up attempting to free
stack-allocated CCB to UMA; we also need to take CCB size
into account.

This fixes kernel panic which would occur when trying to access
a stopped (as in, SCSI START STOP, also "ctladm stop") SCSI device.

Reported By:	Gary Jennejohn <gljennjohn@gmail.com>
Tested By:	Gary Jennejohn <gljennjohn@gmail.com>
Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31054
2021-07-06 09:27:22 +01:00
Wojciech Macek
382376f398 enetc: Add support for 2.5G fixed-link speed
With the v5.13 device-tree update speed of the CPU switch port was
changed to 2.5G. Reflect that in the driver.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
2021-07-06 09:01:30 +02:00
Alexander Motin
e3bcd07d83 nvme(4): Report NPWA before NPWG as stripesize.
New Samsung 980 SSDs report Namespace Preferred Write Alignment of
8 (4KB) and Namespace Preferred Write Granularity of 32 (16KB).
My quick tests show that 16KB is a minimal sequential write size
when the SSD reaches peak IOPS, so writing much less is very slow.
But writing slightly less or slightly more does not change much,
so it seems not so much a size granularity as minimum I/O size.

Thinking about different stripesize consumers:
 - Partition alignment should be based on NPWA by definition.
 - ZFS ashift in part of forcing alignment of all I/Os should also
be based on NPWA.  In part of forcing size granularity, if really
needed, it may be set to NPWG, but too big value can make ZFS too
space-inefficient, and the 16KB is actually the biggest supported
value there now.
 - ZFS recordsize/volblocksize could potentially be tuned up toward
NPWG to work as I/O size granularity, but enabled compression makes
it too fuzzy.  And those are normally user-configurable things.
 - ZFS I/O aggregation code could definitely use Optimal Write Size
value and may be NPWG, but we don't have fields in GEOM now to report
the minimal and optimal I/O sizes, and even maximal is not reported
outside GEOM DISK to be used by ZFS.

MFC after:	1 week
2021-07-05 23:13:15 -04:00
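
The sizes quoted in the nvme(4) message above (8 blocks as 4KB, 32 blocks as 16KB) are consistent with 512-byte logical blocks. That block size is an inference from the numbers, not something the commit states, and the helper below is purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size expressed in logical blocks to bytes. */
static uint64_t
blocks_to_bytes(uint32_t nblocks, uint32_t lba_size)
{
	assert(lba_size != 0);
	return ((uint64_t)nblocks * lba_size);
}
```

Under the 512-byte assumption, `blocks_to_bytes(8, 512)` gives 4096 and `blocks_to_bytes(32, 512)` gives 16384.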
Alan Cox
e41fde3ed7 On a failed fcmpset don't pointlessly repeat tests
In a few places, on a failed compare-and-set, both the amd64 pmap and
the arm64 pmap repeat tests on bits that won't change state while the
pmap is locked.  Eliminate some of these unnecessary tests.

Reviewed by:	andrew, kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31014
2021-07-05 21:07:40 -05:00
Jessica Clarke
348c41d181 riscv: Implement non-stub __vdso_gettc and __vdso_gettimekeep
PR:	256905
Reviewed by:	arichardson, mhorne
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30963
2021-07-05 16:16:53 +01:00
Jessica Clarke
af433832f7 geom_label: Remove an old sysinstall(8) workaround
We removed sysinstall(8) back in 2011, so this workaround should be long
since unnecessary. This workaround can end up breaking cases that are
hit in the real world, such as dd'ing a small pre-built disk image to a
large partition that you intend to grow on first boot and uses a UFS
disk label for / in its /etc/fstab (as the only reliable thing a raw UFS
image can reference).

Reviewed by:	imp, mckusick
Differential Revision:	https://reviews.freebsd.org/D30825
2021-07-05 16:15:32 +01:00
Jessica Clarke
55c57a7811 rman: Remove an outdated comment that no longer applies
Since commit 2dd1bdf183 in 2016 the r_start and r_end fields have been
rman_res_t, which was briefly unsigned long, but commit da1b038af9
changed the typedef to be uintmax_t instead. C99 is also something we
assume these days.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D30808
2021-07-05 16:15:03 +01:00
Emmanuel Vadot
b464c459ea loader: Fix dtb loading
When calling file_findfile with only a type it returns
the first file matching the type. But in fdt_apply_overlays we
then iterate on the next files and try loading them as dtb overlays.
Fix this by checking the type one more time.

Sponsored by:	Diablotin Systems
Reported by:	Mark Millard <marklmi@yahoo.com>
2021-07-05 15:53:08 +02:00
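
The loader bug described above comes from treating "first match by type" as if it positioned an iterator over same-typed entries only. A hypothetical user-space sketch of the corrected loop follows; the structure, the function names and the "dtbo" type string are assumptions for illustration, not the loader's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced preloaded-file list entry. */
struct file_sketch {
	const char		*f_type;
	struct file_sketch	*f_next;
};

/* Return the first entry of the given type, or NULL. */
static struct file_sketch *
findfile_by_type(struct file_sketch *head, const char *type)
{
	assert(type != NULL);
	for (struct file_sketch *fp = head; fp != NULL; fp = fp->f_next)
		if (strcmp(fp->f_type, type) == 0)
			return (fp);
	return (NULL);
}

/* Count overlay entries, re-checking the type of every later entry. */
static int
count_overlays(struct file_sketch *head)
{
	int n = 0;

	for (struct file_sketch *fp = findfile_by_type(head, "dtbo");
	    fp != NULL; fp = fp->f_next) {
		/* Entries after the first match may be of any type. */
		if (strcmp(fp->f_type, "dtbo") != 0)
			continue;
		n++;
	}
	return (n);
}
```

Without the check inside the loop, every entry following the first overlay would be counted, whatever its type.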
Mateusz Guzik
f649cff587 pf: padalign global locks found in pf.c
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 09:56:54 +00:00
Emmanuel Vadot
48687f733f armv7: allwinner: Add aw_r_intc driver
This is also needed after the 5.13 dts update.

Sponsored by:	Diablotin Systems
Reported by:	Mark Millard <marklmi@yahoo.com>
2021-07-05 11:38:23 +02:00
Mateusz Guzik
dc1ab04e4c pf: allow table stats clearing and reading with ruleset rlock
Instead serialize against these operations with a dedicated lock.

Prior to the change, when pushing 17 mln pps of traffic, calling
DIOCRGETTSTATS in a loop would restrict throughput to about 7 mln.  With
the change there is no slowdown.

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 10:42:01 +02:00
Mateusz Guzik
f92c21a28c pf: depessimize table handling
Creating tables and zeroing their counters induces excessive IPIs (14
per table), which in turn kills single- and multi-threaded performance.

Work around the problem by extending per-CPU counters with a general
counter populated on "zeroing" requests -- it stores the currently found
sum. Then requests to report the current value are the sum of per-CPU
counters subtracted by the saved value.

Sample timings when loading a config with 100k tables on a 104-way box:

stock:

pfctl -f tables100000.conf  0.39s user 69.37s system 99% cpu 1:09.76 total
pfctl -f tables100000.conf  0.40s user 68.14s system 99% cpu 1:08.54 total

patched:

pfctl -f tables100000.conf  0.35s user 6.41s system 99% cpu 6.771 total
pfctl -f tables100000.conf  0.48s user 6.47s system 99% cpu 6.949 total

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 10:42:01 +02:00
Peter Holm
c5d6dd80b5 stress2: Wait for the "swap" program to terminate
2021-07-05 09:16:32 +02:00
Peter Holm
7ebe83ddb7 stress2: Limit scope of rm(1) wildcard in cleanup.
Reviewed by:	 rgrimes
2021-07-05 09:14:05 +02:00
Li-Wen Hsu
1678975109 freebsd-tips: Fix the description of fetch(1) to match the command
Reported by:	jrtc27
MFC with:	ffe6afc4f0
2021-07-05 10:14:25 +08:00
Vladimir Kondratyev
5fa1eb1cd9 Bump __FreeBSD_version to 1400025 for LinuxKPI change.
2021-07-05 03:22:19 +03:00
Vladimir Kondratyev
8b33cb8303 LinuxKPI: Implement sequence counters and sequential locks
as a thin wrapper around native version found in sys/seqc.h.
This replaces out-of-base GPLv2-licensed code used by drm-kmod.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D31006
2021-07-05 03:20:55 +03:00
Vladimir Kondratyev
019391bf85 LinuxKPI: Implement strscpy
strscpy copies the src string, or as much of it as fits, into the dst
buffer.  The dst buffer is always NUL terminated, unless it's zero-sized.
strscpy returns the number of characters copied (not including the
trailing NUL) or -E2BIG if len is 0 or src was truncated.

Currently drm-kmod replaces strscpy with strncpy, which is not quite
correct, as strncpy does not NUL-terminate truncated strings and returns
different values on exit.

Reviewed by:	hselasky, imp, manu
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31005
2021-07-05 03:20:42 +03:00
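
The strscpy semantics described above can be reproduced in a few lines of portable C. This is a user-space sketch written from that description, with a hypothetical name; it is not the LinuxKPI implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Copy src into dst (capacity len bytes), always NUL-terminating when
 * len > 0.  Returns the number of characters copied, excluding the
 * NUL, or -E2BIG if len is 0 or src had to be truncated.
 */
static ssize_t
strscpy_sketch(char *dst, const char *src, size_t len)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (len == 0)
		return (-E2BIG);
	for (i = 0; i < len - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	/* Stopped because the buffer filled up, not at the end of src. */
	if (src[i] != '\0')
		return (-E2BIG);
	return ((ssize_t)i);
}
```

In contrast to strncpy, a truncated copy still ends in a NUL, and the return value tells the caller that truncation happened.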
Vladimir Kondratyev
98a6984a9e LinuxKPI: Use macro for implementation of some dma_map_* functions
This allows removing the unimplemented attrs parameter, whose type differs
between Linux kernel versions, and compiling both drm-kmod and ofed
callers unmodified.
Also convert it to 'unsigned long' type to match modern Linuxes.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D30932
2021-07-05 03:20:23 +03:00
Vladimir Kondratyev
864b11007a LinuxKPI: Implement irq_work_sync() routine.
irq_work_sync() performs draining of irq_work task.
Required by drm-kmod.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30818
2021-07-05 03:20:06 +03:00