Commit Graph

135632 Commits

Author SHA1 Message Date
Ryan Libby
c86fa3b8d7 pf: quiet -Wredundant-decls for pf_get_ruleset_number
In e86bddea9f sys/netpfil/pf/pf.h grew a
declaration of pf_get_ruleset_number.  Now delete the old declaration
from sys/net/pfvar.h.

Reviewed by:	kp
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28081
2021-01-10 21:53:15 -08:00
Alexander V. Chernikov
685de460bc Use static initializers for fib algo to shift initialization
to ealier stage. This allows to register modules loaded at
 boot time.

Reported by:	olivier
2021-01-11 00:16:54 +00:00
Vincenzo Maffione
55f0ad5fde netmap: restore hwofs and support it in iflib
Restore the hwofs functionality temporarily disabled by
7ba6ecf216 to prevent issues with iflib.
This patch brings the necessary changes to iflib to
enable howfs to allow interface restarts without
disrupting netmap applications actively using its
rings.
After this change, it becomes possible for multiple
non-cooperating netmap applications to use non-overlapping
subsets of the available netmap rings without clashing
with each other.

PR:		252453
MFC after:	1 week
2021-01-10 22:51:15 +00:00
Konstantin Belousov
a013e285df x86 tsc: mark %eax as earlyclobber in tscp_get_timecount_low().
i386 codegen insists on preloading tc_priv into register on i386, and
this register cannot be %eax because RDTSCP instruction clobbers it
before it is used.

Reported and tested by:	dim
MFC after:	6 days
Sponsored by:	The FreeBSD Foundation
2021-01-11 00:05:49 +02:00
Rick Macklem
148a227bf8 nfsd: add KASSERTs to nfsm_trimtrailing() for M_EXTPG mbufs
Add KASSERTS to nfsm_trimtrailing() to confirm the sanity of
the arguments for the M_EXTPG case.

Suggested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28053
2021-01-10 13:50:15 -08:00
Thomas Skibo
facdd1cd20 cgem: add 64-bit support
Add 64-bit address support to Cadence CGEM Ethernet driver for use in
other SoCs such as the Zynq UltraScale+ and SiFive HighFive Unleashed.

Reviewed by:	philip, 0mp (manpages)
Differential Revision: https://reviews.freebsd.org/D24304
2021-01-10 16:51:52 -04:00
Alan Cox
5a181b8bce Prefer the use of vm_page_domain() to vm_phys_domain().
When we already have the vm page in hand, use vm_page_domain() instead
of vm_phys_domain().  The former has a trivial constant-time
implementation whereas the latter iterates over the mem_affinity array.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D28005
2021-01-10 13:25:33 -06:00
Vladimir Kondratyev
0f0379fa55 hid: Add recently imported drivers to NOTES
Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D28060
2021-01-10 22:17:20 +03:00
Vladimir Kondratyev
8ffcde2554 hid: fix extraneous SYSCTL_ADD_INT() options revealed by LINT build
Reviewed by:	hselasky (as part of D28060)
2021-01-10 22:17:20 +03:00
Vincenzo Maffione
54bbcca4f9 re: netmap: enable/disable krings on interface reinit
This prevents krings from being used during an interface
reset, and notifies the active applications.
See also 1d238b07d5.

MFC after:      1 week
2021-01-10 15:09:05 +00:00
Vincenzo Maffione
8aa8484cbf iflib: fix build failure in case DEV_NETMAP is not defined
This addresses the build failure introduced by
3d65fd97e8.

MFC with: 3d65fd97e8
2021-01-10 14:43:58 +00:00
Vincenzo Maffione
bb714db6d3 netmap: vtnet: enable/disable krings on any interface reinit
See 3d65fd97e8 for a detailed explanation.

PR:             252453
MFC after:      1 week
2021-01-10 14:10:09 +00:00
Vincenzo Maffione
4ba9ad0dc3 iflib: add assert to prevent out-of-bounds array access
The iflib_queues_alloc() allocates isc_nrxqs iflib_dma_info structs
for each rxqset, and links each struct to a different free list.
As a result, it must be isc_nrxqs >= isc_nfl (plus the completion
queue, if present).
Add an assertion to make this constraint explicit.

MFC after:	2 weeks
2021-01-10 13:59:20 +00:00
Robert Watson
4f2cbaf3cd Track pipe(2) reads and writes as rusage message receives and sends, a
feature misplaced during the transition from BSD 4.4's socket implementation
to the optimised FreeBSD pipe implementation.

MFC after:		1 week
Reviewed by:		arichardson, imp
Differential Revision:	https://reviews.freebsd.org/D27878
2021-01-10 12:16:39 +00:00
Vincenzo Maffione
3d65fd97e8 netmap: iflib: enable/disable krings on any interface reinit
Since 1d238b07d5, krings are disabled before
a reinit cycle triggered by iflib_netmap_register.
However, this operation is actually necessary also for
any interface reinit triggered by other causes (i.e.,
ifconfig commands).
We achieve this goal by moving the krings enable/disable
operation inside iflib_stop() and iflib_init_locked().

Once here, this change also removes some redundant operations
from iflib_netmap_register(), that are already performed by
iflib_stop().

PR:		252453
MFC after:	1 week
2021-01-10 12:04:08 +00:00
Jamie Gritton
2a4b225146 jail: Simplify handling of prison_deref()
Track the the current lock/reference state in a single variable,
rather than deducing the proper prison_deref() flags from a
combination of equations and hard-coded values.
2021-01-09 21:05:06 -08:00
Konstantin Belousov
ac2576b9f7 tmpfs open: assert that there is no double-init of f_data.
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:36 +02:00
Konstantin Belousov
9f200bc47b tmpfs_free_tmp(): explicitly assert that tmp is locked
Despite TMPFS_UNLOCK() is done in both paths later, unlocking not locked
mutex provides different failure mode.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:29 +02:00
Konstantin Belousov
42bebbda9e tmpfs: make M_TMPFSMNT static to tmpfs_vfsops.c
This malloc type is only used in this file.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:44:55 +02:00
Konstantin Belousov
9e680e4005 tsc: add RDTSCP or faster variants of get_timecount()
Use it in preference of Xfenced RDTSC if RDTSCP is supported. It is
recommended by both Intel and AMD. But, on AMD Zens and newer use
LFENCE, as recommended by AMD [*]. In particular, this means that now
AMD CPUs use more appropriate fence instead of too harsh MFENCe.

Add comment explaining the intent of the selection logic.

Reported by:	gallatin [*]
Reviewed by:	gallatin, markj
Tested by:	gallatin, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
45974de8fb x86: Add rdtscp32() into cpufunc.h.
Suggested by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
826fc3cc3d tsc: use u_int for return type for prototype, same as in definitions.
Reviewed by:	gallatin, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
aa9450e44b x86 identcpu.c: fix formatting of the comment.
Reviewed by:	gallatin, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:33 +02:00
Konstantin Belousov
5844bd058a jobc: rework detection of orphaned groups.
Instead of trying to maintain pg_jobc counter on each process group
update (and sometimes before), just calculate the counter when needed.
Still, for the benefit of the signal delivery code, explicitly mark
orphaned groups as such with the new process group flag.

This way we prevent bugs in the corner cases where updates to the counter
were missed due to complicated configuration of p_pptr/p_opptr/real_parent
(debugger).

Since we need to iterate over all children of the process on exit, this
change mostly affects the process group entry and leave, where we need
to iterate all process group members to detect orpaned status.

(For MFC, keep pg_jobc around but unused).

Reported by:	jhb
Reviewed by:	jilles
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27871
2021-01-10 04:41:20 +02:00
Konstantin Belousov
cf4f802e77 kinfo_proc: move job-control related data collection into a new helper.
This improves code structure and allows to put the lock asserts right
into place where the locks are needed.

Also move zeroing of the kinfo_proc structure from fill_kinfo_proc_only()
to fill_kinfo_proc(), this looks more symmetrical.

Reviewed by:	jilles
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27871
2021-01-10 04:41:20 +02:00
Konstantin Belousov
4daea93813 Lock proctree in around fill_kinfo_proc().
Proctree lock is needed for correct calculation and collection of the
job-control related data in kinfo_proc.  There was even an XXX comment
about it.

Satisfy locking and lock ordering requirements by taking proctree lock
around pass over each bucket in proc_iterate(), and in sysctl_kern_proc()
and note_procstat_proc() for individual process reporting.

Reviewed by:	jilles
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27871
2021-01-10 04:41:20 +02:00
Konstantin Belousov
a008bdeda3 tty_wait_background: improve locking.
Increase the scope of the process group lock ownership.  This ensures that
we are consistent in returning EIO for tty write from an orphan and delivery
of TTYOUT signals.

Reviewed by:	jilles
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27871
2021-01-10 04:41:20 +02:00
Konstantin Belousov
ef739c7373 pgrp: Prevent use after free.
Often, we have a process locked and need to get locked process group.
In this case, because progress group lock is before process lock,
unlocking process allows the group to be freed.  See for instance
tty_wait_background().

Make pgrp structures allocated from nofree zone, and ensure type stability
of the pgrp mutex.

Reviewed by:	jilles
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27871
2021-01-10 04:41:19 +02:00
Konstantin Belousov
e0d83cd3e4 issignal(): when handling STOP-like signals, drop sigacts mutex earlier.
Reviewed by:	jilles
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27871
2021-01-10 04:41:19 +02:00
Konstantin Belousov
993a1699b1 Style. Improve some KASSERTs messages.
Reviewed by:	jilles
Tested by:	pho
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27871
2021-01-10 04:41:19 +02:00
Konstantin Belousov
811f08c1cf amd64 pmap: add comment explaining TLB invalidation modes.
Requested and reviewed by:	alc
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25815
2021-01-10 02:44:42 +02:00
Alan Somers
19cca0b961 aio: fix the tests when ZFS is not available
Don't try to cleanup the zpool if we couldn't create a zpool in the
first place.

Submitted by:	tmunro
MFC-with:	022ca2fc7f
2021-01-09 17:16:38 -07:00
Neel Chauhan
408c514f73 linuxkpi: Fix the "error: unknown type name 'u32'" compilation issue when
building the Intel QAT/QuickAssist driver.

Approved by:		hselasky, kib
Differential Revision:	https://reviews.freebsd.org/D28055
2021-01-09 15:27:04 -08:00
Vincenzo Maffione
9ac59d42c0 netmap: vtnet: stop krings during interface reset
Similarly to what done for iflib in 1d238b07d5,
this patch prevents access to the krings during the interface
reset triggered by netmap_register().

MFC after:	1 week
2021-01-09 22:34:52 +00:00
Vincenzo Maffione
7ba6ecf216 netmap: refactor netmap_reset
The netmap_reset() function is meant to be called by the driver
when they initialize (or re-initialize) a hardware ring.
However, since the introduction of support for opening (in
netmap mode) a subset of the available rings, netmap_reset()
may be called multiple times on actively used rings, causing
both kring and netmap ring to transition to an inconsistent
state.
This changes improves the situation by resetting all the
indices fields of the kring to 0, as expected after the
reinitialization of a hardware ring.

PR:	    252518
MFC after:  1 week
2021-01-09 22:07:24 +00:00
Vincenzo Maffione
3189ba6167 netmap: iflib: fix asserts in netmap_fl_refill()
When netmap_fl_refill() is called at initialization time (e.g.,
during netmap_iflib_register()), nic_i must be 0, since the
free list is reinitialized. At the end of the refill cycle, nic_i
must still be zero, because exactly N descriptors (N is the ring size)
are refilled.
This patch therefore fixes the assertions to check on nic_i rather
than on nm_i. The current netmap_reset() may in fact cause nm_i
to be != 0 while the device is resetting: this may happen when
multiple non-cooperating processes open different subsets of the
available netmap rings.

PR:	    252518
MFC after:  1 week
2021-01-09 21:35:07 +00:00
Vincenzo Maffione
1d238b07d5 netmap: iflib: stop krings during interface reset
When different processes open separate subsets of the
available rings of a same netmap interface, a device
reset may be performed while one of the processes
is actively using some rings (e.g., caused by another
process executing a nmport_open()).
With this patch, such situation will cause the
active process to get a POLLERR, so that it can
have a chance to detect the situation.
We also guarantee that no process is running a txsync
or rxsync (ioctl or poll) while an iflib device reset
is in progress.

PR:	    252453
MFC after:  1 week
2021-01-09 21:01:46 +00:00
Michael Tuexen
6685e259e3 tcp: don't use KTLS socket option on listening sockets
KTLS socket options make use of socket buffers, which are not
available for listening sockets.

Reported by:		syzbot+a8829e888a93a4a04619@syzkaller.appspotmail.com
Reviewed by:		jhb@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D27948
2021-01-08 08:57:11 +01:00
Kyle Evans
da77382f55 arm: revert MAXDSIZ change from 202aea9c82
This imposes a fairly severe limitation on space available for mmap that
was not noticed prior to commit. Unfixed mmap will only map from
[data + MAXSIZE, end of user VA space], bringing the amount of usable space
down way too low for non-trivial link jobs (for instance).

Reported by:	mmel
2021-01-09 14:14:00 -06:00
Jan Kokemüller
4d0c33be63 kevent(2): Bugfix for wrong EVFILT_TIMER timeouts
When using NOTE_NSECONDS in the kevent(2) API, US_TO_SBT should be
used instead of NS_TO_SBT, otherwise the timeout results are
misleading.

PR:		252539
Reviewed by:	kevans, kib
Approved by:	kevans
MFC after:	3 weeks
2021-01-09 20:00:25 +01:00
Warner Losh
40e6e2c2f7 sysctl: improve debug.kdb.panic_str description
Improve the wording for this sysctl.

Submitted by: rpokala@
2021-01-09 11:10:42 -07:00
Mark Johnston
109260d202 mvneta: Acquire the softc lock before clearing the MIB
Reported by:	Andrei Martin <andrei.cos.martin@gmail.com>
MFC with:	caf552a607
2021-01-09 10:04:17 -05:00
Alexander V. Chernikov
0433870efe Add fib lookup testing module.
This module intended to measure performance of routing lookups.

Uses a list of IP addresses specified by sysctl one-by-one.
Performance testing is triggered by changing sysctl OID with a number of lookups to execute.
Lookups are done by the chunks of 10K routes, entering/exiting epoch on
 chunk granularity to amortise cost.

Example:
make -C sys/modules/test/fib_lookup unload load
for i in `cat ~/ip4.txt`; do sysctl net.route.test.add_inet_addr=$i; done
for i in `cat ~/ip6.txt`; do sysctl net.route.test.add_inet6_addr=$i; done

sysctl net.route.test.run_inet=10000000

dmesg | tail

Dec 13 23:24:05 current kernel: 10000000 packets in 417240173 nanoseconds, 23967011 pps
Dec 13 23:24:06 current kernel: run: 10000000 packets vnet 0xfffff80003073f00
Dec 13 23:24:07 current kernel: 10000000 packets in 423086254 nanoseconds, 23635842 pps

Differential Revision: https://reviews.freebsd.org/D27604
2021-01-09 13:20:30 +00:00
Alexander V. Chernikov
537d134373 Bring DPDK route lookups to FreeBSD.
This change introduces loadable fib lookup modules based on
 DPDK rte_lpm lib targeted for high-speed lookups in large-scale tables.
It is based on the lookup framework described in D27401.

IPv4 module is called dpdk_lpm4. It wraps around rte_lpm [1] library.
This library implements variation of DIR24-8 [2] lookup algorithm.
Module provide lockless route lookups and in-place incremental updates,
 allowing for good RIB performance.

IPv6 module is called dpdk_lpm6. It wraps around rte_lpm6 [3] library.
Implementation can be seen as multi-bit trie where the stride or number of bits
 inspected on each level varies from level to level.
It can vary from 1 to 14 memory accesses, with 5 being the average value
 for the lengths that are most commonly used in IPv6.
Module provide lockless route lookups for global unicast addresses
 and in-place incremental updates, allowing for good RIB performance.

Implementation details:
* wrapper code lives in `sys/contrib/dpdk_rte_lpm/dpdk_lpm[6].c`.
* rte_lpm[6] implementation contains both RIB and FIB code.
 . RIB ("rule_") code, backed by array of hash tables part has been commented out,
 as base radix already provides all the necessary primitives.
* link-local lookups are currently implemented as base radix lookup.
 This part should be converted to something like read-only radix trie.

Usage detail:
Compile kernel with option FIB_ALGO and load dpdk_lpm4/dpdk_lpm6
 module at any time. They will be picked up automatically when
 amount of routes raises to several thousand.

[1]: https://doc.dpdk.org/guides/prog_guide/lpm_lib.html
[2]: http://yuba.stanford.edu/~nickm/papers/Infocom98_lookup.pdf
[3]: https://doc.dpdk.org/guides/prog_guide/lpm6_lib.html

Differential Revision: https://reviews.freebsd.org/D27412
2021-01-09 12:41:04 +00:00
Hans Petter Selasky
a898ee51c4 Fix LINT kernel build after 01f2e864f7.
Differential revision:	https://reviews.freebsd.org/D27893
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-09 10:49:21 +01:00
Chuck Tuffli
e83fdf8bb3 fix big-endian platforms after 6733401935
The NVMe byte-swap routines for big-endian platforms used memcpy() to
move the unaligned 64-bit value into a temp register to byte swap it.
Instead of introducing a dependency, manually byte-swap the values in
place.

Point hat:	me
2021-01-08 14:41:45 -08:00
Bryan Drewery
f222a6b886 dtrace: Fix /"string" == NULL/ comparisons using an uninitialized value.
A test of this is funcs/tst.strtok.d which has this filter:

    BEGIN
    /(this->field = strtok(this->str, ",")) == NULL/
    {
            exit(1);
    }
The test will randomly fail with exit status of 1 indicating that this->field
was NULL even though printing it out shows it is not.

This is compiled to the DTrace instruction set:
    // Pushed arguments not shown here
    // call strtok() and set result into %r1
    07: 2f001f01    call DIF_SUBR(31), %r1          ! strtok
    // set thread local scalar this->field from %r1
    08: 39050101    stls %r1, DT_VAR(1281)          ! DT_VAR(1281) = "field"
    // Prepare for the == comparison
    // Set right side of %r2 to NULL
    09: 25000102    setx DT_INTEGER[1], %r2         ! 0x0
    // string compare %r1 (strtok result) to %r2
    10: 27010200    scmp %r1, %r2

In this case only %r1 is loaded with a string limit set to lim1.  %r2 being
NULL does not get loaded and does not set lim2.  Then we call dtrace_strncmp()
with MIN(lim1, lim2) resulting in passing 0 and comparing neither side.
dtrace_strncmp() handles this case fine and it already has been while
being lucky with what lim2 was [un]initialized as.

Reviewed by:	markj, Don Morris <dgmorris AT earthlink.net>
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D27671
2021-01-08 14:37:17 -08:00
Mitchell Horne
cbc9be948a sifive_uart: quiet GCC -Werror=parentheses
Add an additional set of braces to clarify intention. The '&' operator
has a higher precedence than '|', but the reader may not always remember
this. No functional change.
2021-01-08 17:32:18 -04:00
Warner Losh
936440560b sysctl: implement debug.kdb.panic_str
This is just like debug.kdb.panic, except the string that's passed in
is reported in the panic message. This allows people with automated
systems to collect kernel panics over a large fleet of machines to
flag panics better. Strings like "Warner look at this hang" or "see
JIRA ABC-1234 for details" allow these automated systems to route the
forced panic to the appropriate engineers like you can with other
types of panics. Other users are likely possible.

Relnotes: Yes
Sponsored by: Netflix
Reviewed by: allanjude (earlier version)
Suggestions from review folded in by: 0mp, emaste, lwhsu
Differential Revision: https://reviews.freebsd.org/D28041
2021-01-08 14:30:28 -07:00
Toomas Soome
7c6a71d16c i386 kernel is not built with vt_vbefb
Add vt_vbefb to GENERIC and NOTES
2021-01-08 20:58:23 +02:00