Commit Graph

255788 Commits

Author SHA1 Message Date
Alexander V. Chernikov
537d134373 Bring DPDK route lookups to FreeBSD.
This change introduces loadable fib lookup modules based on
 DPDK rte_lpm lib targeted for high-speed lookups in large-scale tables.
It is based on the lookup framework described in D27401.

IPv4 module is called dpdk_lpm4. It wraps around rte_lpm [1] library.
This library implements a variation of the DIR24-8 [2] lookup algorithm.
The module provides lockless route lookups and in-place incremental updates,
 allowing for good RIB performance.
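
For illustration only, a minimal sketch of a DIR-24-8 style lookup. This is
not the rte_lpm code: the structure layout, the names (dir24_8, tbl24, tbl8,
EXT_FLAG) and the 16-bit entry encoding are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed encoding: top bit set means the low 15 bits select a tbl8 group. */
#define	EXT_FLAG	0x8000u

struct dir24_8 {
	uint16_t *tbl24;	/* 2^24 entries, indexed by the top 24 address bits */
	uint16_t *tbl8;		/* groups of 256 entries, indexed by the low 8 bits */
};

/* Allocate zeroed tables with room for ngroups tbl8 groups (hypothetical helper). */
static int
dir24_8_init(struct dir24_8 *t, size_t ngroups)
{
	t->tbl24 = calloc((size_t)1 << 24, sizeof(*t->tbl24));
	t->tbl8 = calloc(ngroups * 256, sizeof(*t->tbl8));
	return ((t->tbl24 != NULL && t->tbl8 != NULL) ? 0 : -1);
}

/* Return the next-hop index for addr using at most two memory accesses. */
static uint16_t
dir24_8_lookup(const struct dir24_8 *t, uint32_t addr)
{
	uint16_t e;

	assert(t != NULL);
	e = t->tbl24[addr >> 8];
	if ((e & EXT_FLAG) == 0)
		return (e);	/* prefix of length <= 24: done in one access */
	return (t->tbl8[((uint32_t)(e & ~EXT_FLAG) << 8) | (addr & 0xff)]);
}
```

Prefixes of up to 24 bits resolve in the first table; only longer prefixes
pay for the second access, which is what keeps the common case cheap.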

IPv6 module is called dpdk_lpm6. It wraps around rte_lpm6 [3] library.
The implementation can be seen as a multi-bit trie where the stride, or number
 of bits inspected on each level, varies from level to level.
A lookup takes from 1 to 14 memory accesses, with 5 being the average value
 for the lengths that are most commonly used in IPv6.
The module provides lockless route lookups for global unicast addresses
 and in-place incremental updates, allowing for good RIB performance.

Implementation details:
* wrapper code lives in `sys/contrib/dpdk_rte_lpm/dpdk_lpm[6].c`.
* rte_lpm[6] implementation contains both RIB and FIB code.
 . The RIB ("rule_") part, backed by an array of hash tables, has been commented out,
 as base radix already provides all the necessary primitives.
* link-local lookups are currently implemented as a base radix lookup.
 This part should be converted to something like a read-only radix trie.

Usage detail:
Compile the kernel with option FIB_ALGO and load the dpdk_lpm4/dpdk_lpm6
 modules at any time. They will be picked up automatically when
 the number of routes rises to several thousand.

[1]: https://doc.dpdk.org/guides/prog_guide/lpm_lib.html
[2]: http://yuba.stanford.edu/~nickm/papers/Infocom98_lookup.pdf
[3]: https://doc.dpdk.org/guides/prog_guide/lpm6_lib.html

Differential Revision: https://reviews.freebsd.org/D27412
2021-01-09 12:41:04 +00:00
Hans Petter Selasky
a898ee51c4 Fix LINT kernel build after 01f2e864f7.
Differential revision:	https://reviews.freebsd.org/D27893
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-09 10:49:21 +01:00
Kyle Evans
8c4094f38c certctl: factor out certname resolution
create_blacklisted() will identify a cert whether it's provided a path to
a cert or the hash.serial format that is shown by `certctl list`.

Factor this logic out into a resolve_certname() so that it may be reused
elsewhere.
2021-01-08 22:36:22 -06:00
Kyle Evans
b799d38a2a certctl: replace hardcoded uses of /usr/local
Use the new user.localbase sysctl here as well, to reduce the number of
hardcoded localbase by one (1).

MFC after:	3 days (note: just use a literal /usr/local default)
2021-01-08 22:06:42 -06:00
Chuck Tuffli
e83fdf8bb3 fix big-endian platforms after 6733401935
The NVMe byte-swap routines for big-endian platforms used memcpy() to
move the unaligned 64-bit value into a temp register to byte swap it.
Instead of introducing a dependency, manually byte-swap the values in
place.
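
A minimal sketch of an in-place 64-bit byte swap that works on unaligned
data, for illustration. It is not the nvmecontrol code, and the function
name swap64_inplace is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reverse the 8 bytes of a 64-bit field in place.
 * Working byte by byte means buf does not need to be aligned
 * and no temporary copy of the whole value is required.
 */
static void
swap64_inplace(uint8_t *buf)
{
	assert(buf != NULL);
	for (size_t i = 0; i < 4; i++) {
		uint8_t tmp = buf[i];

		buf[i] = buf[7 - i];
		buf[7 - i] = tmp;
	}
}
```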

Point hat:	me
2021-01-08 14:41:45 -08:00
Dimitry Andric
a82f07fc2e Fix 32-bit build post 6733401935
The general style in sbin/nvmecontrol appears to print uint64_t types
using %j, so I'm using that instead of the more general (but admittedly
ugly) PRIu64.
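
As a sketch of the idiom, assuming the usual "%ju" conversion with a cast
to uintmax_t (the helper name format_u64 is hypothetical and not taken from
nvmecontrol):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a uint64_t without PRIu64.
 * The 'j' length modifier expects uintmax_t, so the cast is required
 * for the call to be correct on both 32-bit and 64-bit platforms.
 */
static int
format_u64(char *dst, size_t len, uint64_t v)
{
	assert(dst != NULL && len > 0);
	return (snprintf(dst, len, "%ju", (uintmax_t)v));
}
```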
2021-01-08 23:38:30 +01:00
Bryan Drewery
f222a6b886 dtrace: Fix /"string" == NULL/ comparisons using an uninitialized value.
A test of this is funcs/tst.strtok.d which has this filter:

    BEGIN
    /(this->field = strtok(this->str, ",")) == NULL/
    {
            exit(1);
    }
The test will randomly fail with exit status of 1 indicating that this->field
was NULL even though printing it out shows it is not.

This is compiled to the DTrace instruction set:
    // Pushed arguments not shown here
    // call strtok() and set result into %r1
    07: 2f001f01    call DIF_SUBR(31), %r1          ! strtok
    // set thread local scalar this->field from %r1
    08: 39050101    stls %r1, DT_VAR(1281)          ! DT_VAR(1281) = "field"
    // Prepare for the == comparison
    // Set right side of %r2 to NULL
    09: 25000102    setx DT_INTEGER[1], %r2         ! 0x0
    // string compare %r1 (strtok result) to %r2
    10: 27010200    scmp %r1, %r2

In this case only %r1 is loaded with a string limit set to lim1.  %r2 being
NULL does not get loaded and does not set lim2.  Then we call dtrace_strncmp()
with MIN(lim1, lim2) resulting in passing 0 and comparing neither side.
dtrace_strncmp() handles this case fine, and it already has been doing so
whenever we were lucky with what lim2 was [un]initialized as.

Reviewed by:	markj, Don Morris <dgmorris AT earthlink.net>
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D27671
2021-01-08 14:37:17 -08:00
Bryan Drewery
556fcdce5b bsd.compat.mk: Allow finding non-internal libraries
Currently only libexec/rtld-elf32 uses internal LIBC_NOSSP_PIC during
the build but it gets it directly from the objdir rather than a sysroot.
For example, /usr/obj/usr/src/amd64.amd64/obj-lib32/lib/libc/libc_nossp_pic.a.
We don't stage lib32 libraries in WORLDTMP/usr/lib32 and doing so doesn't
buy much.  If we want to use a staged lib32 library then we need to look in
LIBCOMPATTMP where they were staged.  For example if LIBC_PIC were wanted then
look for /usr/obj/usr/src/amd64.amd64/obj-lib32/tmp/usr/lib32/libc_pic.a.

Reported by:	rlibby
Reviewed by:	rlibby
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D27648
2021-01-08 14:34:21 -08:00
Bryan Drewery
44b8b2a00d Makefile.inc1: Avoid using release/Makefile for VERSION.
release/Makefile.inc1 has git executions that were being run for each of
these lookups.  The results were not needed so just lookup what we want
directly instead.

Reviewed by:	gjb, rlibby, emaste (maybe)
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D27643
2021-01-08 14:33:35 -08:00
Mitchell Horne
cbc9be948a sifive_uart: quiet GCC -Werror=parentheses
Add an additional set of parentheses to clarify intention. The '&' operator
has a higher precedence than '|', but the reader may not always remember
this. No functional change.
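
A small sketch of the two possible groupings, for illustration. The bit
names FLAG_A and MASK are hypothetical and not taken from sifive_uart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define	FLAG_A	0x1u
#define	MASK	0x2u

/* How C parses "FLAG_A | reg & MASK": the '&' is applied first. */
static uint32_t
and_first(uint32_t reg)
{
	return (FLAG_A | (reg & MASK));
}

/* The other reading, which needs explicit parentheses to get. */
static uint32_t
or_first(uint32_t reg)
{
	return ((FLAG_A | reg) & MASK);
}

/* With reg = 0x3 the two groupings give 0x3 and 0x2 respectively. */
static_assert((FLAG_A | (0x3u & MASK)) != ((FLAG_A | 0x3u) & MASK),
    "the two groupings differ");
```

Writing the parentheses out costs nothing at run time and removes the need
for the reader (or the compiler warning) to second-guess the intent.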
2021-01-08 17:32:18 -04:00
Warner Losh
936440560b sysctl: implement debug.kdb.panic_str
This is just like debug.kdb.panic, except the string that's passed in
is reported in the panic message. This allows people with automated
systems to collect kernel panics over a large fleet of machines to
flag panics better. Strings like "Warner look at this hang" or "see
JIRA ABC-1234 for details" allow these automated systems to route the
forced panic to the appropriate engineers like you can with other
types of panics. Other users are likely possible.

Relnotes: Yes
Sponsored by: Netflix
Reviewed by: allanjude (earlier version)
Suggestions from review folded in by: 0mp, emaste, lwhsu
Differential Revision: https://reviews.freebsd.org/D28041
2021-01-08 14:30:28 -07:00
Toomas Soome
7c6a71d16c i386 kernel is not built with vt_vbefb
Add vt_vbefb to GENERIC and NOTES
2021-01-08 20:58:23 +02:00
Konstantin Belousov
de27805fee linuxkpi: handle ARI
Stop trying to manually calculate RID, which cannot be done correctly
by PCI_DEVFN().  Use PCI_GET_RID() method instead.

Do not use pci_find_dbsf() to go from the linux pci_dev to freebsd
device_t.  First, device is readily available as dev.bsddev.  Second,
using pci_find_dbsf() fails for ARI-enabled functions with large
function numbers, because PCI_SLOT()/PCI_FUNC() are for non-ARI.

Reviewed by:	bz, hselasky, manu
Tested by:	manu (drm)
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27960
2021-01-08 23:17:21 +02:00
Kyle Evans
04a3ba363d libregex: re-enable make check
The tests are generally expected to pass, so uncomment the annotation that
lets `make check` work. Note that `make check` currently requires kyua
from ports or an appropriate symlink into /usr/local/bin.
2021-01-08 13:58:35 -06:00
Miod Vallat
d36b5dbe28 libc: regex: rework unsafe pointer arithmetic
regcomp.c uses the "start + count < end" idiom to check that there are
"count" bytes available in an array of char "start" and "end" both point to.

This is fine, unless "start + count" goes beyond the last element of the
array. In this case, pedantic interpretation of the C standard makes the
comparison of such a pointer against "end" undefined, and optimizers from
hell will happily remove as much code as possible because of this.

An example of this occurs in regcomp.c's bothcases(), which defines
bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it
invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...

Because bothcases() and p_bracket() are static functions in regcomp.c, there
is a real risk of miscompilation if aggressive inlining happens.

The following diff rewrites the "start + count < end" constructs into "end -
start > count". Assuming "end" and "start" are always pointing in the array
(such as "bracket[3]" above), "end - start" is well-defined and can be
compared without trouble.
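
A minimal sketch of the rewritten check, for illustration. The helper name
more_than is hypothetical; regcomp.c uses macros rather than a function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * True when more than count chars remain between start and end.
 * Both pointers must point into (or one past the end of) the same array,
 * so "end - start" is well-defined even when "start + count" would point
 * past the array and make the "start + count < end" form undefined.
 */
static int
more_than(const char *start, const char *end, ptrdiff_t count)
{
	assert(start <= end);
	return (end - start > count);
}
```

With a three-element array and end set to its third element, as in the
bothcases() example above, asking for 5 more chars is simply false and no
out-of-bounds pointer is ever formed.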

As a bonus, MORE2() implies MORE() therefore SEETWO() can be simplified a
bit.

PR:		252403
2021-01-08 13:58:35 -06:00
Cy Schubert
c6951fac78 Fix 32-bit build post 5cc52631b3.
2021-01-08 11:28:30 -08:00
mhorne
bbfa199cbc arm64: gdb(4) machine-dependent bits
Everything required for remote kernel debugging over a serial
connection. For FDT-based systems, a debug port can be specified by
setting hw.fdt.dbgport to the desired device tree node in loader.conf.
For example, hw.fdt.dbgport="uart1", or
hw.fdt.dbgport="serial@ff1a0000".

Looks good:	emaste
Tested by:	rwatson
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27727
2021-01-08 14:53:44 -04:00
mhorne
5f66d5a313 arm64: remove pcb_pc
The program counter field in the PCB is written in exactly one place,
makectx(), upon entry to the debugger. For threads other than curthread,
its value will be empty, or bogus. Rather than writing to this field in
more places, it can be removed in favor of using the value in the link
register.

To make this clearer, pcb->pcb_x[30] is renamed to pcb->pcb_lr, similar
to what already exists in struct trapframe. Also, prefer lr to x30 in
assembly, as it better conveys intention.

This improves PC_REGS() for kdb_thread != curthread. It is required for
a functional gdb(4) stub, fixing the output of `info threads`, in
particular.

The space occupied by pcb_pc is retained, for compatibility with kgdb.

Reviewed by:	markj, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27720
2021-01-08 14:53:44 -04:00
mhorne
e9bb4ce3d0 arm64: don't pass user trapframe to kdb_trap()
This effectively undoes the changes made in r321571. While useful, it is
inconsistent with how other architectures pass trapframes to kdb. This
change is also required to get a working gdb(4) stub on arm64, as
otherwise the backtrace will begin too early.

As of 088a7eef95, this information can still be obtained via
"show registers/u".

Reviewed by:	jhb (slightly earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Pull Request:	https://reviews.freebsd.org/D27719
2021-01-08 14:53:20 -04:00
mhorne
088a7eef95 ddb: add ability to print user registers
The debugger is always entered after some kind of kernel trap, often a
breakpoint in kdb_enter(). This means that the most recent trapframe
will include kernel state at the time of the trap, when often it is
desirable to the developer to view the contents of the previous
trapframe. This trapframe often corresponds to the entry from userspace.

The ddb(4) man page claims the ability to display user register state
via the 'u' modifier to `show registers`, but this appears untrue. It is
not obvious from a quick search of the history when this feature was
added, or when it was removed. (Re)implement this feature in
db_show_regs, noting that it is not necessarily populated with userspace
state.

Reviewed by:	jhb (earlier version), markj, bcr (manpages)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27705
2021-01-08 14:53:06 -04:00
Andrew Gallatin
52cd25eb1a mbuf: enable ext_pgs ("unmapped") mbufs by default
Ext_pg mbufs allow carrying multiple pages per mbuf. This
reduces mbuf linked list traversals, especially in socket
buffers, thereby reducing cache misses and CPU use for
applications using sendfile.  Note that ext_pages use
unmapped pages, eliminating KVA mapping costs on 32-bit
platforms.

Ext_pg mbufs are also required for ktls (KERN_TLS), and having
them disabled by default is a stumbling block for those
wishing to enable ktls.

Reviewed-by:	jhb, glebius
Sponsored by:	Netflix
2021-01-08 13:43:30 -05:00
Mark Johnston
e65e4e61f5 vmd: Clean up resources properly when vmd_attach() fails
- Free the resource container by calling rman_fini().[1]
- Call device_delete_child() if device_probe_and_attach() fails.

Reported by:	nc [1]
MFC after:	2 weeks
2021-01-08 13:32:05 -05:00
Mark Johnston
adc0dcc352 mpr, mps: Fix an off-by-one bug in the BTDH_MAPPING ioctl
The device mapping table contains sc->max_devices entries, so only
indices in [0, sc->max_devices) are valid.
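
A sketch of the half-open bounds check, for illustration. The helper
table_get is hypothetical and is not the driver's ioctl code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fetch entry idx from a table of max_devices entries.
 * Valid indices are [0, max_devices), so the rejection test must be ">=";
 * using ">" would let idx == max_devices through, one past the end.
 */
static int
table_get(const int *table, size_t max_devices, size_t idx, int *out)
{
	assert(table != NULL && out != NULL);
	if (idx >= max_devices)
		return (-1);
	*out = table[idx];
	return (0);
}
```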

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27964
2021-01-08 13:32:05 -05:00
Mark Johnston
de828a91db mpr, mps: Fix a stack buffer overflow in the user passthru ioctl
Previously we copied in the request into a stack-allocated structure
that could be smaller than the request size.  Furthermore, we checked
the request size only after doing the copyin.

Fix this by allocating a buffer to hold the request, then copying the
buffer's contents into a command descriptor.  This is a bit heavy-handed
but I expect the overhead will not be noticeable.  The approach of
copying the header in first is susceptible to TOCTOU problems.
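
A user-space sketch of the general pattern (check the size first, stage the
request in a buffer sized to match, then fill the descriptor). It is not the
mpr/mps code: CMD_MAX, struct cmd_desc and stage_request are hypothetical,
and memcpy() stands in for copyin().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	CMD_MAX	64	/* hypothetical descriptor capacity */

struct cmd_desc {
	uint8_t	data[CMD_MAX];
	size_t	len;
};

/*
 * Copy a caller-supplied request of req_len bytes into a descriptor.
 * The length is validated before anything is copied, and the request is
 * staged in a heap buffer of exactly req_len bytes rather than in a
 * fixed-size stack structure that may be smaller than the request.
 */
static int
stage_request(struct cmd_desc *cd, const void *req, size_t req_len)
{
	uint8_t *buf;

	assert(cd != NULL && req != NULL);
	if (req_len == 0 || req_len > CMD_MAX)
		return (-1);
	buf = malloc(req_len);
	if (buf == NULL)
		return (-1);
	memcpy(buf, req, req_len);	/* stands in for copyin() */
	memcpy(cd->data, buf, req_len);
	cd->len = req_len;
	free(buf);
	return (0);
}
```

Because the whole request is copied once and only the private copy is
inspected afterwards, the caller cannot change the contents between the
check and the use.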

Reviewed by:	imp
Reported by:	maxpl0it@protonmail.com
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27963
2021-01-08 13:32:04 -05:00
Mark Johnston
092cf8d63f safexcel: Fix a race around unblocking of crypto ops
safexcel_ring_intr() could fail to observe that sc_blocked is set after
completing all outstanding ops for a ring, in which case blocked ops
would be deferred forever.

Request structures are managed by individual rings, so move the
"blocked" flag into the per-ring state block and use the ring lock to
synchronize with safexcel_process().  Remove sc_mtx since it is now
unused.

MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
2021-01-08 13:32:04 -05:00
Mark Johnston
8ba6acbbe6 safexcel: Stop using a stack buffer for the ring lock name
mtx_init() does not make a copy of the name so the buffer must be valid
for the lifetime of the driver instance.  Store each ring's lock's name
in the ring structure.
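
A user-space sketch of the lifetime issue, for illustration. struct fake_mtx
and fake_mtx_init() are stand-ins that mimic a lock keeping the name pointer
without copying it; they are not the kernel API, and the ring layout is
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for a lock that stores the name pointer as given. */
struct fake_mtx {
	const char *name;
};

struct ring {
	struct fake_mtx	lock;
	char		lock_name[32];	/* lives as long as the ring does */
};

static void
fake_mtx_init(struct fake_mtx *m, const char *name)
{
	m->name = name;		/* no copy is made */
}

/*
 * Build the name in storage owned by the ring. Formatting it into a local
 * array here instead would leave lock.name dangling once ring_init() returns.
 */
static void
ring_init(struct ring *r, int idx)
{
	assert(r != NULL);
	snprintf(r->lock_name, sizeof(r->lock_name), "ring%d", idx);
	fake_mtx_init(&r->lock, r->lock_name);
}
```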

MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
2021-01-08 13:32:04 -05:00
Mark Johnston
501159696c igmp: Avoid leaking mbuf when source validation fails
PR:		252504
Submitted by:	Panagiotis Tsolakos <panagiotis.tsolakos@gmail.com>
MFC after:	3 days
2021-01-08 13:32:04 -05:00
Hans Petter Selasky
431980466f Don't offset the UAR map twice in mlx5en(4).
The new UAR API already offsets the UAR map pointer that mlx5en(4) is using.
While at it remove some no longer needed variables for keeping track
of the current BF offset.

This fixes a regression issue after the new UAR allocation APIs
were introduced.

MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 18:35:49 +01:00
Chuck Tuffli
6733401935 nvmecontrol: add device self-test op and log page
Add decoding of the Device Self-test log page and the ability to start
or abort a test.

Reviewed by:	imp, mav
Tested by:	Muhammad Ahmad <muhammad.ahmad@seagate.com>
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D27517
2021-01-08 09:27:56 -08:00
Gleb Smirnoff
7edc1bd9dc When be_activate() turns on a new boot environment, it always deactivates
the current one first. And if it fails to do so, it abandons activation.
However, with the new bootonce feature, there is a legitimate case when
a pool doesn't have "bootfs" property set. Check for this case before
calling be_deactivate().

Reviewed by:	kevans
2021-01-08 09:23:16 -08:00
Kyle Evans
14a16fd3e7 build: add WITHOUT_CLEAN workaround for 821aa63a09
The *w variants of ncurses directories went away, and the remaining names
build the widechar variants instead of non-widechar variants. As such, the
entire ncurses tree should be regenerated.

Key off of lib/ncurses/ncursesw being present and remove the whole ncurses
hierarchy if it is.

Reviewed by:	emaste (IRC)
2021-01-08 10:43:53 -06:00
Kyle Evans
9be9771c87 efidev: remove EFIIOC_GET_TABLE ioctl
This ioctl would instantly induce a panic, likely since near inception, up
until 0861c7d3e0. Lack of previous interest in fixing it combined with
the problematic interface (exports a pointer, really a physical address)
brings us to the natural conclusion: remove it until a useful consumer
comes forward.

If it eventually gets resurrected, the interface should definitely not
return in this exact form and likely needs to be reimagined.

The associated KPI, efi_get_table, is left intact for the time being.

Reviewed by:	imp, jrtc27
Also discussed with:	brooks, jhb
Differential Revision:	https://reviews.freebsd.org/D28030
2021-01-08 10:41:50 -06:00
Ulrich Spörlein
40903394bf GitHub actions: unbreak macOS build
Error: llvm 11.0.0 is already installed

Also make the linking failure non-fatal:

Error: The `brew link` step did not complete successfully
2021-01-08 15:36:38 +01:00
Andrew Turner
6815909abd Move the PMC overflow count to make it per-CPU
Virtual PMCs could be running on multiple CPUs so this needs to be
a per-CPU value.

Submitted by:	rwatson (earlier version)
Reviewed by:	gnn
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D27973
2021-01-08 14:24:43 +00:00
Andrew Turner
90a6e9ef63 Update hwpmc on armv7 to handle overflow better
When testing hwpmc on arm64 we found the counter could overflow while
reading the event count. Handle this case in the armv7 code by also
checking if the overflow bit is set and incrementing the overflow
count as needed.

Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D27969
2021-01-08 14:24:43 +00:00
Mateusz Guzik
8ddea0b127 cache: just assign ni_resflags = NIRES_ABS
It is guaranteed to be 0 on entry.
2021-01-08 13:57:10 +00:00
Mateusz Guzik
77589de8aa mac: cheaper check for mac_vnode_check_readlink
2021-01-08 13:57:10 +00:00
Hans Petter Selasky
f8f5b459d2 Update user access region, UAR, APIs in the core in mlx5core.
This change includes several changes, listed below, all related to UAR.
UAR is a special PCI memory area where the so-called doorbell register and
blue flame register live. Blue flame is a feature for sending small packets
more efficiently via a PCI memory page, instead of using PCI DMA.

- All structures and functions named xxx_uuars were renamed into xxx_bfreg.
- Remove partially implemented Blueflame support from mlx5en(4) and mlx5ib.
- Implement blue flame register allocator.
- Use blue flame register allocator in mlx5ib.
- A common UAR page is now allocated by the core to support doorbell register
  writes for all of mlx5en and mlx5ib, instead of allocating one UAR per
  sendqueue.
- Add support for DEVX query UAR.
- Add support for 4K UAR for libmlx5.

Linux commits:
7c043e908a74ae0a935037cdd984d0cb89b2b970
2f5ff26478adaff5ed9b7ad4079d6a710b5f27e7
0b80c14f009758cefeed0edff4f9141957964211
30aa60b3bd12bd79b5324b7b595bd3446ab24b52
5fe9dec0d045437e48f112b8fa705197bd7bc3c0
0118717583cda6f4f36092853ad0345e8150b286
a6d51b68611e98f05042ada662aed5dbe3279c1e

MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 13:33:46 +01:00
Hans Petter Selasky
3764792007 Fix whitespace in mlx5en(4).
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 13:33:46 +01:00
Hans Petter Selasky
9a47ae044b Bump driver versions for mlx5en(4) and mlx4en(4).
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:55 +01:00
Hans Petter Selasky
376e130b47 Fix memory leaks in error paths in krping.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:55 +01:00
Hans Petter Selasky
89c0b4fa11 Bump some copyrights in mlx5en(4).
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:55 +01:00
Hans Petter Selasky
a00718e1df Implement SIOCGIFRSSKEY and SIOCGIFRSSHASH in mlx5en(4).
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:55 +01:00
Hans Petter Selasky
480570dbb3 Fixes for SRIOV in mlx5core.
- call pci_iov_detach() on detaching from PCI device to take care of hang
  on destroying VFs after PF is down.

- disable eswitch SRIOV support right after pci_iov_detach(),
  else the eswitch cleanup sometimes occurs while the SRIOV flow table
  is still present.

Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:55 +01:00
Hans Petter Selasky
98140747ca Update the PCI ID list in mlx5core.
- Add descriptions for new devices.
- Add support for Bluefield.

MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:54 +01:00
Hans Petter Selasky
82c7abe778 The "unsigned" type is the same as "unsigned int".
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:54 +01:00
Hans Petter Selasky
87b3c8cc99 Fix spelling in mlx5core.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:54 +01:00
Hans Petter Selasky
daa150aaa3 Properly handle case where firmware dump returns more registers on second pass
in mlx5core.

Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:54 +01:00
Hans Petter Selasky
50a9f8bbc1 Downgrade error about missing VSC to warning and make messages consistent
in mlx5core.

Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
2021-01-08 12:35:53 +01:00
Toomas Soome
742653ebd5 sysctl debug.dump_modinfo should recognize font module
Add MODINFOMD_FONT to dump list.
2021-01-08 09:24:49 +02:00