136859 Commits

Author SHA1 Message Date
Mark Johnston
6faf45b34b amd64: Implement a KASAN shadow map
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map.  This region is the shadow map.  The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid.  Various kernel allocators call kasan_mark() to update
the shadow map.

Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN.  UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.

The shadow map uses one byte per eight bytes in the kernel map.  In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.

When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map.  kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29417
2021-04-13 17:42:20 -04:00
Mark Johnston
38da497a4d Add the KASAN runtime
KASAN enables the use of LLVM's AddressSanitizer in the kernel.  This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses.  It is particularly
effective when combined with test suites or syzkaller.  KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.

The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.

The runtime implements a number of functions defined by the compiler
ABI.  These are prefixed by __asan.  The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.

kasan_mark() is called by various kernel allocators to update state in
the shadow map.  Updates to those allocators will come in subsequent
commits.

The runtime also defines various interceptors.  Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation.  To handle this, the runtime implements these routines
on behalf of the rest of the kernel.  The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.

The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.

Obtained from:	NetBSD
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29416
2021-04-13 17:42:20 -04:00
Mark Johnston
01028c736c Add a KASAN option to the kernel build
LLVM support for enabling KASAN has not yet landed so the option is not
yet usable, but hopefully this will change soon.

Reviewed by:	imp, andrew
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29454
2021-04-13 17:42:20 -04:00
Mitchell Horne
5742f2d89c arm64: adjust comments in dbg_monitor_exit()
These comments were copied from dbg_monitor_enter(), but the intended
modifications weren't made. Update them to reflect what this code
actually does.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-04-13 14:41:31 -03:00
Mitchell Horne
a2a8b582bd arm64: clear debug registers after execve(2)
This is both intuitive and required, as any previous breakpoint settings
may not be applicable to the new process.

Reported by:	arichardson
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29672
2021-04-13 14:41:03 -03:00
Konstantin Belousov
8cca7b7f28 nfs client: depend on xdr
Since 7763814fc9c27 nfsrpc_setclient() uses mem_alloc() that is macro
around malloc(M_RPC).  M_RPC is provided by xdr.ko.

Reviewed by:	rmacklem
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-13 18:04:43 +03:00
Edward Tomasz Napierala
ca6e1fa3ce linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609
2021-04-13 13:14:30 +01:00
Kurosawa Takahiro
2aa21096c7 pf: Implement the NAT source port selection of MAP-E Customer Edge
MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.

PR:		254577
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D29468
2021-04-13 10:53:18 +02:00
John Baldwin
45d5c28439 cxgbe: Ignore doomed virtual interfaces when updating the clip table.
A doomed VI does not have a valid ifnet.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29662
2021-04-12 14:36:40 -07:00
John Baldwin
76681661be OCF: Remove support for asymmetric cryptographic operations.
There haven't been any non-obscure drivers that supported this
functionality and it has been impossible to test to ensure that it
still works.  The only known consumer of this interface was the engine
in OpenSSL < 1.1.  Modern OpenSSL versions do not include support for
this interface as it was not well-documented.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29736
2021-04-12 14:28:43 -07:00
John Baldwin
89df484739 iscsi: Kick threads out of iscsi_ioctl() during unload.
iscsid can be sleeping in iscsi_ioctl() causing the destroy_dev() to
sleep forever if iscsi.ko is unloaded while iscsid is running.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	mav
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29688
2021-04-12 13:58:21 -07:00
John Baldwin
568e69e4eb cxgbe: Add counters for iSCSI PDUs transmitted via TOE.
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29297
2021-04-12 13:57:45 -07:00
Warner Losh
662053e8dc hptrr: Move to using .o files
Use .o files directly. Replace the .o.uu files that we uudecode with .o files.
Adjust the kernel and module build to cope.

Suggestions by:		markj@, emaste@
Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29636
2021-04-12 13:47:55 -06:00
Warner Losh
fddb3f4d7d hptmv: use .o files directly
uudecode the .o.uu files and commit directly to the tree. Adjust the build
infrastructure to cope with the new location, both for the kernel and modules.

Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29635
2021-04-12 13:47:55 -06:00
Warner Losh
550cb4ab85 hpt27xx: store the .o files directly in the tree
Store the .o files directly in the tree. We no longer need to play uuencode
games like we did in the CVS days. Adjust the build infrastructure to match.

Reviewed by:            markj@
Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29634
2021-04-12 13:47:55 -06:00
Warner Losh
5b20c5e1f8 hptnr: Store the .o files directly in the repo
We no longer need to use uuencode to uuencode files in our tree.  Store the .o
file directly instead. Adjust the build to cope with the new arrangement.

Suggestions by:		emaste, bz, donner
Reviewed by:		markm
Sposnored by:		Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29632
2021-04-12 13:47:55 -06:00
Gleb Smirnoff
8d5719aa74 syncache: simplify syncache_add() KPI to return struct socket pointer
directly, not overwriting the listen socket pointer argument.
Not a functional change.
2021-04-12 08:27:40 -07:00
Gleb Smirnoff
08d9c92027 tcp_input/syncache: acquire only read lock on PCB for SYN,!ACK packets
When packet is a SYN packet, we don't need to modify any existing PCB.
Normally SYN arrives on a listening socket, we either create a syncache
entry or generate syncookie, but we don't modify anything with the
listening socket or associated PCB. Thus create a new PCB lookup
mode - rlock if listening. This removes the primary contention point
under SYN flood - the listening socket PCB.

Sidenote: when SYN arrives on a synchronized connection, we still
don't need write access to PCB to send a challenge ACK or just to
drop. There is only one exclusion - tcptw recycling. However,
existing entanglement of tcp_input + stacks doesn't allow to make
this change small. Consider this patch as first approach to the problem.

Reviewed by:	rrs
Differential revision:	https://reviews.freebsd.org/D29576
2021-04-12 08:25:31 -07:00
Mitchell Horne
2816bd8442 rmlock(9): add an RM_DUPOK flag
Allows for duplicate locks to be acquired without witness complaining.
Similar flags exists already for rwlock(9) and sx(9).

Reviewed by:	markj
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	52
Differential Revision:	https://reviews.freebsd.org/D29683n
2021-04-12 11:42:21 -03:00
Hans Petter Selasky
7497dd5889 Fix build of stand/usb .
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-04-12 16:13:33 +02:00
Mark Johnston
dfff37765c Rename struct device to struct _device
types.h defines device_t as a typedef of struct device *.  struct device
is defined in subr_bus.c and almost all of the kernel uses device_t.
The LinuxKPI also defines a struct device, so type confusion can occur.

This causes bugs and ambiguity for debugging tools.  Rename the FreeBSD
struct device to struct _device.

Reviewed by:	gbe (man pages)
Reviewed by:	rpokala, imp, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29676
2021-04-12 09:32:30 -04:00
Mark Johnston
3f322b22e0 linuxkpi: Fix pcie_set_readrq()
We were passing a LinuxKPI struct device * to a pci(4) function that
expects a device_t.

Reviewed by:	manu, hselasky, bz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29675
2021-04-12 09:32:21 -04:00
Mark Johnston
56cbd386fb qlnxr: Properly initialize the Linux device structure
The driver needs to provide a LinuxKPI device structure to register
itself with the IB subsystem.  It was erroneously using a copy of its
FreeBSD device structure for this purpose.

Use linux_pci_attach_device() instead, following the example of the
Chelsio iwarp driver.  Also ensure that we don't leak the faked device
during detach.

Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29595
2021-04-12 09:32:08 -04:00
Mark Johnston
9771af4942 cxgb: Use device_t in preference to struct device *
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-04-12 09:32:04 -04:00
Mark Johnston
d8b1601d54 al_eth: Use device_t in preference to struct device *
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-04-12 09:32:02 -04:00
Mark Johnston
f66a1f4074 genet: Use device_t in preference to struct device *
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-04-12 09:31:58 -04:00
Kristof Provost
5e98cae661 pf: Ensure that we don't use kif passed to pfi_kkif_attach()
Once a kif is passed to pfi_kkif_attach() we must ensure we never re-use
it for anything else.
Set the kif to NULL afterwards to guarantee this.

Reported-by: syzbot+be5d4f4a7a4c295e659a@syzkaller.appspotmail.com
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-04-12 11:55:21 +02:00
Andrew Turner
3da5983889 Remove versatile support
It was used for testing armv6 under QEMU, however since then we added
support for the QEMU virt platform.

Reviewed by:	imp, manu
Differential Revision:	https://reviews.freebsd.org/D29707
2021-04-12 06:16:31 +00:00
Andrew Turner
5d2d599d3f Create VM_MEMATTR_DEVICE on all architectures
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.

Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D29692
2021-04-12 06:15:31 +00:00
Navdeep Parhar
bf5057691b cxgbe/tom: Fix potential leak in t4_aiotx_process_job.
The mbuf allocated could be a chain and must be freed with m_freem.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29579
2021-04-11 19:14:18 -07:00
Rick Macklem
9edaceca81 nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Sometimes, after some Linux NFSv4.1/4.2 clients establish
a new TCP connection, they will advance the sequence number
for a session slot by 2 instead of 1.
RFC5661 specifies that a server should reply
NFS4ERR_SEQ_MISORDERED for this case.
This might result in a system call error in the client and
seems to disable future use of the slot by the client.
Since advancing the sequence number by 2 seems harmless,
allow this case if vfs.nfs.linuxseqsesshack is non-zero.

Note that, if the order of RPCs is actually reversed,
a subsequent RPC with a smaller sequence number value
for the slot will be received.  This will result in
a NFS4ERR_SEQ_MISORDERED reply.
This has not been observed during testing.
Setting vfs.nfs.linuxseqsesshack to 0 will provide
RFC5661 compliant behaviour.

This fix affects the fairly rare case where a NFSv4
Linux client does a TCP reconnect and then apparently
erroneously increments the sequence number for the
session slot twice during the reconnect cycle.

PR:	254816
MFC after:	2 weeks
2021-04-11 16:51:25 -07:00
Vladimir Kondratyev
774cbf9b64 hv_kbd: Fix leaked $FreeBSD$ expansion
MFC with:	c2a159286c76
2021-04-12 02:16:22 +03:00
Vladimir Kondratyev
e4643aa4c4 hv_kbd: Add support for K_XLATE and K_CODE modes for gen 2 VMs
That fixes disabled keyboard input after Xorg server has been stopped.

Reviewed by:	whu
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28171
2021-04-12 02:14:12 +03:00
Vladimir Kondratyev
c2a159286c hv_kbd: Add evdev protocol support for gen 2 VMs
Reviewed by:	whu
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28170
2021-04-12 02:14:12 +03:00
Rick Macklem
e152bbecb2 param.h: bump __FreeBSD_version for commit 7763814fc9c2
Commit 7763814fc9c2 changed the internal KAPI between the krpc
and NFS.  As such, the krpc, nfscommon and nfscl modules must
all be rebuilt from sources.
2021-04-11 14:50:56 -07:00
Rick Macklem
7763814fc9 nfsv4 client: do the BindConnectionToSession as required
During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it.  For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.

Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.

This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.

If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.

PR:	254840
Comments by:	asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29475
2021-04-11 14:34:57 -07:00
Vincenzo Maffione
70275a6735 netmap: don't use linux type struct device *
Such type cannot be used in code that is in common between
FreeBSD and Linux. Use the FreeBSD type instead.

MFC after:	3 days
Reported by:	markj
Differential Revision:	https://reviews.freebsd.org/D29677
2021-04-11 21:13:01 +00:00
Hans Petter Selasky
5a3426f453 if_smsc: Add the ability to disable "turbo_mode", also called RX frame batching,
similarly to the Linux driver, by a tunable read only sysctl.

Submitted by:	Oleg Sidorkin <osidorkin@gmail.com>
PR:		254884
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-04-11 20:25:58 +02:00
Alexander V. Chernikov
afbb64f1d8 Fix vlan creation for the older ifconfig(8) binaries.
Reported by:	allanjude
MFC after:	immediately
2021-04-11 18:13:09 +01:00
Edward Tomasz Napierala
ec5325dbca cam: make sure to clear even more CCBs allocated on the stack
This is my second pass, this time over all of CAM except
for the SCSI target bits.  There should be no functional
changes.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29549
2021-04-11 15:24:22 +01:00
Konstantin Belousov
a091c35323 ptrace: restructure comments around reparenting on PT_DETACH
style code, and use {} for both branches.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-11 14:44:30 +03:00
Konstantin Belousov
9d7e450b64 ptrace: remove dead call to FIX_SSTEP()
It was an alias for procfs_fix_sstep() long time ago.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-11 14:44:30 +03:00
Piotr Pawel Stefaniak
a212f56d10 Balance parentheses in sysctl descriptions 2021-04-11 10:30:55 +02:00
Mateusz Guzik
97ed4babb5 zfs: avoid memory allocation in arc_prune_async 2021-04-11 07:19:56 +02:00
Mateusz Guzik
feea35bed0 zfs: make vnlru_free_vfsops use conditional on version
Diff reduction against upstream.
2021-04-11 04:57:26 +00:00
Rick Macklem
22cefe3d83 nfsd: fix replies from session cache for multiple retries
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Commit 05a39c2c1c18 fixed replying with the cached reply in
in the session slot if same session slot sequence#.
However, the code uses the reply and, as such,
will fail for a subsequent retry of the RPC.
A subsequent retry would be an extremely rare event,
but this patch fixes this, so long as m_copym(..M_NOWAIT)
does not fail, which should also be a rare event.

This fix affects the exceedingly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation, multiple times.  Note that retries only occur
after the client has needed to create a new TCP connection,
with a new TCP connection for each retry.

MFC after:	2 weeks
2021-04-10 15:50:25 -07:00
Mateusz Guzik
a7fbfdee73 zfs: change format string in zio_fini to get rid of the cast 2021-04-10 20:33:43 +00:00
Alexander V. Chernikov
7f5f3fcc32 Fix direct route installation with net/bird.
Slighly relax the gateway validation rules imposed by the
 2fe5a79425c7, by requiring only first 8 bytes (everyhing
 before sdl_data to be present in the AF_LINK gateway.

Reported by:	olivier
2021-04-10 16:31:16 +01:00
Alexander V. Chernikov
63dceebe68 Appease -Wsign-compare in radix.c
Differential Revision:	https://reviews.freebsd.org/D29661
Submitted by:	zec
MFC after	2 weeks
2021-04-10 13:48:25 +00:00
Alexander V. Chernikov
caf2f62765 Allow to specify debugnet fib in sysctl/tunable.
Differential Revision:	https://reviews.freebsd.org/D29593
Reviewed by:		donner
MFC after:		2 weeks
2021-04-10 13:47:49 +00:00