Commit Graph

137035 Commits

Author SHA1 Message Date
Konstantin Belousov
437c241d0c vfs_vnops.c: Make vn_statfile() non-static
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:47:56 +03:00
Konstantin Belousov
42be0a7b10 Style.
Add missed spaces, wrap long lines.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:47:46 +03:00
Mateusz Guzik
4f0279e064 cache: extend mismatch vnode assert print to include the name 2021-04-15 07:55:43 +00:00
Kirk McKusick
14d0cd7225 Ensure that the mount command shows "with quotas" when quotas are enabled.
When quotas are enabled with the quotaon(8) command, it sets the
MNT_QUOTA flag in the mount structure mnt_flag field. The mount
structure holds a cached copy of the filesystem statfs structure
in mnt_stat that includes a copy of the mnt_flag field in
mnt_stat.f_flags. The mnt_stat structure may not be updated for
hours. Since the mount command requests mount details using the
MNT_NOWAIT option, it gets the mount's mnt_stat statfs structure
whose f_flags field does not yet show the MNT_QUOTA flag being set
in mnt_flag.

The fix is to have quotaon(8) set the MNT_QUOTA flag in both mnt_flag
and in mnt_stat.f_flags so that it will be immediately visible to
callers of statfs(2).

Reported by:  Christos Chatzaras
Tested by:    Christos Chatzaras
PR:           254682
MFC after:    3 days
Sponsored by: Netflix
2021-04-14 15:25:08 -07:00
Vladimir Kondratyev
8e84712d01 hidmap: add missing opt_hid.h to module Makefile
Reported by:	pstef
MFC after:	2 weeks
2021-04-14 23:05:59 +03:00
Mark Johnston
aabe13f145 uma: Introduce per-domain reclamation functions
Make it possible to reclaim items from a specific NUMA domain.

- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations.  Use a counter instead of a flag to
  synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
  can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
  domain.

Currently the new KPIs are unused, so there should be no functional
change.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29685
2021-04-14 13:03:34 -04:00
Mark Johnston
54f421f9e8 uma: Split bucket_cache_drain() to permit per-domain reclamation
Note that the per-domain variant does not shrink the target bucket size.

No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-04-14 13:03:34 -04:00
Mark Johnston
29bb6c19f0 domainset: Define additional global policies
Add global definitions for first-touch and interleave policies.  The
former may be useful for UMA, which implements a similar policy without
using domainset iterators.

No functional change intended.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29104
2021-04-14 13:03:33 -04:00
Emmanuel Vadot
0c80ad2dc6 arm: Add no-cftconvert for sdma-imx6 files
Fixes a warning when building kernel:
ctfconvert: file.c: Couldn't read ehdr: Invalid argument

MFC after:	3 days
2021-04-14 15:43:37 +02:00
Martin Matuska
6db169e920 zfs: merge openzfs/zfs@3522f57b6 (master)
Notable upstream pull request merges:
  #11742 When specifying raidz vdev name, parity count should match
  #11744 Use a helper function to clarify gang block size
  #11771 Support running FreeBSD buildworld on Arm-based macOS hosts

This is the last update that will be MFCed into stable/13.

From now on, the tracking of OpenZFS branches will be different:
- main continues tracking openzfs/zfs/master
- stable/13 is going to track openzfs/zfs/zfs-2.1-release

Obtained from:	OpenZFS
MFC after:	1 week
2021-04-14 12:51:51 +02:00
Vladimir Kondratyev
6678e75e4f pchtherm: Add IDs for CannonLake-H, CometLake and Lewisburg controllers
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2021-04-14 13:15:19 +03:00
Konstantin Belousov
75c5cf7a72 filt_timerexpire: avoid process lock recursion
Found by:	syzkaller
Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29746
2021-04-14 10:53:28 +03:00
Konstantin Belousov
5cc1d19941 realtimer_expire: avoid proc lock recursion when called from itimer_proc_continue()
It is fine to drop the process lock there, process cannot exit until its
timers are cleared.

Found by:	syzkaller
Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29746
2021-04-14 10:53:19 +03:00
Konstantin Belousov
5edf7227ec pseudofs: limit writes to 1M
Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29752
2021-04-14 10:23:21 +03:00
Konstantin Belousov
116f26f947 sbuf_uionew(): sbuf_new() takes int as length
and length should be not less than SBUF_MINSIZE

Reported and tested by:	pho
Noted and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29752
2021-04-14 10:23:20 +03:00
Vladimir Kondratyev
fb451895fb ichsmb: Add PCI ID for Intel Gemini Lake SMBus controller
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2021-04-14 03:58:07 +03:00
Navdeep Parhar
d107ee06f3 cxgbe(4): RSS hash for VXLAN traffic is computed from the inner frame.
Sponsored by:	Chelsio Communications
2021-04-13 16:50:12 -07:00
John Baldwin
774c4c82ff TOE: Use a read lock on the PCB for syncache_add().
Reviewed by:	np, glebius
Fixes:		08d9c92027
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29739
2021-04-13 16:31:04 -07:00
Tai-hwa Liang
d9b61e7153 if_firewire: fixing panic upon packet reception for VNET build
netisr_dispatch_src() needs valid VNET pointer or firewire_input() will panic
when receiving a packet.

Reviewed by:	glebius
MFC after:	2 weeks
2021-04-13 22:59:58 +00:00
Mark Johnston
06a53ecf24 malloc: Add state transitions for KASAN
- Reuse some REDZONE bits to keep track of the requested and allocated
  sizes, and use that to provide red zones.
- As in UMA, disable memory trashing to avoid unnecessary CPU overhead.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29461
2021-04-13 17:42:21 -04:00
Mark Johnston
f1c3adefd9 execve: Mark exec argument buffers
We cache mapped execve argument buffers to avoid the overhead of TLB
shootdowns.  Mark them invalid when they are freed to the cache.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29460
2021-04-13 17:42:21 -04:00
Mark Johnston
b261bb4057 vfs: Add KASAN state transitions for vnodes
vnodes are a bit special in that they may exist on per-CPU lists even
while free.  Add a KASAN-only destructor that poisons regions of each
vnode that are not expected to be accessed after a free.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29459
2021-04-13 17:42:21 -04:00
Mark Johnston
2b914b85dd kmem: Add KASAN state transitions
Memory allocated with kmem_* is unmapped upon free, so KASAN doesn't
provide a lot of benefit, but since allocations are always a multiple of
the page size we can create a redzone when the allocation request size
is not a multiple of the page size.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29458
2021-04-13 17:42:21 -04:00
Mark Johnston
244f3ec642 kstack: Add KASAN state transitions
We allocate kernel stacks using a UMA cache zone.  Cache zones have
KASAN disabled by default, but in this case it makes sense to enable it.

Reviewed by:	andrew
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29457
2021-04-13 17:42:21 -04:00
Mark Johnston
09c8cb717d uma: Add KASAN state transitions
- Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular
  zone should not be sanitized.  This is applied implicitly for NOFREE
  and cache zones.
- Add KASAN call backs which get invoked:
  1) when a slab is imported into a keg
  2) when an item is allocated from a zone
  3) when an item is freed to a zone
  4) when a slab is freed back to the VM

  In state transitions 1 and 3, memory is poisoned so that accesses will
  trigger a panic.  In state transitions 2 and 4, memory is marked
  valid.
- Disable trashing if KASAN is enabled.  It just adds extra CPU overhead
  to catch problems that are detected by KASAN.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29456
2021-04-13 17:42:21 -04:00
Mark Johnston
f115c06121 amd64: Add MD bits for KASAN
- Initialize KASAN before executing SYSINITs.
- Add a GENERIC-KASAN kernel config, akin to GENERIC-KCSAN.
- Increase the kernel stack size if KASAN is enabled.  Some of the
  ASAN instrumentation increases stack usage and it's enough to
  trigger stack overflows in ZFS.
- Mark the trapframe as valid in interrupt handlers if it is
  assigned to td_intr_frame.  Otherwise, an interrupt in a function
  which creates a poisoned alloca region can trigger false positives.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29455
2021-04-13 17:42:20 -04:00
Mark Johnston
6faf45b34b amd64: Implement a KASAN shadow map
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map.  This region is the shadow map.  The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid.  Various kernel allocators call kasan_mark() to update
the shadow map.

Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN.  UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.

The shadow map uses one byte per eight bytes in the kernel map.  In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.

When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map.  kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29417
2021-04-13 17:42:20 -04:00
Mark Johnston
38da497a4d Add the KASAN runtime
KASAN enables the use of LLVM's AddressSanitizer in the kernel.  This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses.  It is particularly
effective when combined with test suites or syzkaller.  KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.

The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.

The runtime implements a number of functions defined by the compiler
ABI.  These are prefixed by __asan.  The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.

kasan_mark() is called by various kernel allocators to update state in
the shadow map.  Updates to those allocators will come in subsequent
commits.

The runtime also defines various interceptors.  Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation.  To handle this, the runtime implements these routines
on behalf of the rest of the kernel.  The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.

The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.

Obtained from:	NetBSD
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29416
2021-04-13 17:42:20 -04:00
Mark Johnston
01028c736c Add a KASAN option to the kernel build
LLVM support for enabling KASAN has not yet landed so the option is not
yet usable, but hopefully this will change soon.

Reviewed by:	imp, andrew
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29454
2021-04-13 17:42:20 -04:00
Mitchell Horne
5742f2d89c arm64: adjust comments in dbg_monitor_exit()
These comments were copied from dbg_monitor_enter(), but the intended
modifications weren't made. Update them to reflect what this code
actually does.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-04-13 14:41:31 -03:00
Mitchell Horne
a2a8b582bd arm64: clear debug registers after execve(2)
This is both intuitive and required, as any previous breakpoint settings
may not be applicable to the new process.

Reported by:	arichardson
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29672
2021-04-13 14:41:03 -03:00
Konstantin Belousov
8cca7b7f28 nfs client: depend on xdr
Since 7763814fc9 nfsrpc_setclient() uses mem_alloc() that is macro
around malloc(M_RPC).  M_RPC is provided by xdr.ko.

Reviewed by:	rmacklem
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-13 18:04:43 +03:00
Edward Tomasz Napierala
ca6e1fa3ce linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609
2021-04-13 13:14:30 +01:00
Kurosawa Takahiro
2aa21096c7 pf: Implement the NAT source port selection of MAP-E Customer Edge
MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.

PR:		254577
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D29468
2021-04-13 10:53:18 +02:00
John Baldwin
45d5c28439 cxgbe: Ignore doomed virtual interfaces when updating the clip table.
A doomed VI does not have a valid ifnet.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29662
2021-04-12 14:36:40 -07:00
John Baldwin
76681661be OCF: Remove support for asymmetric cryptographic operations.
There haven't been any non-obscure drivers that supported this
functionality and it has been impossible to test to ensure that it
still works.  The only known consumer of this interface was the engine
in OpenSSL < 1.1.  Modern OpenSSL versions do not include support for
this interface as it was not well-documented.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29736
2021-04-12 14:28:43 -07:00
John Baldwin
89df484739 iscsi: Kick threads out of iscsi_ioctl() during unload.
iscsid can be sleeping in iscsi_ioctl() causing the destroy_dev() to
sleep forever if iscsi.ko is unloaded while iscsid is running.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	mav
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29688
2021-04-12 13:58:21 -07:00
John Baldwin
568e69e4eb cxgbe: Add counters for iSCSI PDUs transmitted via TOE.
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29297
2021-04-12 13:57:45 -07:00
Warner Losh
662053e8dc hptrr: Move to using .o files
Use .o files directly. Replace the .o.uu files that we uudecode with .o files.
Adjust the kernel and module build to cope.

Suggestions by:		markj@, emaste@
Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29636
2021-04-12 13:47:55 -06:00
Warner Losh
fddb3f4d7d hptmv: use .o files directly
uudecode the .o.uu files and commit directly to the tree. Adjust the build
infrastructure to cope with the new location, both for the kernel and modules.

Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29635
2021-04-12 13:47:55 -06:00
Warner Losh
550cb4ab85 hpt27xx: store the .o files directly in the tree
Store the .o files directly in the tree. We no longer need to play uuencode
games like we did in the CVS days. Adjust the build infrastructure to match.

Reviewed by:            markj@
Sposnored by:           Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29634
2021-04-12 13:47:55 -06:00
Warner Losh
5b20c5e1f8 hptnr: Store the .o files directly in the repo
We no longer need to use uuencode to uuencode files in our tree.  Store the .o
file directly instead. Adjust the build to cope with the new arrangement.

Suggestions by:		emaste, bz, donner
Reviewed by:		markm
Sposnored by:		Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D29632
2021-04-12 13:47:55 -06:00
Gleb Smirnoff
8d5719aa74 syncache: simplify syncache_add() KPI to return struct socket pointer
directly, not overwriting the listen socket pointer argument.
Not a functional change.
2021-04-12 08:27:40 -07:00
Gleb Smirnoff
08d9c92027 tcp_input/syncache: acquire only read lock on PCB for SYN,!ACK packets
When packet is a SYN packet, we don't need to modify any existing PCB.
Normally SYN arrives on a listening socket, we either create a syncache
entry or generate syncookie, but we don't modify anything with the
listening socket or associated PCB. Thus create a new PCB lookup
mode - rlock if listening. This removes the primary contention point
under SYN flood - the listening socket PCB.

Sidenote: when SYN arrives on a synchronized connection, we still
don't need write access to PCB to send a challenge ACK or just to
drop. There is only one exclusion - tcptw recycling. However,
existing entanglement of tcp_input + stacks doesn't allow to make
this change small. Consider this patch as first approach to the problem.

Reviewed by:	rrs
Differential revision:	https://reviews.freebsd.org/D29576
2021-04-12 08:25:31 -07:00
Mitchell Horne
2816bd8442 rmlock(9): add an RM_DUPOK flag
Allows for duplicate locks to be acquired without witness complaining.
Similar flags exists already for rwlock(9) and sx(9).

Reviewed by:	markj
MFC after:	3 days
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	52
Differential Revision:	https://reviews.freebsd.org/D29683n
2021-04-12 11:42:21 -03:00
Hans Petter Selasky
7497dd5889 Fix build of stand/usb .
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-04-12 16:13:33 +02:00
Mark Johnston
dfff37765c Rename struct device to struct _device
types.h defines device_t as a typedef of struct device *.  struct device
is defined in subr_bus.c and almost all of the kernel uses device_t.
The LinuxKPI also defines a struct device, so type confusion can occur.

This causes bugs and ambiguity for debugging tools.  Rename the FreeBSD
struct device to struct _device.

Reviewed by:	gbe (man pages)
Reviewed by:	rpokala, imp, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29676
2021-04-12 09:32:30 -04:00
Mark Johnston
3f322b22e0 linuxkpi: Fix pcie_set_readrq()
We were passing a LinuxKPI struct device * to a pci(4) function that
expects a device_t.

Reviewed by:	manu, hselasky, bz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29675
2021-04-12 09:32:21 -04:00
Mark Johnston
56cbd386fb qlnxr: Properly initialize the Linux device structure
The driver needs to provide a LinuxKPI device structure to register
itself with the IB subsystem.  It was erroneously using a copy of its
FreeBSD device structure for this purpose.

Use linux_pci_attach_device() instead, following the example of the
Chelsio iwarp driver.  Also ensure that we don't leak the faked device
during detach.

Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29595
2021-04-12 09:32:08 -04:00
Mark Johnston
9771af4942 cxgb: Use device_t in preference to struct device *
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-04-12 09:32:04 -04:00
Mark Johnston
d8b1601d54 al_eth: Use device_t in preference to struct device *
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-04-12 09:32:02 -04:00
Mark Johnston
f66a1f4074 genet: Use device_t in preference to struct device *
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-04-12 09:31:58 -04:00
Kristof Provost
5e98cae661 pf: Ensure that we don't use kif passed to pfi_kkif_attach()
Once a kif is passed to pfi_kkif_attach() we must ensure we never re-use
it for anything else.
Set the kif to NULL afterwards to guarantee this.

Reported-by: syzbot+be5d4f4a7a4c295e659a@syzkaller.appspotmail.com
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-04-12 11:55:21 +02:00
Andrew Turner
3da5983889 Remove versatile support
It was used for testing armv6 under QEMU, however since then we added
support for the QEMU virt platform.

Reviewed by:	imp, manu
Differential Revision:	https://reviews.freebsd.org/D29707
2021-04-12 06:16:31 +00:00
Andrew Turner
5d2d599d3f Create VM_MEMATTR_DEVICE on all architectures
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.

Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D29692
2021-04-12 06:15:31 +00:00
Navdeep Parhar
bf5057691b cxgbe/tom: Fix potential leak in t4_aiotx_process_job.
The mbuf allocated could be a chain and must be freed with m_freem.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29579
2021-04-11 19:14:18 -07:00
Rick Macklem
9edaceca81 nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Sometimes, after some Linux NFSv4.1/4.2 clients establish
a new TCP connection, they will advance the sequence number
for a session slot by 2 instead of 1.
RFC5661 specifies that a server should reply
NFS4ERR_SEQ_MISORDERED for this case.
This might result in a system call error in the client and
seems to disable future use of the slot by the client.
Since advancing the sequence number by 2 seems harmless,
allow this case if vfs.nfs.linuxseqsesshack is non-zero.

Note that, if the order of RPCs is actually reversed,
a subsequent RPC with a smaller sequence number value
for the slot will be received.  This will result in
a NFS4ERR_SEQ_MISORDERED reply.
This has not been observed during testing.
Setting vfs.nfs.linuxseqsesshack to 0 will provide
RFC5661 compliant behaviour.

This fix affects the fairly rare case where a NFSv4
Linux client does a TCP reconnect and then apparently
erroneously increments the sequence number for the
session slot twice during the reconnect cycle.

PR:	254816
MFC after:	2 weeks
2021-04-11 16:51:25 -07:00
Vladimir Kondratyev
774cbf9b64 hv_kbd: Fix leaked $FreeBSD$ expansion
MFC with:	c2a159286c
2021-04-12 02:16:22 +03:00
Vladimir Kondratyev
e4643aa4c4 hv_kbd: Add support for K_XLATE and K_CODE modes for gen 2 VMs
That fixes disabled keyboard input after Xorg server has been stopped.

Reviewed by:	whu
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28171
2021-04-12 02:14:12 +03:00
Vladimir Kondratyev
c2a159286c hv_kbd: Add evdev protocol support for gen 2 VMs
Reviewed by:	whu
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28170
2021-04-12 02:14:12 +03:00
Rick Macklem
e152bbecb2 param.h: bump __FreeBSD_version for commit 7763814fc9
Commit 7763814fc9 changed the internal KAPI between the krpc
and NFS.  As such, the krpc, nfscommon and nfscl modules must
all be rebuilt from sources.
2021-04-11 14:50:56 -07:00
Rick Macklem
7763814fc9 nfsv4 client: do the BindConnectionToSession as required
During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it.  For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.

Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.

This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.

If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.

PR:	254840
Comments by:	asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29475
2021-04-11 14:34:57 -07:00
Vincenzo Maffione
70275a6735 netmap: don't use linux type struct device *
Such type cannot be used in code that is in common between
FreeBSD and Linux. Use the FreeBSD type instead.

MFC after:	3 days
Reported by:	markj
Differential Revision:	https://reviews.freebsd.org/D29677
2021-04-11 21:13:01 +00:00
Hans Petter Selasky
5a3426f453 if_smsc: Add the ability to disable "turbo_mode", also called RX frame batching,
similarly to the Linux driver, by a tunable read only sysctl.

Submitted by:	Oleg Sidorkin <osidorkin@gmail.com>
PR:		254884
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-04-11 20:25:58 +02:00
Alexander V. Chernikov
afbb64f1d8 Fix vlan creation for the older ifconfig(8) binaries.
Reported by:	allanjude
MFC after:	immediately
2021-04-11 18:13:09 +01:00
Edward Tomasz Napierala
ec5325dbca cam: make sure to clear even more CCBs allocated on the stack
This is my second pass, this time over all of CAM except
for the SCSI target bits.  There should be no functional
changes.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29549
2021-04-11 15:24:22 +01:00
Konstantin Belousov
a091c35323 ptrace: restructure comments around reparenting on PT_DETACH
style code, and use {} for both branches.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-11 14:44:30 +03:00
Konstantin Belousov
9d7e450b64 ptrace: remove dead call to FIX_SSTEP()
It was an alias for procfs_fix_sstep() long time ago.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-11 14:44:30 +03:00
Piotr Pawel Stefaniak
a212f56d10 Balance parentheses in sysctl descriptions 2021-04-11 10:30:55 +02:00
Mateusz Guzik
97ed4babb5 zfs: avoid memory allocation in arc_prune_async 2021-04-11 07:19:56 +02:00
Mateusz Guzik
feea35bed0 zfs: make vnlru_free_vfsops use conditional on version
Diff reduction against upstream.
2021-04-11 04:57:26 +00:00
Rick Macklem
22cefe3d83 nfsd: fix replies from session cache for multiple retries
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Commit 05a39c2c1c fixed replying with the cached reply in
in the session slot if same session slot sequence#.
However, the code uses the reply and, as such,
will fail for a subsequent retry of the RPC.
A subsequent retry would be an extremely rare event,
but this patch fixes this, so long as m_copym(..M_NOWAIT)
does not fail, which should also be a rare event.

This fix affects the exceedingly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation, multiple times.  Note that retries only occur
after the client has needed to create a new TCP connection,
with a new TCP connection for each retry.

MFC after:	2 weeks
2021-04-10 15:50:25 -07:00
Mateusz Guzik
a7fbfdee73 zfs: change format string in zio_fini to get rid of the cast 2021-04-10 20:33:43 +00:00
Alexander V. Chernikov
7f5f3fcc32 Fix direct route installation with net/bird.
Slighly relax the gateway validation rules imposed by the
 2fe5a79425, by requiring only first 8 bytes (everyhing
 before sdl_data to be present in the AF_LINK gateway.

Reported by:	olivier
2021-04-10 16:31:16 +01:00
Alexander V. Chernikov
63dceebe68 Appease -Wsign-compare in radix.c
Differential Revision:	https://reviews.freebsd.org/D29661
Submitted by:	zec
MFC after	2 weeks
2021-04-10 13:48:25 +00:00
Alexander V. Chernikov
caf2f62765 Allow to specify debugnet fib in sysctl/tunable.
Differential Revision:	https://reviews.freebsd.org/D29593
Reviewed by:		donner
MFC after:		2 weeks
2021-04-10 13:47:49 +00:00
Alexander V. Chernikov
c3a456defa Always use inp fib in the inp_lookup_mcast_ifp().
inp_lookup_mcast_ifp() is static and is only used in the inp_join_group().
The latter function is also static, and is only used in the inp_setmoptions(),
 which relies on inp being non-NULL.

As a result, in the current code, inp_lookup_mcast_ifp() is always called
 with non-NULL inp. Eliminate unused RT_DEFAULT_FIB condition and always
 use inp fib instead.

Differential Revision:	https://reviews.freebsd.org/D29594
Reviewed by:		kp
MFC after:		2 weeks
2021-04-10 13:47:49 +00:00
Kristof Provost
a9b338b260 pf: Move prototypes for userspace functions to userspace header
These functions no longer exist in the kernel, so there's no reason to
keep the prototypes in a kernel header. Move them to pfctl where they're
actually implemented.

Reviewed by:	glebius
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29643
2021-04-10 11:16:02 +02:00
Kristof Provost
d710367d11 pf: Implement nvlist variant of DIOCGETRULE
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29559
2021-04-10 11:16:01 +02:00
Kristof Provost
5c62eded5a pf: Introduce nvlist variant of DIOCADDRULE
This will make future extensions of the API much easier.
The intent is to remove support for DIOCADDRULE in FreeBSD 14.

Reviewed by:	markj (previous version), glebius (previous version)
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29557
2021-04-10 11:16:00 +02:00
Konstantin Belousov
94172affa4 amd64: clear debug registers on execing 32bit Linux binary
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
d50adfec9e amd64: clear debug registers on execing 32bit native binary
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
2f15884747 amd64 linux64: use x86_clear_dbregs()
instead of manually inlining it

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
290b0d123a x86: use x86_clear_dbregs() on fork
instead of manual zeroing of the debug registers file in pcb.
This centralizes the cleaning code, but the practical difference is
that PCB_DBREGS flag is cleared, saving some operations on context
switching.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
a8b75a57c9 x86: add x86_clear_dbregs() helper
Move the code from exec_setregs() to reset debug registers state on exec,
to the x86_clear_dbregs() helper

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:01 +03:00
John Baldwin
86e352c934 Fix a typo in a comment: frame -> framework.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-04-09 16:10:55 -07:00
John Baldwin
6a06b00a0d nlmrsa: Remove this deprecated driver.
Relnotes:	yes
Sponsored by:	Chelsio Communications
2021-04-09 16:10:31 -07:00
Gleb Smirnoff
1a7fe55ab8 tcp_hostcache: make THC_LOCK/UNLOCK macros to work with hash head pointer.
Not a functional change.
2021-04-09 14:07:35 -07:00
Gleb Smirnoff
4f49e3382f tcp_hostcache: style(9)
Reviewed by:	rscheff
2021-04-09 14:07:27 -07:00
Gleb Smirnoff
7c71f3bd6a tcp_hostcache: remove extraneous check.
All paths leading here already checked this setting.

Reviewed by:	rscheff
2021-04-09 14:07:19 -07:00
Gleb Smirnoff
0c25bf7e7c tcp_hostcache: implement tcp_hc_updatemtu() via tcp_hc_update.
Locking changes are planned here, and without this change too
much copy-and-paste would be between these two functions.

Reviewed by:	rscheff
2021-04-09 14:06:44 -07:00
Konstantin Belousov
2fd1ffefaa Stop arming kqueue timers on knote owner suspend or terminate
This way, even if the process specified very tight reschedule
intervals, it should be stoppable/killable.

Reported and reviewed by:	markj
Tested by:	markj, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29106
2021-04-09 23:43:51 +03:00
Konstantin Belousov
533e5057ed Add helper for kqueue timers callout scheduling
Reviewed by:	markj
Tested by:	markj, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29106
2021-04-09 23:42:56 +03:00
Konstantin Belousov
4d27d8d2f3 Stop arming realtime posix process timers on suspend or terminate
Reported and reviewed by:	markj
Tested by:	markj, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29106
2021-04-09 23:42:51 +03:00
Konstantin Belousov
dc47fdf131 Stop arming periodic process timers on suspend or terminate
Reported and reviewed by:	markj
Tested by:	markj, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29106
2021-04-09 23:42:44 +03:00
Alexander V. Chernikov
ee2cf2b360 Implement better rebuild-delay fib algo policy.
The intent is to better handle time intervals with large amount of RIB
updates (e.g. BGP peer going up or down), while still keeping low sync
delay for the rest scenarios.

The implementation is the following: updates are bucketed into the
buckets of size 50ms. If the number of updates within a current bucket
 exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket
interval), the update is delayed for another 50ms. This can be repeated
 until the maximum update delay (1 sec) is reached.

All 3 variables are runtime tunables:

* net.route.algo.fib_max_sync_delay_ms: 1000
* net.route.algo.bucket_change_threshold_rate: 500
* net.route.algo.bucket_time_ms: 50

Differential Review:	https://reviews.freebsd.org/D29588
MFC after:		2 weeks
2021-04-09 21:33:03 +01:00
Vincenzo Maffione
172c5eb272 netmap: vtnet: remove unused variable
Reported by:	bdragon
2021-04-09 19:33:41 +00:00
Wojciech Macek
243000b19f pci_dw: Trim ATU windows bigger than 4GB
The size of the ATU MEM/IO windows is implicitly casted to uint32_t.
Because of that some window sizes were silently demoted to 0 and ignored.
Check the size if its too large, trim it to 4GB and print a warning message.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: mw
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29625
2021-04-09 09:37:59 +02:00
Konstantin Belousov
5af1131de7 struct mount uppers: correct locking annotations
It is all locked by the uppers' interlock.

Noted by:	Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-09 01:06:26 +03:00
Rick Macklem
05a39c2c1c nfsd: fix replies from session cache for retried RPCs
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

The FreeBSD server failec to reply using the cached
reply in the session slot when an RPC was retried on
the session slot, as indicated by same slot sequence#.

This patch fixes this.  It should also fix a similar
failure for NFSv4.0 mounts, when the sequence# in
the open/lock_owner requires a reply be done from
an entry locked into the DRC.

This fix affects the fairly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation.  Note that retries only occur after the
client has needed to create a new TCP connection.

MFC after:	2 weeks
2021-04-08 14:04:22 -07:00
Alexander V. Chernikov
9e5243d7b6 Enforce check for using the return result for ifa?_try_ref().
Suggested by:	hps
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29504
2021-04-05 03:35:19 +01:00
Richard Scheffenegger
b878ec024b tcp: Use jenkins_hash32() in hostcache
As other parts of the base tcp stack (eg.
tcp fastopen) already use jenkins_hash32,
and the properties appear reasonably good,
switching to use that.

Reviewed By: tuexen, #transport, ae
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29515
2021-04-08 20:29:19 +02:00
Gleb Smirnoff
373ffc62c1 tcp_hostcache.c: remove unneeded includes.
Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Gleb Smirnoff
29acb54393 tcp_hostcache: add bool argument for tcp_hc_lookup() to tell are we
looking to only read from the result, or to update it as well.
For now doesn't affect locking, but allows to push stats and expire
update into single place.

Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Gleb Smirnoff
489bde5753 tcp_hostcache: hide rmx_hits/rmx_updates under ifdef.
They have little value unless you do some profiling investigations,
but they are performance bottleneck.

Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Gleb Smirnoff
2cca4c0ee0 Remove tcp_hostcache.h. Everything is private.
Reviewed by:	rscheff
2021-04-08 10:58:44 -07:00
Richard Scheffenegger
90cca08e91 tcp: Prepare PRR to work with NewReno LossRecovery
Add proper PRR vnet declarations for consistency.
Also add pointer to tcpopt struct to tcp_do_prr_ack, in preparation
for it to deal with non-SACK window reduction (after loss).

No functional change.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29440
2021-04-08 19:16:31 +02:00
Richard Scheffenegger
9f2eeb0262 [tcp] Fix ECN on finalizing sessions.
A subtle oversight would subtly change new data packets
sent after a shutdown() or close() call, while the send
buffer is still draining.

MFC after: 3 days
Reviewed By: #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29616
2021-04-08 15:26:09 +02:00
Andrew Turner
5998328e55 Clean up the style in the arm64 bus.h
MFC after:	2 weeks
Sponsored by:	Innovate UK
2021-04-08 10:27:11 +00:00
Mitchell Horne
1fd001db9c arm64: clear debug register state on fork
Following the analogous change for amd64 and i386 in 8223717ce6,
ensure that new processes start with these registers inactive.

PR:		254661
Reported by:	Michał Górny
Reviewed by:	kib, emaste
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29630
2021-04-08 09:41:41 -03:00
Kristof Provost
4967f672ef pf: Remove unused variable rt_listid from struct pf_krule
Reviewed by:	donner
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29639
2021-04-08 13:24:35 +02:00
Mateusz Guzik
72b3b5a941 vfs: replace vfs_smr_quiesce with vfs_smr_synchronize
This ends up using a smr specific method.

Suggested by:	markj
Tested by:	pho
2021-04-08 11:14:45 +00:00
Andrew Turner
24b2f4ea49 arm64: Fix finding the pmc event ID
The lower pmc event bits were masked off to find the PMC event ID.
The doesn't work when there are more events. Switch it to use the
offser relative to the first event while also checking the ID is
in the expected range.

Reviewed by:	gnn, ray
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D29600
2021-04-08 07:52:21 +00:00
Andrew Turner
d6a53211a7 Discard the arm64 VFP state before resetting it
When resetting the VFP state we need to discard any old state so we don't
try to save it on a context switch. Move this first so resetting the pcb
is safe to perform outside a critical section.

Reviewed by:	arichardson
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D29401
2021-04-08 07:51:26 +00:00
Vincenzo Maffione
15dc713ceb netmap: vtnet: add support for netmap offsets
Follow-up change to a6d768d845.
This change adds support for netmap offsets.
2021-04-07 21:32:20 +00:00
Greg V
f689cb23b2 ipmi,smbios: move smbios_walk_table to smbios.h
This function will be used for exposing DMI info as sysctls in the
smbios module (in an upcoming review).

While here, add __packed to the structs.

Reviewed by:	dab
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D29270
2021-04-07 15:05:49 -05:00
Greg V
a29bff7a52 smbios: support getting address from EFI
On some systems (e.g. Lenovo ThinkPad X240, Apple MacBookPro12,1)
the SMBIOS entry point is not found in the <0xFFFFF space.

Follow the SMBIOS spec and use the EFI Configuration Table for
locating the entry point on EFI systems.

Reviewed by:	rpokala, dab
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D29276
2021-04-07 14:46:29 -05:00
Alexander Motin
5a8d32b53b Add IDs for ASMedia ASM116x PCIe 3.0 AHCI controllers.
MFC after:	1 week
2021-04-07 15:09:56 -04:00
Mark Johnston
0f07c234ca Remove more remnants of sio(4)
Reviewed by:	imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29626
2021-04-07 14:33:02 -04:00
Mark Johnston
274579831b capsicum: Limit socket operations in capability mode
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.

Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.

Reviewed by:	oshogbo
Discussed with:	emaste
Reported by:	manu
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29423
2021-04-07 14:32:56 -04:00
Kristof Provost
6d786845cf pf: Do not short-circuit processing for REPLY_TO
When we find a state for packets that was created by a reply-to rule we
still need to process the packet. The state may require us to modify the
packet (e.g. in rdr or nat cases), which we won't do with the shortcut.

MFC after:	2 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-04-07 17:03:17 +02:00
Kristof Provost
ab8d25880e libnv: Allow use in non-sleepable contexts
44c125c4ce switched the nvlist allocations
to be M_WAITOK, but this precludes the use in non-sleepable contexts.
(E.g. with a nonsleepable lock held).

All callers for these allocation functions already cope with memory
alloation failures, so there's no reason to allow sleeping during
allocations.

Reviewed by:	melifaro, oshogbo
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29556
2021-04-07 15:54:10 +02:00
Alexander Motin
ac503c194c Introduce "soft" serseq variant.
With new ZFS prefetcher improvements it is no longer needed to fully
serialize reads to reach decent prediction hit rate.  Softer variant
only creates small time window to reduce races instead of completely
blocking following reads while previous is running.  It much less
hurts the performance in case of prediction miss.

MFC after:	1 month
2021-04-06 17:27:16 -04:00
Mateusz Guzik
13b3862ee8 cache: update an assert on CACHE_FPL_STATUS_ABORTED
Since symlink support it can get upgraded to CACHE_FPL_STATUS_DESTROYED.

Reported by:	bdrewery
2021-04-06 22:31:58 +02:00
Mark Johnston
2425f5e912 mount: Disallow mounting over a jail root
Discussed with:	jamie
Approved by:	so
Security:	CVE-2020-25584
Security:	FreeBSD-SA-21:10.jail_mount
2021-04-06 14:49:36 -04:00
Mark Johnston
982693bb72 vm_fault: Shoot down multiply mapped COW source page mappings
Reviewed by:	kib, rlibby
Discussed with:	alc
Approved by:	so
Security:	CVE-2021-29626
Security:	FreeBSD-SA-21:08.vm
2021-04-06 14:49:28 -04:00
Marcin Wojtas
9857e00a52 pci_user: fix build for 32-bit platforms
Commit: f2f1ab39c0 ("pci_user: call bus_translate_resource before BAR mmap")
broke build for 32-bit platforms due to rman_res_t and vm_paddr_t
incompatible types. Fix that.
2021-04-06 18:50:36 +02:00
Marcin Wojtas
f2f1ab39c0 pci_user: call bus_translate_resource before BAR mmap
On some armv8 machines it is possible that the mapping between CPU
and PCI bus BAR base addresses is not 1:1. In case a BAR is allocated
in kernel using bus_alloc_resource_any this translation is handled in
ofw_pci_activate_resource.

Do the same in pci_user.c by calling bus_translate_resource devmethod.
This fixes mmaping BARs to userspace on Marvell SoCs (Armada 7k8k/CN913x)
and possibly many other platforms.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Marvell
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29604
2021-04-06 17:15:04 +02:00
Marcin Wojtas
57dbb3c259 pci_dw: fix outbound I/O window configuration
Use viewport "2" instead of "0" and change window type from MEM to IO.
Without these changes the MEM ATU window can be overwritten with the IO one.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29516
2021-04-06 14:31:39 +02:00
Leandro Lupori
28d14569c8 powerpc64: add missing TLB invalidations to radix
Radix MMU code was missing TLB invalidations when some Level 3 PDEs were
modified. This caused TLB multi-hit machine check interrupts when
superpages were enabled.

Reviewed by:		jhibbits
MFC after:		2 weeks
Sponsored by:		Eldorado Research Institute (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D29511
2021-04-06 08:31:44 -03:00
Poul-Henning Kamp
6c709cbf03 Add Siemens SITOP UPS500S usb device 2021-04-06 10:56:27 +00:00
Konstantin Belousov
5b3b19db73 linuxkpi: remove erronously committed diff save file
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-06 03:42:13 +03:00
Konstantin Belousov
8011fb795b linuxkpi: drop single-use variable
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-06 03:38:29 +03:00
Konstantin Belousov
f6b108837e linuxkpi: avoid counting per-thread use for the embedded linux cdevs
The counter is not used to control destroy.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-06 03:38:29 +03:00
Konstantin Belousov
7f9867f8c6 linuxkpi: do not destroy/free embedded linux cdevs
They have their own lifetime managed by the containing objects.
Premature and unexpected free causes corruption.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-06 03:38:29 +03:00
Konstantin Belousov
28b482e2ba linuxkpi: rename cdev to ldev
the variables hold pointers to a linux_cdev, not to a FreeBSD cdev.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-06 03:38:28 +03:00
Konstantin Belousov
7b0125cbec linuxkpi: copy ldev into local to test and free the same pointer
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-06 03:38:28 +03:00
Konstantin Belousov
d36d681615 rtld dl_iterate_phdr(): dlpi_tls_data is wrong
dl_iterate_phdr() dlpi_tls_data should provide the TLS module segment
address, and not the TLS init segment address as it does now.

Reported by:	emacsray@gmail.com
PR:	254774
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-06 03:23:08 +03:00
Mark Johnston
843d16436d qat: Make prototypes consistent with the implementation
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-05 16:30:00 -04:00
Krzysztof Galazka
20a52706c8 ixl(4): Add tunable to override Flow Control settings
Add flow_control to hw.ixl tunables tree to let override
initial flow control configuration for all interfaces.
Keep using configuration set by NVM by default.

Reviewed by:	erj@, gallatin@
Tested by:	gowtham.kumar.ks_intel.com
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D29338
2021-04-05 11:17:55 -07:00
Brandon Bergren
0c9f52d4ce powerpc: Fix programmer's switch driver and add to GENERIC
Older G4 and G3 models have a programmer's switch that can be used to
generate an interrupt to drop into the debugger.

This code hadn't been tested for a long time. It had been broken back
in 2005 in r153050.

Repair and modernize the code and add it to GENERIC.

Reviewed by:	jhibbits (approved w/ removal of unused sc_dev var)
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D29131
2021-04-05 12:04:12 -05:00
Justin Hibbits
16e549ebe2 Merge the QorIQ GPIO drivers between arm and powerpc
Summary:
They're nearly identical, so don't use two copies.  Merge the newer
driver into the older one, and move it to a common location.

Add the Semihalf and associated copyrights in addition to mine, since
it's a non-trivial amount of code merged.

Reviewed By: mw
Differential Revision: https://reviews.freebsd.org/D29520
2021-04-05 10:35:15 -05:00
Alexander Motin
5a898b2b78 Set PCIe device's Max_Payload_Size to match PCIe root's.
Usually on boot the MPS is already configured by BIOS.  But we've
found that on hot-plug it is not true at least for our Supermicro
X11 boards.  As result, mismatch between root's configuration of
256 bytes and device's default of 128 bytes cause problems for some
devices, while others seem to work fine.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2021-04-05 10:34:40 -04:00
Kristof Provost
f4c0290916 pf: Add static DTrace probe points
These two have proven to be useful during debugging. We may as well keep
them permanently.
Others will be added as their utility becomes clear.

Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29555
2021-04-05 09:57:06 +02:00
Kristof Provost
829a69db85 pf: change pf_route so pf only runs when packets enter and leave the stack.
before this change pf_route operated on the semantic that pf runs
when packets go over an interface, so when pf_route changed which
interface the packet was on it would run pf_test again. this change
changes (restores) the semantic that pf is only supposed to run
when packets go in or out of the network stack, even if route-to
is responsibly for short circuiting past the network stack.

just to be clear, for normal packets (ie, those not touched by
route-to/reply-to/dup-to), there isn't a difference between running
pf when packets enter or leave the stack, or having pf run when a
packet goes over an interface.

the main reason for this change is that running the same packet
through pf multiple times creates confusion for the state table.
by default, pf states are floating, meaning that packets are matched
to states regardless of which interface they're going over. if a
packet leaving on em0 is rerouted out em1, both traversals will end
up using the same state, which at best will make the accounting
look weird, or at worst fail some checks in the state and get
dropped.

another reason for this commit is is to make handling of the changes
that route-to makes consistent with other changes that are made to
packet. eg, when nat is applied to a packet, we don't run pf_test
again with the new addresses.

the main caveat with this diff is you can't have one rule that
pushes a packet out a different interface, and then have a rule on
that second interface that NATs the packet. i'm not convinced this
ever worked reliably or was used much anyway, so we don't think
it's a big concern.

discussed with many, with special thanks to bluhm@, sashan@ and
sthen@ for weathering most of that pain.
ok claudio@ sashan@ jmatthew@

Obtained from:	OpenBSD
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29554
2021-04-05 09:57:06 +02:00
Vincenzo Maffione
361e950180 iflib: add support for netmap offsets
Follow-up change to a6d768d845.
This change adds iflib support for netmap offsets, enabling
applications to use offsets on any driver backed by iflib.
2021-04-05 07:54:47 +00:00
Rick Macklem
7a606f280a nfsd: make the server repeat CB_RECALL every couple of seconds
Commit 01ae8969a9 stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. This will fix the back channel, but the
first attempt at a callback like CB_RECALL will already have
failed. Without this patch, a CB_RECALL will not be retried
and that can result in a 5 minute delay until the delegation
times out.

This patch modifies the code so that it will retry the
CB_RECALL every couple of seconds, often avoiding the
5 minute delay.

This is not critical for correct behaviour, but avoids
the 5 minute delay for the case where the Linux client
re-binds the back channel via BindConnectionToSession.

MFC after:	2 weeks
2021-04-04 18:15:54 -07:00
Rick Macklem
6f2addd838 nfsd: fix BindConnectionToSession so that it clears "cb path down"
Commit 01ae8969a9 stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. It will do this for every RPC
reply until it no longer sees the flag.
Without that patch, this will happen until the client does
an Open, which will clear LCL_CBDOWN.

This patch clears LCL_CBDOWN right away, so that
NFSV4SEQ_CBPATHDOWN will no longer be sent to the client
in Sequence replies and the Linux client will not repeat
the BindConnectionToSession RPCs.

This is not critical for correct behaviour, but reduces
RPC overheads for cases where the Open will not be done
for a while.

MFC after:	2 weeks
2021-04-04 15:05:39 -07:00
Konstantin Belousov
89619b747b Add sysctl debug.uma_reclaim
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-04 20:39:06 +03:00
Konstantin Belousov
51a7be5f60 Style
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-04 20:39:06 +03:00
Edward Tomasz Napierala
7f6157f7fd lock_delay(9): improve interaction with restrict_starvation
After e7a5b3bd05, the la->delay value was adjusted after
being set by the starvation_limit code block, which is wrong.

Reported By:	avg
Reviewed By:	avg
Fixes:		e7a5b3bd05
Sponsored By:	NetApp, Inc.
Sponsored By:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29513
2021-04-03 13:08:53 +01:00
Mark Johnston
4d221f59b8 fbt: Remove some handling for multiple CTF containers
This was ported from illumos but not completely done.  Currently we do
not perform type deduplication between KLDs and the kernel, i.e., kernel
modules have a complete type graph.  So, remove it for now since it's
not functional and complicates the task of modifying various CTF type
definitions, and we are hitting some limits in the current format which
necessitate an update.

No functional change intended.

MFC after:	2 weeks
2021-04-02 17:49:13 -04:00
Mark Johnston
52a99c72b5 sendfile: Fix error initialization in sendfile_getobj()
Reviewed by:	chs, kib
Reported by:	jhb
Fixes:		faa998f6ff
MFC after:	1 day
Differential Revision:	https://reviews.freebsd.org/D29540
2021-04-02 17:42:38 -04:00
Richard Scheffenegger
a04906f027 fix typo in 38ea2bd069 2021-04-02 20:34:33 +02:00
Richard Scheffenegger
38ea2bd069 Use sbuf_drain unconditionally
After making sbuf_drain safe for external use,
there is no need to protect the call.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29545
2021-04-02 20:27:46 +02:00
Richard Scheffenegger
cad4fd0365 Make sbuf_drain safe for external use
While sbuf_drain was an internal function, two
KASSERTS checked the sanity of it being called.
However, an external caller may be ignorant if
there is any data to drain, or if an error has
already accumulated. Be nice and return immediately
with the accumulated error.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29544
2021-04-02 20:12:11 +02:00
Konstantin Belousov
aa3ea612be x86: remove gcov kernel support
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29529
2021-04-02 15:41:51 +03:00
Konstantin Belousov
76b1b5ce6d nullfs: protect against user creating inconsistent state
The VFS conventions is that VOP_LOOKUP() methods do not need to handle
ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all).  Nullfs
bypasses VOP_LOOKUP() to lower filesystem, and there, due to user actions,
it is possible to get into situation where
- upper vnode does not have VV_ROOT set
- lower vnode is root
- ISDOTDOT is requested
User just needs to nullfs-mount non-root of some filesystem, and then move
some directory under mount, out of mount, using lower filesystem.

In this case, nullfs cannot do much, but we still should and can ensure
internal kernel structures are consistent.  Avoid ISDOTDOT lookup forwarding
when VV_ROOT is set on lower dvp, return somewhat arbitrary ENOENT.

PR:	253593
Reported by:	Gregor Koscak <elogin41@gmail.com>
Test by:	Patrick Sullivan <sulli00777@gmail.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-02 15:40:25 +03:00
Vincenzo Maffione
9bad2638cc netmap: restore commit a56e6334d1
The fix in a56e6334d1
was accidentally reverted by commit 45c67e8f6b.
2021-04-02 10:45:47 +00:00
Bjoern A. Zeeb
37c3241a43 LinuxKPI: treat firmware file names more lenient
A lot of firmware files have a "-" in the name.  That "-" is a problem
when dealing with shell variables or loader (e.g., auto-loading .ko).
It may thus often be convenient to generate firmware kernel object files
with s/-/_/g in the name.  In order to automatically find them from
drivers using LinuxKPI also substitue the '-' for a '_' like we do
for '/' and '.' already.

Reviewed-by:	hselasky, manu (ok)
MFC-after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29514
2021-04-02 10:03:39 +00:00
Bjoern A. Zeeb
4ded022d3b mlx5: remove dependency on ifnet specifics of linux/netdevice.h
Rename the last remaining bits depending on ifnet from linux/netdevice.h
instead of using the compat macros. This helps clearing up
struct netdevice being struct ifnet from linux/netdevice.h.

Sponsored-by:	The FreeBSD Foundation
MFC-after:	2 weeks
Reviewed-by:	hselasky, kib
X-D-R:		D29366
Differential Revision:	https://reviews.freebsd.org/D29497
2021-04-02 10:01:30 +00:00
Dmitry Chagin
a78109d5db Partially revert r248770.
Under geom(4) nvme_ns_bio_process() is on the path where sleep
is prohibited as g_io_shedule_down() calls THREAD_NO_SLEEPNG()
before geom->start().

Reviewed By:		imp
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D29539
2021-04-02 11:43:17 +03:00
Vincenzo Maffione
45c67e8f6b netmap: several typo fixes
No functional changes intended.
2021-04-02 07:01:20 +00:00
Vincenzo Maffione
66671ae589 netmap: fix typo bug in netmap_compute_buf_len 2021-04-02 06:47:28 +00:00
Mateusz Guzik
f79bd71def cache: add high level overview
Differential Revision:	https://reviews.freebsd.org/D28675
2021-04-02 05:11:05 +02:00
Mateusz Guzik
dc532884d5 cache: fix resizing in face of lockless lookup
Reported by:	pho
Tested by:	pho
2021-04-02 05:11:05 +02:00
Mateusz Guzik
3f56bc7986 vfs: add vfs_smr_quiesce
This can be used to observe all CPUs not executing while within
vfs_smr_enter.
2021-04-02 05:11:05 +02:00
Lawrence Stewart
1eb402e47a stats(3): Improve t-digest merging of samples which result in mu adjustment underflow.
Allow the calculation of the mu adjustment factor to underflow instead of
rejecting the VOI sample from the digest and logging an error. This trades off
some (currently unquantified) additional centroid error in exchange for better
fidelity of the distribution's density, which is the right trade off at the
moment until follow up work to better handle and track accumulated error can be
undertaken.

Obtained from:	Netflix
MFC after:	immediately
2021-04-02 13:17:53 +11:00
Jung-uk Kim
429f71bf08 ACPICA: Fix build with options ACPI_DEBUG 2021-04-01 21:18:49 -04:00
Jung-uk Kim
cfd1ed4681 Merge ACPICA 20210331.
(cherry picked from commit 1e02e5b0ba8634758c128dcb43c67342c7219cd4)
2021-04-01 19:36:59 -04:00
John Baldwin
d2e076c37b ossl: Don't encryt/decrypt too much data for chacha20.
The loops for Chacha20 and Chacha20+Poly1305 which encrypted/decrypted
full blocks of data used the minimum of the input and output segment
lengths to determine the size of the next chunk ('todo') to pass to
Chacha20_ctr32().  However, the input and output segments could extend
past the end of the ciphertext region into the tag (e.g.  if a "plain"
single mbuf contained an entire TLS record).  If the length of the tag
plus the length of the last partial block together were at least as
large as a full Chacha20 block (64 bytes), then an extra block was
encrypted/decrypted overlapping with the tag.  Fix this by also
capping the amount of data to encrypt/decrypt by the amount of
remaining data in the ciphertext region ('resid').

Reported by:	gallatin
Reviewed by:	cem, gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29517
2021-04-01 15:49:07 -07:00
Rick Macklem
4e6c2a1ee9 nfsv4 client: factor loop contents out into a separate function
Commit fdc9b2d50f replaced a couple of while loops with LIST_FOREACH()
loops.  This patch factors the body of that loop out into a separate
function called nfscl_checkown().
This prepares the code for future changes to use a hash table of
lists for open searches via file handle.

This patch should not result in a semantics change.

MFC after:	2 weeks
2021-04-01 15:36:37 -07:00
Navdeep Parhar
516fe911a6 cxgbe(4): Always use the per-VI callout to read interface stats.
There is no change in the source of the stats (t4_get_port_stats or
t4_get_vi_stats) but the per-port callout is gone.

Sponsored by:	Chelsio Communications
Reviewed by:	jhb@
Differential Revision:	https://reviews.freebsd.org/D29527
2021-04-01 14:24:29 -07:00
Richard Scheffenegger
9aef4e7c2b tcp: Shouldn't drain empty sbuf
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29524
2021-04-01 17:18:38 +02:00
Mark Johnston
cb5f8694a5 powernv: Include NUMA locality information in the CPU topology
ULE uses this topology to try and preserve locality when migrating
threads between CPUs and when performing work stealing.  Ensure that on
NUMA systems it will at least take the NUMA topology into account.

Reviewed by:	bdragon, jhibbits (previous version)
Tested by:	bdragon
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28580
2021-04-01 10:25:57 -04:00
Richard Scheffenegger
02f26e98c7 tcp: Add hash histogram output and validate bucket length accounting
Provide a histogram output to check, if the hashsize or
bucketlimit could be optimized. Also add some basic sanity
checks around the accounting of the hash utilization.

MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29506
2021-04-01 14:44:14 +02:00
Richard Scheffenegger
529a2a0f27 tcp: For hostcache performance, use atomics instead of counters
As accessing the tcp hostcache happens frequently on some
classes of servers, it was recommended to use atomic_add/subtract
rather than (per-CPU distributed) counters, which have to be
summed up at high cost to cache efficiency.

PR: 254333
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Reviewed By: #transport, tuexen, jtl
Differential Revision: https://reviews.freebsd.org/D29522
2021-04-01 10:03:30 +02:00
Ka Ho Ng
03efa462b2 AMD-vi: Mixed format IVHD block should replace fixed format IVHD block
This fixes double IVHD_SETUP_INTR calls on the same IOMMU device.

Sponsored by:	The FreeBSD Foundation
MFC with:	74ada297e8
Reported by:	Oleg Ginzburg <olevole@olevole.ru>
Reviewed by:	grehan
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D29521
2021-04-01 15:31:24 +08:00
Ka Ho Ng
cf76495e0a AMD-vi: Fix mismatched NULL checking in amdiommu teardown path
Sponsored by:	The FreeBSD Foundation
Approved by:	lwhsu (mentor)
MFC with:	74ada297e8
2021-04-01 03:34:34 +08:00
Justin Hibbits
921716186f powerpc/aim: Update timebase directly on resume instead of through platform
This only works on single-CPU G4 systems, and more work is needed for
dual-CPU systems.  That said, platform sleep does not work, and this is
currently only used for PMU-based CPU speed change.

The elimination of the platform_smp_timebase_sync() call is so that the
timebase sync rendezvous can be enhanced to perform better
synchronization, which requires a full rendezvous.  This would be
impossible to do on this single-threaded run.
2021-03-31 13:34:06 -05:00
Justin Hibbits
b6d8f3b517 powerpc/powermac: Constrain 'cpu_sleep()' for AIM to mpc745x
Rename cpu_sleep() to mpc745x_sleep() to denote what it's actually
intended for.  This function is very G4-specific, and will not work on
any other CPU.  This will afterward eliminate a
platform_smp_timebase_sync() call by directly updating the timebase
instead.
2021-03-31 13:34:06 -05:00
Richard Scheffenegger
95e56d31e3 tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t
Addressing the underlying root cause for cache_count to
show unexpectedly high  values, by protecting all arithmetic on
that global variable by using counter(9).

PR:		254333
Reviewed By: tuexen, #transport
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29510
2021-03-31 20:24:13 +02:00
Navdeep Parhar
5394893269 cxgbe/t4_tom: restore socket's protosw before entering TIME_WAIT.
This fixes a panic due to stale so->so_proto if t4_tom is unloaded and
one or more connections that were previously offloaded are still around
in TIME_WAIT state.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29503
2021-03-31 10:54:32 -07:00
Richard Scheffenegger
869880463c tcp: drain tcp_hostcache_list in between per-bucket locks
Explicitly drain the sbuf after completing each hash bucket
to minimize the work performed while holding the hash
bucket lock.

PR:		254333
MFC after:	2 weeks
Reviewed By:	tuexen, jhb, #transport
Sponsored by: 	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29483
2021-03-31 19:24:21 +02:00
Richard Scheffenegger
c804c8f2c5 Export sbuf_drain to orchestrate lock and drain action
While exporting large amounts of data to a sysctl
request, datastructures may need to be locked.

Exporting the sbuf_drain function allows the
coordination between drain events and held
locks, to avoid stalls.

PR:		254333
Reviewed By:	jhb
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29481
2021-03-31 19:17:37 +02:00
Konstantin Belousov
baacf70137 vxlan: correct interface MTU when using hw offloads
Otherwise it breaks when offloading like checksum or TSO are used,
because second (encapsulated) ip_output() processing passes fragments of
the encapsulated packet down to the hardware interface.

Diagnosed by:	hselasky
Reviewed by:	np
Sponsored by:	Nvidia Networking / Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29501
2021-03-31 14:38:26 +03:00
Konstantin Belousov
e243367b64 mbuf: add a way to mark flowid as calculated from the internal headers
In some settings offload might calculate hash from decapsulated packet.
Reserve a bit in packet header rsstype to indicate that.

Add m_adj_decap() that acts similarly to m_adj, but also either clear
flowid if it is not marked as inner, or transfer it to the decapsulated
header, clearing inner indicator. It depends on the internals of m_adj()
that reuses the argument packet header for the result.

Use m_adj_decap() for decapsulating vxlan(4) and gif(4) input packets.

Reviewed by:	ae, hselasky, np
Sponsored by:	Nvidia Networking / Mellanox Technologies
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28773
2021-03-31 14:38:26 +03:00
Justin Hibbits
a5f07fa0c6 powerpc/pseries: Add new hypercall definition, H_REGISTER_PROC_TBL
This will be used by the Radix MMU on pseries.

MFC after:	1 week
2021-03-30 21:22:21 -05:00
Justin Hibbits
895a22583d [PowerPC] Fix ISA_206 subword atomics
The POWER7 subword atomics were not using the correct instructions for
byte and halfword stores in the atomic_fcmpset code.

This only affects builds with custom CFLAGS that have explicitly enabled
ISA_206_ATOMICS.
2021-03-30 20:23:04 -05:00
Jason A. Harmening
8dc8feb53d Clean up a couple of MD warts in vm_fault_populate():
--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32.  This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping.  For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place.  Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.

--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.

Reviewed by:	kib, markj, bdragon
Differential Revision:	https://reviews.freebsd.org/D29439
2021-03-30 18:15:55 -07:00
Justin Hibbits
74f6cb0f31 [PowerPC] Remove unused IPI type count tracking.
ipi_msg_count is inaccessible outside this file and is never read.

It was introduced in the original SMP support code in r178628 and was never
actually used anywhere.

Remove it to slightly improve IPI performance.

Submitted by:	jhibbits
MFC after:	1 week
2021-03-30 20:03:06 -05:00
Konstantin Belousov
8223717ce6 x86: clear %db registers in new process
Reported by:	 Michał Górny <mgorny@gentoo.org>
PR:	254661
Reviewed by:	emaste, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29496
2021-03-31 02:07:35 +03:00
Rick Macklem
01ae8969a9 nfsd: do not implicitly bind the back channel for NFSv4.1/4.2 mounts
The NFSv4.1 (and 4.2 on 13) server incorrectly binds
a new TCP connection to the back channel when first
used by an RPC with a Sequence op in it (almost all of them).
RFC5661 specifies that only the fore channel should be bound.

This was done because early clients (including FreeBSD)
did not do the required BindConnectionToSession RPC.

Unfortunately, this breaks the Linux client when the
"nconnects" mount option is used, since the server
may do a callback on the incorrect TCP connection.

This patch converts the server behaviour to that
required by the RFC.  It also makes the server test/indicate
failure of the back channel more aggressively.

Until this patch is applied to the server, the
"nconnects" mount option is not recommended for a Linux
NFSv4.1/4.2 client mount to the FreeBSD server.

Reported by:	bcodding@redhat.com
Tested by:	bcodding@redhat.com
PR:		254560
MFC after:	1 week
2021-03-30 14:31:05 -07:00
Alexander V. Chernikov
b8598e2ff6 Add IPv4 fib lookup performance tests with uniform keys.
Submitted by:	zec
MFC after:	1 week
2021-03-30 14:32:28 +01:00
Leandro Lupori
75e67b4920 powerpc64: support superpages on pmap_mincore
Now that superpages for HPT MMU has landed, finish implementation of
pmap_mincore by adding support for superpages.

Submitted by:           Fernando Eckhardt Valle <fernando.valle@eldorado.org.br>
Reviewed by:            bdragon, luporl
MFC after:              1 week
Sponsored by:           Eldorado Research Institute (eldorado.org.br)
Differential Revision:  https://reviews.freebsd.org/D29230
2021-03-30 15:54:01 -03:00
Edward Tomasz Napierala
076686fe07 cam: make sure to clear CCBs allocated on the stack
This is required for small CCBs support, where we need to track
whether the CCB was allocated from an UMA zone or not.  There are
no (intended) functional changes with the current source.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29484
2021-03-30 19:15:43 +01:00
Greg V
8ebda6e44b efifb,vbefb: implement vd_fini
This removes the pmap entry when switching away to e.g. drm fb.

Differential Revision:	https://reviews.freebsd.org/D29020
MFC After:	1 month
2021-03-30 17:47:49 +02:00
Mitchell Horne
7c13440845 arm: add options GDB to std.armv6 and std.armv7
There are now options for specifying the debug port via tunable
(hw.fdt.dbgport) and device tree (/chosen/freebsd-dbgpath), so this can
be usefully included in GENERIC.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29152
2021-03-30 12:04:24 -03:00
Bjoern A. Zeeb
7069b4c6a4 LinuxKPI/OFED: (re)move inetdevice.h implementation
The two functions in linux/inetdevice.h are highly FreeBSD/ifnet
specific.  This is a result of struct net_device being mapped to
struct ifnet.

The only known consumer of these functions are two files in the
ofed/infiniband code.

As a first step of cleaning up copy linux/inetdevice.h to
rdma/ib_addr_freebsd.h. (It stayed a separate file to preserve
copyright and license of the original file; otherwise it could be
merged into ib_addr.h where more EPOCH/vnet/.. are already used).

Slightly rename the function to not conflict with LinuxKPI
in the future.

Remove the three last, now unneeded includes of inetdevice.h and
zap linux/inetdevice.h to an empty header file with only the forward
include to netdevice.h remaining.

Sponsored-by:	The FreeBSD Foundation
MFC-after:	2 weeks
Reviewed-by:	hselasky, kib
X-D-R:		D29366 (extracted as further cleanup)
Differential Revision:	https://reviews.freebsd.org/D29434
2021-03-30 14:40:46 +00:00
Mitchell Horne
7446b0888d gdb: report specific stop reason for watchpoints
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html

Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	51
Differential Revision:	https://reviews.freebsd.org/D29174
2021-03-30 11:36:41 -03:00