Commit Graph

134556 Commits

Author SHA1 Message Date
Mark Johnston
93fb2b060b ntb: Fix the 32-bit build after r366969
Reported by:	Jenkins
MFC with:	r366969
2020-10-23 15:12:06 +00:00
Mark Johnston
6660ef6e91 ntb: Add Intel Xeon Gen3 support
The NTB hardware starting with Skylake has some changes to the register
map and the doorbell interface.  Add a new NTB_XEON_GEN3 device type and
use it to conditionalize driver logic that differs from the existing
Xeon code.

Reviewed by:	vangyzen
Discussed with:	cem, Bret Ketchum <Bret.Ketchum@dell.com>
MFC after:	1 month
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26683
2020-10-23 14:16:52 +00:00
Mark Johnston
97441fab87 ntb: Fix an assertion to permit >= 32 doorbells
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2020-10-23 14:15:58 +00:00
Edward Tomasz Napierala
1c7481377c Improve prctl(2) debug.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26899
2020-10-23 12:00:30 +00:00
Edward Tomasz Napierala
7135ca98d2 Add /proc/sys/kernel/ngroups_max to linprocfs(4). The id(1) command
seems to use it - it works fine without it, but still.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26898
2020-10-23 11:57:55 +00:00
Hans Petter Selasky
a71074e0af Fix for loading cuse.ko via rc.d . Make sure we declare the cuse(3)
module by name and not only by the version information, so that
"kldstat -q -m cuse" works.

Found by:		Goran Mekic <meka@tilda.center>
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-23 08:44:53 +00:00
Alan Cox
ccfd886a1b Conditionally compile struct vm_phys_seg's md_first field. This field is
only used by arm64's pmap.

Reviewed by:	kib, markj, scottph
Differential Revision:	https://reviews.freebsd.org/D26907
2020-10-23 06:24:38 +00:00
Navdeep Parhar
e2e43aafd7 cxgbe(4): Fix min/max typo in r366958. 2020-10-23 02:24:43 +00:00
Navdeep Parhar
b8b01d9be8 cxgbe(4): refine the values reported in if_ratelimit_query.
- Get the number of classes from chip_params.
- Get the number of ethofld tids from the firmware.
- Do not let tcp_ratelimit allocate all traffic classes.

Sponsored by:	Chelsio Communications
2020-10-23 01:36:54 +00:00
John Baldwin
8a82be5044 Handle CPL_RX_DATA on active TLS sockets.
In certain edge cases, the NIC might have only received a partial TLS
record which it needs to return to the driver.  For example, if the
local socket was closed while data was still in flight, a partial TLS
record might be pending when the connection is closed.  Receiving a
RST in the middle of a TLS record is another example.  When this
happens, the firmware returns the the partial TLS record as plain TCP
data via CPL_RX_DATA.  Handle these requests by returning an error to
OpenSSL (via so_error for KTLS or via an error TLS record header for
the older Chelsio OpenSSL interface).

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	Revision: https://reviews.freebsd.org/D26800
2020-10-23 00:23:54 +00:00
Alexander Motin
7dbbd1aeae Negotiate iSCSIProtocolLevel of 2 (RFC 7144) in initiator.
It does not change anything immediately, but allows further support of
Command Priority, Status Qualifier and new task management functions.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2020-10-22 20:26:27 +00:00
Vincenzo Maffione
174f809da5 netmap: fix mutex double unlock bug
https://github.com/luigirizzo/netmap/pull/733

Submitted by:	 brian90013
MFC after:	3 days
2020-10-22 20:21:11 +00:00
Mateusz Guzik
c7520caa4f vfs: prevent avoidable evictions on mkdir of existing directories
mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST.

The NOCACHE lookup will make the namecache unnecessarily evict the existing entry,
and then fallback to the fs lookup routine eventually leading namei to return an
error as the directory is already there.

For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers
fallbacks to the slowpath for concurrently executing lookups.

Tested by:	pho
Discussed with:	kib
2020-10-22 19:28:12 +00:00
Mateusz Guzik
54f09403a3 cache: assert the created entry does not point to itself 2020-10-22 19:22:34 +00:00
Alan Cox
9e0ad88b82 Micro-optimize uma_small_alloc(). Replace bzero(..., PAGE_SIZE) by
pagezero().  Ultimately, they use the same method for bulk zeroing, but
the generality of bzero() requires size and alignment checks that
pagezero() does not.

Eliminate an unnecessary #include.

Reviewed by:	emaste, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D26876
2020-10-22 17:47:51 +00:00
Jung-uk Kim
7cda7375e6 Add a new CCP device ID found on my Ryzen 5 3600XT.
MFC after:	1 week
2020-10-22 17:46:55 +00:00
Navdeep Parhar
610d345953 if_vxlan(4): csum_flags_to_inner_flags takes the tunnel protocol as a parameter.
No functional change.
2020-10-22 17:05:55 +00:00
Hans Petter Selasky
ce329aa256 Compile fix for MIPS, MIPS64, POWERPC and POWERPC64.
Add missing include files.

Differential Revision:	https://reviews.freebsd.org/D26254
Reviewed by:		melifaro@
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-22 12:22:08 +00:00
Hans Petter Selasky
4c51d2963f Fix for monotolithic kernel builds using device lagg(4).
Differential Revision:	https://reviews.freebsd.org/D26254
Reviewed by:		melifaro@
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-22 10:29:27 +00:00
Hans Petter Selasky
a92c4bb62a Add support for IP over infiniband, IPoIB, to lagg(4). Currently only
the failover protocol is supported due to limitations in the IPoIB
architecture. Refer to the lagg(4) manual page for how to configure
and use this new feature. A new network interface type,
IFT_INFINIBANDLAG, has been added, similar to the existing
IFT_IEEE8023ADLAG .

ifconfig(8) has been updated to accept a new laggtype argument when
creating lagg(4) network interfaces. This new argument is used to
distinguish between ethernet and infiniband type of lagg(4) network
interface. The laggtype argument is optional and defaults to
ethernet. The lagg(4) command line syntax is backwards compatible.

Differential Revision:	https://reviews.freebsd.org/D26254
Reviewed by:		melifaro@
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-22 09:47:12 +00:00
Konstantin Belousov
18b8496c23 sysv_sem: semusz depends on semume.
Size of the per-process semaphore undo structure (semusz) depends on
the number of the per-process undos.  If kern.ipc.semume is adjusted,
semusz must be adjusted as well, and it makes no sense to delegate
adjustment to user.  Make it automatic.

Reported and tested by:	Olef <o.vandestadt@gmail.com>
PR:	250361
Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26826
2020-10-22 09:28:11 +00:00
Hans Petter Selasky
2ae634c6db Implement mbuf hashing routines for IP over infiniband, IPoIB.
No functional change intended.

Differential Revision:	https://reviews.freebsd.org/D26254
Reviewed by:		melifaro@
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-22 09:17:56 +00:00
Hans Petter Selasky
9d40cf60d6 Factor out generic IP over infiniband, IPoIB, definitions and code
into net/if_infiniband.c and net/infiniband.h . No functional change
intended.

Differential Revision:	https://reviews.freebsd.org/D26254
Reviewed by:		melifaro@
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-22 09:09:53 +00:00
Navdeep Parhar
b20b25e744 cxgbe(4): fix the size of the iq/eq maps.
The firmware can allocate ingress and egress context ids anywhere from
its configured range.  Size the iq/eq maps to match the entire range
instead of assuming that the firmware always allocates the first
available context id.

Reported by:	Baptiste Wicht @ Verisign
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-22 08:40:25 +00:00
Oleksandr Tymoshenko
f85aa85895 [hwpmc] Fix call chain capture for ARM64
Use ELR register value instead of LR for PMC_TRAPFRAME_TO_PC macro since
it's the former that indicates PC if the interrupted execution thread.

This fixes a bug where pmcstat lost the leaf function of the call chain
and started with the second function in the chain.

Although this change is an improvement over the previous logic there is still
posibility for incomplete data: if the leaf function does not have stack
variables and does not call any other functions compiler would not generate
a stack frame for it and the FP value would point to the caller's frame, so
instead of the actual "caller1 -> caller2 -> leaf" chain only
"caller1 -> leaf" would be captured.

Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
2020-10-22 05:07:25 +00:00
Oleksandr Tymoshenko
d2112ab098 [armv8crypto] Fix cryptodev probe logic in armv8crypto
Add missing break to prevent falling through to the default case statement
and returning EINVAL for all session configs.

Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
2020-10-22 04:49:14 +00:00
Alexander Motin
4138a74460 Pass lower 3 bits of sector_count for FPDMA commands.
When this code was written those bits were N/A, but now the lowest bit
is Rebuild Assist Recovery Control (RARC).

MFC after:	1 month
2020-10-22 03:30:39 +00:00
Alexander V. Chernikov
c7cffd65c5 Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q).
802.1ad interfaces are created with ifconfig using the "vlanproto" parameter.
Eg., the following creates a 802.1Q VLAN (id #42) over a 802.1ad S-VLAN
(id #5) over a physical Ethernet interface (em0).

ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up
ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24

VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly
supported. VLAN_HWTAGGING is only partially supported, as there is
currently no IFCAP_VLAN_* denoting the possibility to set the VLAN
EtherType to anything else than 0x8100 (802.1ad uses 0x88A8).

Submitted by:	Olivier Piras
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D26436
2020-10-21 21:28:20 +00:00
Navdeep Parhar
37d411338e cxgbe(4): display correct tid range for T6 based -SO cards.
Reported by:	Chelsio QA
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-21 20:42:29 +00:00
Edward Tomasz Napierala
f4d91df5a0 Make linux(4) warn about unsupported socket(2) types.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25680
2020-10-21 18:45:48 +00:00
Eric van Gyzen
c59370f055 ntb_tool: ubuf is too small to hold a human readable 64 bit value
ubuf buffer is too small. It should be 18 if a NULL is not needed,
or 19 to hold the NULL terminator for the full 64-BIT value plus
the 0x prefix.

Submitted by:	bret_ketchum@dell.com
Reviewed by:	markj mav
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26893
2020-10-21 17:11:57 +00:00
Brooks Davis
44ca4575ea vmapbuf: don't smuggle address or length in buf
Instead, add arguments to vmapbuf.  Since this argument is
always a pointer use a type of void * and cast to vm_offset_t in
vmapbuf.  (In CheriBSD we've altered vm_fault_quick_hold_pages to
take a pointer and check its bounds.)

In no other situtation does b_data contain a user pointer and vmapbuf
replaces b_data with the actual mapping.

Suggested by:	jhb
Reviewed by:	imp, jhb
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26784
2020-10-21 16:00:15 +00:00
Andrey V. Elsukov
7ec2f6bce5 Add dtrace SDT probe ipfw:::rule-matched.
It helps to reduce complexity with debugging of large ipfw rulesets.
Also define several constants and translators, that can by used by
dtrace scripts with this probe.

Reviewed by:	gnn
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D26879
2020-10-21 15:01:33 +00:00
Mateusz Guzik
2f1c35053c cache: drop the spurious slash_prefixed argument 2020-10-21 05:57:25 +00:00
Konstantin Belousov
c0b5fcf692 Improve FPU Tag Word reconstruction on i386 to indicate register states.
Improve the code reconstructing en_tw in struct fpreg32 from FXSAVE
results so that all register states are indicated correctly.  The
previous code unconditionally mapped non-empty register state to
'normalized value' constant.  The new code explicitly distinguishes
the 'zero value' and 'special value' constants as well.  This improves
consistency between real FSAVE and translation from FXSAVE, and
ensures that tests using PT_GETFPREGS can rely on a single correct
value independently of the underlying implementation.

PR:	250454
Sponsored by:	The FreeBSD Foundation
Obtained from:	Moritz Systems
Submitted by:	Michał Górny <mgorny@moritz.systems>
Discussed with:	emaste
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26856
2020-10-21 00:15:12 +00:00
Robert Wing
a2b559df1e geom_ctl.c: remove stale header files
- Remove "opt_geom.h", no kernel options are used.

- Remove <sys/sysctl.h>, no sysctl functionality is used here.

- Remove <sys/bio.h>, requirements for bio moved out in r112534.

- Remove <sys/lock.h> and <sys/mutex.h>, last used by DROP_GIANT() and
  PICKUP_GIANT(), which were removed in r115624.

- Remove <sys/disk.h> and <sys/kernel.h>, not used.

Reviewed by: phk, kevans (mentor)
Approved by: phk, kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D26805
2020-10-20 20:59:13 +00:00
Ed Maste
f94fdddefd arm64: add uhci to GENERIC
uhci is (or, can be) used by VMware ESXi-Arm.

PR:		250308
Reported by:	Vincent Milum Jr
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-10-20 20:11:29 +00:00
John Baldwin
ba610be90a Add a kernel crypto driver using assembly routines from OpenSSL.
Currently, this supports SHA1 and SHA2-{224,256,384,512} both as plain
hashes and in HMAC mode on both amd64 and i386.  It uses the SHA
intrinsics when present similar to aesni(4), but uses SSE/AVX
instructions when they are not.

Note that some files from OpenSSL that normally wrap the assembly
routines have been adapted to export methods usable by 'struct
auth_xform' as is used by existing software crypto routines.

Reviewed by:	gallatin, jkim, delphij, gnn
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26821
2020-10-20 17:50:18 +00:00
Edward Tomasz Napierala
91bc73618d Fix linprocfs(4) /proc/self/mem semantics to more closely match Linux.
Steam's Anti-Cheat might depend on it.

PR:		248223
Analyzed by:	Alex S <iwtcex@gmail.com>
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26816
2020-10-20 17:24:29 +00:00
Edward Tomasz Napierala
1a34e9fad6 Fix potential race condition in linux stat(2).
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25618
2020-10-20 17:19:10 +00:00
John Baldwin
bc3d569800 Move generated OpenSSL assembly routines into the kernel sources.
Sponsored by:	Netflix
2020-10-20 17:00:43 +00:00
John Baldwin
f54c6ef100 Use a template assembly file to generate the embedded MFS.
This uses the .incbin directive to pull in the MFS image contents.
Using assembly directly ensures that symbols can be defined with the
name and properties (such as .size) desired without having to rename
symbols, etc. via a second objcopy invocation.  Since it is compiled
by the C compiler driver, it also avoids the need for all of the
EMBEDFS* make variables.

Suggested by:	jrtc27
Reviewed by:	kib, markj
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26781
2020-10-20 16:48:45 +00:00
Mateusz Guzik
e9fb2bd9b8 ufs: catch up with removal of thread argument from VOP_INACTIVE 2020-10-20 09:46:20 +00:00
Mateusz Guzik
3fc7822de1 Bump __FreeBSD_version after VOP VPTOCNP and INACTIVE changes 2020-10-20 07:19:44 +00:00
Mateusz Guzik
ab21ed17ed vfs: drop the de facto curthread argument from VOP_INACTIVE 2020-10-20 07:19:03 +00:00
Mateusz Guzik
8ecd87a3e7 vfs: drop spurious cred argument from VOP_VPTOCNP 2020-10-20 07:18:27 +00:00
Ruslan Bukin
bce74ff0ce Fix build: only set iommu buswide flag if IOMMU code is included.
Sponsored by:	Innovate DSbD
2020-10-19 22:32:36 +00:00
Ruslan Bukin
c489ab6141 Add IOMMU_BUSWIDE ahci quirk.
Some controllers use PCI function 1 as the requester ID for DMA transfers,
but the controllers are not PCI multifunction.

Set the iommu buswide flag for them. This should instruct an IOMMU driver
to use the same translation rule for all the devices and functions of
a bus.

This was discovered on the ARM Neoverse N1 System Development Platform
(ARM N1SDP).

Bug reference: https://bugzilla.kernel.org/show_bug.cgi?id=42679

Reported by:	andrew
Reviewed by:	kib, mav
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26857
2020-10-19 21:27:27 +00:00
Navdeep Parhar
ae5da4e14d cxgbe(4): Updates to the drop features from r366532.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-19 21:11:49 +00:00
Ed Maste
2c19e8ed90 build vmware modules on arm64
pvscsi and vmxnet3 build and work.  Exclude vmci for now as it contains
x86-specific assembly.

Reported by:	Vincent Milum Jr
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-10-19 20:43:29 +00:00
Edward Tomasz Napierala
3001e97deb Fix fallout from r366811.
PR:		250442
Reported by:	lwhsu
Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26855
2020-10-19 20:26:37 +00:00
John Baldwin
6b7ecdcd9d Re-enable receive flow control for TOE TLS sockets.
Flow control was disabled during initial TOE TLS development to
workaround a hang (and to match the Linux TOE TLS support for T6).
The rest of the TOE TLS code maintained credits as if flow control was
enabled which was inherited from before the workaround was added with
the exception that the receive window was allowed to go negative.
This negative receive window handling (rcv_over) was because I hadn't
realized the full implications of disabling flow control.

To clean this up, re-enable flow control on TOE TLS sockets.  The
existing TPF_FORCE_CREDITS workaround is sufficient for the original
hang.  Now that flow control is enabled, remove the rcv_over
workaround and instead assert that the receive window never goes
negative matching plain TCP TOE sockets.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26799
2020-10-19 20:08:50 +00:00
Navdeep Parhar
3f3e04a062 cxgbe(4): Fix page fault in t4_get_lb_stats with 2 port T5 cards.
PR:		250449
Reported by:	freqlabs@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-19 20:08:47 +00:00
John Baldwin
e7f6b6cf69 Fix a couple of bugs for asym crypto introduced in r359374.
- Check for null pointers in the crypto_drivers[] array when checking
  for empty slots in crypto_select_kdriver().

- Handle the case where crypto_kdone() is invoked on a request where
  krq_cap is NULL due to not finding a matching driver.

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26811
2020-10-19 20:04:03 +00:00
Konstantin Belousov
6b56b0ca93 nullfs: ensure correct lock is taken after bypass.
If lower VOP relocked the lower vnode, it is possible that nullfs
vnode was reclaimed meantime.  In this case nullfs vnode no longer
shares lock with lower vnode, which breaks locking protocol.

Check for the condition and acquire nullfs vnode lock if detected.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-19 19:23:22 +00:00
Konstantin Belousov
c0baa3dc4a vgonel(): avoid recursing into VOP_INACTIVE().
It is a common pattern for filesystems' VOP_INACTIVE() implementation
to forcibly reclaim the vnode when its state is final.  For instance,
UFS vnode with zero link count is removed, and since it is
inactivated, the last open reference on it is dropped.

On the other hand, vnode might get spurious usecount reference for
many reasons.  If the spurious reference exists while vgonel() checks
for active state of the vnode, it would recurse into VOP_INACTIVE().

Fix it by checking and not doing inactivation when vgone() was called
from inactive VOP.

Reported and tested by:	pho
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-19 19:20:23 +00:00
Ed Maste
575a4437a9 uma: fix KTR message after r366840
Reported by:	bz
Sponsored by:	The FreeBSD Foundation
2020-10-19 18:54:44 +00:00
Mateusz Guzik
6d5d469fc1 cache: promote negative entries based on more than one hit
During tinderbox and similar workloads negative entries get at least one
hit before they get evicted. In the current scheme this avoidably promotes
them.

Be conservative and stick to 2 hits for now.
2020-10-19 18:51:51 +00:00
John Baldwin
6bcf3c46d8 Check TF_TOE not the tod pointer to determine if TOE is active.
The TF_TOE flag is the check used in the rest of the network stack to
determine if TOE is active on a socket.  There is at least one path in
the cxgbe(4) TOE driver that can leave the tod pointer non-NULL on a
socket not using TOE.

Reported by:	Sony Arpita Das <sonyarpitad@chelsio.com>
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26803
2020-10-19 18:24:06 +00:00
John Baldwin
ecedef531b Mark asymmetric cryptography via OCF deprecated for 14.0.
Only one MIPS-specific driver implements support for one of the
asymmetric operations.  There are no in-kernel users besides
/dev/crypto.  The only known user of the /dev/crypto interface was the
engine in OpenSSL releases before 1.1.0.  1.1.0 includes a rewritten
engine that does not use the asymmetric operations due to lack of
documentation.

Reviewed by:	cem, markj
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26810
2020-10-19 18:21:41 +00:00
John Baldwin
eeb4c816d6 Properly clear PCB_KERNNPX in fpu_kern_leave().
PR:		250423
Reported by:	CI
Tested by:	lwhsu
2020-10-19 17:35:45 +00:00
Mark Johnston
4caea9b169 icmp6: Count packets dropped due to an invalid hop limit
Pad the icmp6stat structure so that we can add more counters in the
future without breaking compatibility again, last done in r358620.
Annotate the rarely executed error paths with __predict_false while
here.

Reviewed by:	bz, melifaro
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26578
2020-10-19 17:07:19 +00:00
Mark Johnston
d80126a6f4 link_elf_obj: Colour VM objects
This will cause the VM to back sufficiently large .text sections, such
as those in zfs.ko or amdgpu.ko on amd64, with superpage mappings when
possible.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26802
2020-10-19 16:57:59 +00:00
Mark Johnston
f09cbea31a uma: Respect uk_reserve in keg_drain()
When a reserve of free items is configured for a zone, the reserve must
not be reclaimed under memory pressure.  Modify keg_drain() to simply
respect the reserved pool.

While here remove an always-false uk_freef == NULL check (kegs that
shouldn't be drained should set _NOFREE instead), and make sure that the
keg_drain() KTR statement does not reference an uninitialized variable.

Reviewed by:	alc, rlibby
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26772
2020-10-19 16:57:40 +00:00
Mark Johnston
1b2dcc8c54 uma: Avoid depleting keg reserves when filling a bucket
zone_import() fetches a free or partially free slab from the keg and
then uses its items to populate an array, typically filling a bucket.
If a single allocation causes the keg to drop below its minimum reserve,
the inner loop ends.  However, if the bucket is still not full and
M_USE_RESERVE is specified, the outer loop will continue to fetch items
from the keg.

If M_USE_RESERVE is specified and the number of free items is below the
reserved limit, we should return only a single item.  Otherwise, if the
bucket size is larger than the reserve, all of the reserved items may
end up in a single per-CPU bucket, invisible to other CPUs.

Reviewed by:	rlibby
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26771
2020-10-19 16:55:03 +00:00
Mark Johnston
6351771b7c vmem: Allocate btags before looping in vmem_xalloc()
BT_MAXALLOC (4) is the number of boundary tags required to complete an
allocation in the worst case: two to clip a free segment, and two to
import from a parent arena.  vmem_xalloc() preallocates four boundary
tags before attempting a search to simplify the segment allocation code.
It implements a loop that:
1) ensures that BT_MAXALLOC boundary tags are available,
2) attempts to find and clip a free segment satisfying the allocation
   constraints, and failing that,
3) attempts to import a segment.

On !UMA_MD_SMALL_ALLOC platforms the btag zone has to handle recusion:
it needs boundary tags to allocate boundary tags.  Thus we reserve
2 * BT_MAXALLOC * mp_ncpus tags for use when recursing: the factor of 2
is because there are two layers of vmem arenas, the per-domain arena and
global arena.  For a single thread, 2 * BT_MAXALLOC tags should be
sufficient.

Because of the way the loop is structured, BT_MAXALLOC tags are not
sufficient.  The first bt_fill() call may allocate BT_MAXALLOC tags,
then import a segment (consuming two tags), then attempt to top up the
preallocation before carving into the imported free segment, thus
requiring up to six tags in the worst case.  Because we don't
preallocate that many, this bug can cause deadlocks in rare scenarios.

Fix the problem by moving the preallocation out the loop.  This assumes
that only a single import is ever required to satisfy an allocation
request.

Thanks to manu, emaste and lwhsu for helping test debug patches.

Reported by:	Jenkins (hardware CI lab)
Reviewed by:	alc, kib, rlibby
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26770
2020-10-19 16:54:06 +00:00
Mark Johnston
33a9bce62f vmem: Simplify bt_fill() callers a bit
No functional change intended.

Reviewed by:	alc, kib, rlibby
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26769
2020-10-19 16:52:27 +00:00
Andrew Turner
3493e48db6 Remove unused labels from the arm64 casueword*
These are unused so can be removed. While here renumber the remaining label
to be 1.

Sponsored by:	Innovate UK
2020-10-19 15:52:42 +00:00
Ruslan Bukin
94dfb28ee0 Assign the reserved apic region (GAS entry) to the iommu domain msi_entry.
Requested by:	kib
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26859
2020-10-19 15:50:58 +00:00
Mark Johnston
8e2cbc5660 vmx: Implement pmap (de)activation in C
Rewrite the code that maintains pm_active and invalidates EPTP-tagged
TLB entries in C.  Previously this work was done in vmx_enter_guest(),
in assembly, but there is no good reason for that and it makes the TLB
invalidation algorithm for nested page tables harder to review.

No functional change intended.  Now, an error from the invept
instruction results in a kernel panic rather than a vmexit.  Such errors
should occur only as a result of VMM bugs.

Reviewed by:	grehan, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26830
2020-10-19 15:24:35 +00:00
Ruslan Bukin
e707c8be4e Manage MSI iommu pages.
This allows the interrupt controller driver only need a small change to
create a map for the page the device will write to raise an interrupt.

Submitted by:	andrew
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26705
2020-10-19 13:10:21 +00:00
Andrew Turner
956cc8e1b9 Split the common arm64 fu* and su* asm to a macro
As these are mostly identical split out the common code to a macro.

Sponsored by:	Innovate UK
2020-10-19 12:46:03 +00:00
Andrew Turner
474c444e04 Move the arm64 userspace access checks to macros
In the functions that copy between userspace and kernel space we check the
user space address is valid before performing the copy. These are mostly
identical within each type of function so create two macros to perform the
check.

Obtained from:	CheriBSD
Sponsored by:	Innovate UK
2020-10-19 12:06:16 +00:00
Mateusz Guzik
665c8c3e7d cache: refactor negative promotion/demotion handling
This will simplify policy changes.
2020-10-19 09:52:52 +00:00
Adrian Chadd
40ec30d45e [zfs] Remove a non-existent directory in the build infra
This directory doesn't exist and causes gcc-6.4 to complain about
a non-existent include directory

Approved by:	kevans, imp
Differential Revision:	https://reviews.freebsd.org/D26846
2020-10-18 22:37:58 +00:00
Bjoern A. Zeeb
01e579408b net80211: factor out the priv(9) checks into OS specifc code.
Factor out the priv(9) checks into OS specifc code so other OSes can equally
implement them.  This sorts out those XXX in the net80211 code.
We provide 3 arguments (cmd, vap, ifp) where available to the functions, in
order to allow other OSes to use that data but also in case we'd add auditing
to these check to have the information available. For now the arguments are
marked __unused.

PR:		249403
Reported by:	martin(NetBSD)
Reviewed by:	adrian, martin(NetBSD)
MFC after:	10 days
Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")
Differential Revision:	https://reviews.freebsd.org/D26541
2020-10-18 21:34:04 +00:00
Alexander V. Chernikov
0c325f53f1 Implement flowid calculation for outbound connections to balance
connections over multiple paths.

Multipath routing relies on mbuf flowid data for both transit
 and outbound traffic. Current code fills mbuf flowid from inp_flowid
 for connection-oriented sockets. However, inp_flowid is currently
 not calculated for outbound connections.

This change creates simple hashing functions and starts calculating hashes
 for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the
 system.

Reviewed by:	glebius (previous version),ae
Differential Revision:	https://reviews.freebsd.org/D26523
2020-10-18 17:15:47 +00:00
Edward Tomasz Napierala
186bcdaac7 If the SIM freezes the queue at exactly the wrong moment, after
another thread has started to send in a CCB and already checked
the queue wasn't frozen, we would end up with iscsi_action()
being called despite the queue is now frozen.

Add a check to make sure this doesn't happen . Perhaps this should
be fixed at the CAM level instead, but given how the send queue and
SIM are governed by two separate mutexes, it is somewhat hard to do.

Reviewed by:	imp, mav
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26750
2020-10-18 16:30:49 +00:00
Edward Tomasz Napierala
d22ff249d9 Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26658
2020-10-18 16:24:08 +00:00
Edward Tomasz Napierala
6221ec6064 Stop calling set_syscall_retval() from linux_set_syscall_retval().
The former clobbers some registers that shouldn't be touched.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26406
2020-10-18 16:16:22 +00:00
Edward Tomasz Napierala
54669eb779 Add compat.linux.dummy_rlimits, and disable by default.
Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26814
2020-10-18 15:58:16 +00:00
Edward Tomasz Napierala
c0d07d326f Slightly tweak linux ptrace(2) debug message; no functional changes.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26815
2020-10-18 15:56:47 +00:00
Alexander V. Chernikov
fa8b3fcb4c Simplify NET_EPOCH_EXIT in inp_join_group().
Suggested by:	kib
2020-10-18 12:03:36 +00:00
Hans Petter Selasky
22ab212ff8 Add new USB quirk.
PR:			250422
Submitted by:		vidwer+fbsdbugs@gmail.com
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-18 08:58:14 +00:00
Bjoern A. Zeeb
04e7bb08a5 net80211: update for (more) VHT160 support
Implement two macros IEEE80211_VHTCAP_SUPP_CHAN_WIDTH_IS_160MHZ()
and its 80+80 counter part to check in vhtcaps for appropriate
levels of support and use the macros throughout the code.

Add vht160_chan_ranges/is_vht160_valid_freq and handle analogue
to vht80 in various parts of the code.

Add ieee80211_add_channel_cbw() which also takes the CBW flag
fields and make the former ieee80211_add_channel() a wrapper to it.
With the CBW flags we can add HT/VHT channels passing them to
getflags() for the 2/5ghz functions.

In ifconfig(8) add the regdomain_addchans() support for VHT160
and VHT80P80.

With this (+ regdoain.xml updates) VHT160 channels can be
configured, listed, and pass regdomain where appropriate.

Tested with:	iwlwifi
Reviewed by:	adrian
MFC after:	10 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26712
2020-10-18 00:27:20 +00:00
Bjoern A. Zeeb
3d8a0ab61d clk: fix indentation
Just fix indentation of an if() clause.
No functional changes intended.

MFC after:	3 days
2020-10-17 23:42:33 +00:00
Bjoern A. Zeeb
f7a0bb0dec ddb: add show sysinit command
Add a show sysinit command to ddb (similar to show vnet_sysinit) which
proved to be helpful to debug some ordering issues on early-mid kernel
start panics.
2020-10-17 22:47:08 +00:00
Mateusz Guzik
4c4aa84848 cache: shorten names of debug stats 2020-10-17 21:30:46 +00:00
Mateusz Guzik
676557143f cache: don't automatically evict negative entries if usage is low
The previous scheme only looked at negative entry count in relation to the
total count, leading to tons of spurious evictions if the cache is not
significantly populated.

Instead, only try the above if negative entry count goes beyond namecache
capacity.
2020-10-17 21:22:40 +00:00
Alexander V. Chernikov
337418adf1 Fix sleepq_add panic happening with too wide net epoch in mcast control.
PR:		250413
Reported by:	Christopher Hall <hsw at bitmark.com>
Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D26827
2020-10-17 20:33:09 +00:00
Mitchell Horne
02a37049b4 riscv: zero reserved PTE bits for L2 PTEs
As was done for L3 PTEs in r362853, mask out the reserved bits when
extracting the physical address from an L2 PTE. Future versions of the
spec or custom implementations may make use of these reserved bits, in
which case the resulting physical address could be incorrect.

Submitted by:	Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	kp, mhorne
Differential Revision:	https://reviews.freebsd.org/D26607
2020-10-17 17:31:06 +00:00
Mateusz Guzik
e98c3bc667 cache: erwork sysctl vfs.cache tree
Split everything into neg, debug, param and stat categories.

The legacy nchstats sysctl (queried e.g., by systat) remains untouched.

While here rename some vars to be easier on the eye.
2020-10-17 13:06:29 +00:00
Mateusz Guzik
fa7c73d30c cache: factor negative lookup out of cache_fplookup_next 2020-10-17 13:04:46 +00:00
Mateusz Guzik
41e6b18422 cache: avoid smr in cache_neg_evict in favoro of the already held bucket lock 2020-10-17 13:04:25 +00:00
Mateusz Guzik
c38d8e1eb2 cache: rework parts of negative entry management
- declutter sysctl vfs.cache by moving relevant entries into
vfs.cache.neg
- add a little more parallelism to eviction by replacing the
global lock with an atomically modified counter
- track more statistics

The code needs further effort.
2020-10-17 08:48:58 +00:00
Mateusz Guzik
b31b5e9cfd cache: remove entries before trying to add new ones, not after
Should allow positive entries to replace negative ones in case
the cache is full.
2020-10-17 08:48:32 +00:00
Mateusz Guzik
ad89066af4 vfs: annotate mountlist_mtx with __exclusive_cache_line 2020-10-17 08:47:08 +00:00
Xin LI
5853735d5d Bump __FreeBSD_version after ptsname_r addition. 2020-10-17 04:14:46 +00:00
Matt Macy
180f822596 Update OpenZFS to 2.0.0-rc3-gfc5966
- fix panic due to tqid overflow
- Improve libzfs_error_init messages
- Expose zfetch_max_idistance tunable
- Make dbufstat work on FreeBSD
- Fix EIO after resuming receive of new dataset over an existing one
2020-10-17 01:06:04 +00:00
Mateusz Guzik
d6eee35004 cache: add a probe reporting addition of duplicate entries 2020-10-17 00:27:26 +00:00