Commit Graph

138473 Commits

Author SHA1 Message Date
Andrey V. Elsukov
d477a7feed Fix panic in IPv6 multicast code.
Add check that ifp supports IPv6 multicasts in in6_getmulti.
This fixes panic when user application tries to join into multicast
group on an interface that doesn't support IPv6 multicasts, like
IFT_PFLOG interfaces.

PR:             257302
Reviewed by:	melifaro
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D31420
2021-08-06 12:57:59 +03:00
Hans Petter Selasky
bb5cd80e8b Update the TCP LRO code to handle both encrypted and un-encrypted traffic.
Encrypted and un-encrypted traffic needs to be coalesced separately.
Split the 16-bit lro_type field in the address information into two
8-bit fields, and then use the last 8-bit field for flags, which among
other indicate if the received mbuf is encrypted or un-encrypted.

Differential Revision:	https://reviews.freebsd.org/D31377
Reviewed by:	gallatin
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-08-06 11:28:44 +02:00
Hans Petter Selasky
ed2196e5df sound(4): Implement playback and recording mode sysctl(8).
The dev.pcm.<N>.mode sysctl(8) gives information if a sound device
supports hardware mixing, playback or recording.

Submitted by:	Christos Margiolis <christos@freebsd.org>
Differential Revision:	https://reviews.freebsd.org/D31320
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-08-06 11:28:44 +02:00
Hans Petter Selasky
132fca6335 sound(4): Fix typos.
Submitted by:	Christos Margiolis <christos@freebsd.org>
Differential Revision:	https://reviews.freebsd.org/D31320
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-08-06 11:28:43 +02:00
Andrew Gallatin
739de953ec ktls: Move KERN_TLS ifdef to tcp_var.h
This allows us to remove stubs in ktls.h and allows us
to sort the function prototypes.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-05 19:17:35 -04:00
Andrew Gallatin
09066b9866 ktls: Use the new PNOLOCK flag
Use the new PNOLOCK flag to tsleep() to indicate that
we are managing potential races, and don't need to
sleep with a lock, or have a backstop timeout.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-05 17:19:12 -04:00
Andrew Gallatin
1b97a054f3 tsleep: Add a PNOLOCK flag
Add a PNOLOCK flag so that, in the race circumstance where
wakeup races are externally mitigated, tsleep() can be
called with a sleep time of 0 without triggering an
an assertion.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-05 17:16:30 -04:00
Alexander V. Chernikov
8482aa7748 Use lltable calculated header when sending lle holdchain after successful lle resolution.
Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D31391
2021-08-05 20:44:36 +00:00
John Baldwin
87322a9075 iscsi: Remove icl_soft-only fields from struct icl_conn.
Create a struct icl_soft_conn which extends struct icl_conn and
move fields only used by icl_soft from struct icl_conn to
struct icl_soft_conn.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31414
2021-08-05 12:05:30 -07:00
Andrew Gallatin
2694c869ff ktls: fix a panic with INVARIANTS
98215005b7 introduced a new
thread that uses tsleep(..0) to sleep forever.  This hit
an assert due to sleeping with a 0 timeout.

So spell "forever" using SBT_MAX instead, which does not
trigger the assert.

Pointy hat to: gallatin
Pointed out by: emaste
Sponsored by: Netflix
2021-08-05 13:09:06 -04:00
Konstantin Belousov
4cc6fe1e5b coretemp: use x86_msr_op for thermal MSR access
Reviewed by:	markj
Discussed with:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31386
2021-08-05 18:47:07 +03:00
Konstantin Belousov
d0bc4b4666 x86_msr_op: extend the KPI to allow MSR read and single-CPU operations
Reivewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31386
2021-08-05 18:46:37 +03:00
Ka Ho Ng
245ec7651e param.h: Bump __FreeBSD_version to 1400029 for commit 0dc332bff2
Commit 0dc332bff2 changes fileops layout and adds VOP_DEALLOCATE VOP
call. LinuxKPI kmods and file system modules need to be rebuilt at
least.

Sponsored by:	The FreeBSD Foundation
2021-08-05 23:22:07 +08:00
Ka Ho Ng
da9fe3529b Regen after 0dc332bff2 2021-08-05 23:22:02 +08:00
Ka Ho Ng
0dc332bff2 Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28347
2021-08-05 23:20:42 +08:00
Ka Ho Ng
abbb57d5a6 vfs: Introduce vn_bmap_seekhole_locked()
vn_bmap_seekhole_locked() is factored out version of vn_bmap_seekhole().
This variant requires shared vnode lock being held around the call.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D31404
2021-08-05 22:52:26 +08:00
Ka Ho Ng
de2e152959 Add vnode_pager_purge_range(9) KPI
This KPI is created in addition to the existing vnode_pager_setsize(9)
KPI. The KPI is intended for file systems that are able to turn a range
of file into sparse range, also known as hole-punching.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D27194
2021-08-05 22:52:26 +08:00
Andrew Gallatin
98215005b7 ktls: start a thread to keep the 16k ktls buffer zone populated
Ktls recently received an optimization where we allocate 16k
physically contiguous crypto destination buffers. This provides a
large (more than 5%) reduction in CPU use in our
workload. However, after several days of uptime, the performance
benefit disappears because we have frequent allocation failures
from the ktls buffer zone.

It turns out that when load drops off, the ktls buffer zone is
trimmed, and some 16k buffers are freed back to the OS. When load
picks back up again, re-allocating those 16k buffers fails after
some number of days of uptime because physical memory has become
fragmented. This causes allocations to fail, because they are
intentionally done without M_NORECLAIM, so as to avoid pausing
the ktls crytpo work thread while the VM system defragments
memory.

To work around this, this change starts one thread per VM domain
to allocate ktls buffers with M_NORECLAIM, as we don't care if
this thread is paused while memory is defragged. The thread then
frees the buffers back into the ktls buffer zone, thus allowing
future allocations to succeed.

Note that waking up the thread is intentionally racy, but neither
of the races really matter. In the worst case, we could have
either spurious wakeups or we could have to wait 1 second until
the next rate-limited allocation failure to wake up the thread.

This patch has been in use at Netflix on a handful of servers,
and seems to fix the issue.

Differential Revision: https://reviews.freebsd.org/D31260
Reviewed by: jhb, markj,  (jtl, rrs, and dhw reviewed earlier version)
Sponsored by: Netflix
2021-08-05 10:19:12 -04:00
Michael Tuexen
3f1f6b6ef7 tcp, udp: improve input validation in handling bind()
Reported by:		syzbot+24fcfd8057e9bc339295@syzkaller.appspotmail.com
Reported by:		syzbot+6e90ceb5c89285b2655b@syzkaller.appspotmail.com
Reviewed by:		markj, rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31422
2021-08-05 13:48:44 +02:00
Emmanuel Vadot
589951c76a arm64: conf: std.broadcom: Add dwcotg and smsc
Add the dwcotg and smsc usb to ethernet driver to std.broadcom.
Those are used in RPI3

PR:   257593
2021-08-05 13:16:23 +02:00
Andrew Turner
dcfd605871 Add more arm64 external abort sources
These will be used when we support the Arm Reliability, Availability,
and Serviceability extension.

Sponsored by:	The FreeBSD Foundation
2021-08-04 18:50:42 +00:00
Alexander V. Chernikov
f3a3b06121 [lltable] Unify datapath feedback mechamism.
Use newly-create llentry_request_feedback(),
 llentry_mark_used() and llentry_get_hittime() to
 request datapatch usage check and fetch the results
 in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
 by eliminating 1 condition check.

MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
2021-08-04 22:52:43 +00:00
John Baldwin
c51e4962a3 Document kern.log_wakeups_per_second.
PR:		148680
MFC after:	2 weeks
2021-08-04 11:50:34 -07:00
Mitchell Horne
1b8db4b4e3 arm: enable stack-smashing protection
With current generation clang/llvm it can pass all of our tests in
libc/ssp.

While here, remove the extra MACHINE_CPUARCH check for mips. SSP is
included in BROKEN_OPTIONS for this architecture in src.opts.mk, which
is enough to ensure normal builds won't set SSP_CFLAGS.

Reviewed by:	kevans, imp, emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D31400
2021-08-04 15:23:22 -03:00
Mitchell Horne
8399d923a5 hwpmc_intel: assert for correct nclasses value
This variable is set based on the exact CPU model detected. If this
value is set too small, it could lead to a NULL-dereference from an
improperly initialized pmc_rowindex_to_classdep array.

Though it has been fixed, this was previously the case for Broadwell.
Add two asserts to catch this in DEBUG kernels, as it represents a
configuration error that may be hard to uncover otherwise.

PR:		253687
Reported by:	Zhenlei Huang <zlei.huang@gmail.com>
Sponsored by:	The FreeBSD Foundation
2021-08-04 15:23:22 -03:00
Mitchell Horne
4f35e8cba2 hwpmc: disable uncore class on Sandy Bridge and newer
It was written for Nehalem and Westmere, with minor but incomplete
updates for Sandy Bridge in 78d763a29b. The uncore architecture
changed significantly with this generation, bringing new layouts and
locations for some MSRs.

Misprogramming these MSRs in ucp_start_pmc() may panic the system, and
this is trivially reproducible via pmcstat(8) on at least Broadwell and
Haswell. Disable the class on these CPUs until it can be updated more
completely and leave a TODO comment detailing some of the work required.
Note that the nclasses value for Broadwell was already incorrect and
doesn't need changing.

The result is that any uncore events listed by pmcstat -L will no longer
be allocatable, but this is already the case for newer generations of
Intel CPUs.

PR:		253687
Reported by:	Zhenlei Huang <zlei.huang@gmail.com>
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31389
2021-08-04 15:23:22 -03:00
Konstantin Belousov
0ef5eee9d9 Add vn_lktype_write()
and remove repetetive code that calculates vnode locking type for write.

Reviewed by:	khng, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31405
2021-08-04 19:40:13 +03:00
Warner Losh
e94f1a0a37 Revert "arm: remove fslsdma from GENERIC"
The firmware was already in the tree when I did this commit, and I
missed the message. The bug was obsolete.

This reverts commit 9e3761d126.

PR:		237466
Sponsored by:	Netflix
2021-08-03 20:10:32 -06:00
Andrew Turner
8b3bd5a2b5 Only store the arm64 ID registers in the cpu_desc
There is no need to store a pointer to the CPU implementer and part
strings. Switch to load them directly into the sbuf used to print them
on boot.

While here print the machine ID register when we fail to determine the
implementer or part we are booting on.

Reviewed by:	markj, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31346
2021-08-03 01:42:40 +00:00
Andrew Turner
1a78f44cd2 Move setting arm64 HWCAP values to the ID tables
The HWCAPS values are based on the ID registers. Move setting these
to the existing ID register parsing code.

Previously we would need to handle all possible ID field values where
a HWCAP is set, however as most ID fields follow a scheme where when
the field increments it will only add new features meaning we only
need to check if the field is greater than when the HWCAP feature
was added.

While here stop setting HWCAP value that need kernel support, but this
support is missing.

Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31201
2021-08-03 01:42:39 +00:00
Gleb Smirnoff
29b4fa7876 hdaa: add missing break in hdac_pin_patch().
Fixes driver attach on my Thinkpad X1 Carbon, and likely on
many other ALC family devices.

Fixes:	ef790cc740
2021-08-03 03:09:32 -07:00
Kornel Duleba
dfcaa2c18b enetc_mdio: Support building the driver as a loadable module.
After recent arm64 GENERIC config cleanup the ENETC MDIO
in NXP LS1028A SoC should support being loaded as a module.

Obtained from: Semihalf
Sponsored by: Alstom Group
2021-08-03 12:07:49 +02:00
Kornel Duleba
5ad6d28cbe enetc: Support building the driver as a loadable module.
Function level reset has to be done in attach in order to put the
hardware in a known state before configuring it.
The order of DRIVER_MODULEs was changed to ensure that the miibus driver
is loaded when mii_attach is called.

Obtained from: Semihalf
Sponsored by: Alstom Group
2021-08-03 12:07:49 +02:00
Kornel Duleba
bfce69ae9a dtb: freescale: Add fsl-ls1028a-rdb to the build
With the recent inclusion of ENETC networking driver we now support this
board.

Obtained from: Semihalf
Sponsored by: Alstom Group
2021-08-03 12:07:49 +02:00
Marcin Wojtas
451bcf1b36 Introduce driver for Freescale Felix switch
It is found on boards equipped with LS1028A SoC.
802.1q VLAN grouping is supported.
An external MDIO device is used for communicating with PHYs.
The driver is built as a module by default, it is not included
in GENERIC kernel config.

Submitted by: Lukasz Hajec <lha@semihalf.com>
              Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30923
2021-08-03 12:07:49 +02:00
Kornel Duleba
f5b29d0f35 etherswitch: Add a new striptagingress port flag
Felix switch found in LS1028A supports stripping VLAN tag on
ingress, instead of egress. The striptag flag excepts the latter
behaviour.
Add a new flag to support the feature.

Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30922
2021-08-03 12:07:48 +02:00
Kyle Evans
04cc0c393c malloc(9): provide missing malloc_aligned implementation
Pointy hat:	kevans
Fixes:	6162cf885c ("malloc(9): Document/complete aligned variants")
2021-08-02 21:12:39 -05:00
Warner Losh
d00131e154 clock_id: These symbols weren't in 4.4BSD, adjust copyright
Peter Wemm added the first CLOCK_* symbols in 0f5ed9f420 in 1997
after obtaining them from NetBSD. In NetBSD, jtc@netbsd.org committed
them in sys/sys/time.h rev 1.19 dated 1996/11/15, along with all the
system calls associated with 1003.1b. FreeBSD's values are, however,
different than NetBSD's today. The USL/UCB lawsuit was settled in 1994,
so these couldn't have been derived from material provided to University
of California covered in that settlement. This file does not need the
settlement disclaimer.

Furthermore, I rewrote most of the code (except the symbols and their
values) when merging it from time.h and sys/time.h. Most of the creative
content of the file is new, so update copyright to reflect that.

Reviewed by:	kaktus
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D31369
2021-08-02 15:50:32 -06:00
Adam Fenn
6162cf885c malloc(9): Document/complete aligned variants
Comments on a pending kvmclock driver suggested adding a
malloc_aligned() to complement malloc_domainset_aligned(); add it now,
and document both.

Reviewed by:	imp, kib, allanjude (manpages)
Differential Revision:	https://reviews.freebsd.org/D31004
2021-08-02 15:36:14 -05:00
Eric van Gyzen
428624130a Fix lockstat:::thread-spin dtrace probe with LOCK_PROFILING
The spinning start time is missing from the calculation due to a
misplaced #endif.  Return the #endif where it's supposed to be.

Submitted by:	Alexander Alexeev <aalexeev@isilon.com>
Reviewed by:	bdrewery, mjg
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D31384
2021-08-02 14:44:23 -05:00
John Baldwin
d59f1c49e2 cxgbe tom: Permit rcv_nxt mismatches on FIN for iSCSI connections on T6.
The remote peer might send a FIN in the middle of a burst of data
PDUs.  In the case of T6 with data PDU completion moderation, the
driver would not have seen these PDUs since the final PDU in the burst
was never received resulting in a stale rcv_nxt when the FIN is
received.

While here, invert the logic in the condition to be more readable and
always set tp->rcv_nxt from the sequence number in the CPL.  This sets
the proper value of rcv_nxt for FINs on connections with data received
but not reported via a CPL (e.g. a partial iSCSI PDU burst interrupted
by a FIN).

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30871
2021-08-02 09:41:27 -07:00
Kristof Provost
600745f1e2 pf: bound DIOCGETSTATES memory use
Similar to what we did earlier for DIOCGETSTATESV2 we only allocate
enough memory for a handful of states and copy those out, bit by bit,
rather than allocating memory for all states in one go.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-02 16:29:23 +02:00
Adam Fenn
8ca384eb1d devclass_alloc_unit: move "at" hint test to after device-in-use test
Only perform this expensive operation when the unit number is a
potential candidate (i.e. not already in use), thereby reducing device
scan time on systems with many devices, unit numbers, and drivers.

Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
X-NetApp-PR:	#61
Differential Revision:	https://reviews.freebsd.org/D31381
2021-08-02 11:27:17 -05:00
Alexander Motin
ca34553b6f sched_ule(4): Pre-seed sched_random().
I don't think it changes anything, but why not.

While there, make cpu_search_highest() use all 8 lower load bits for
noise, since it does not use cs_prefer and the code is not shared
with cpu_search_lowest() any more.

MFC after:	1 month
2021-08-02 10:55:28 -04:00
Aleksandr Rybalko
aed2afeb51 Ignore ResourceProducer flag for:
o Arm CoreLink TM CMN-600 Coherent Mesh Network controller,
o Arm CoreLink DMC-620 Dynamic Memory Controller.

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
2021-08-02 14:11:20 +03:00
Ka Ho Ng
df95cc76af vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1
In hw.vmm.create sysctl handler the maximum length of vm name is
VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is
only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to
allow the length of VM_MAX_NAMELEN for vm name.

MFC after:	3 days
Reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31372
2021-08-02 17:55:08 +08:00
Roger Pau Monné
82bf6a2566 xen/timer: fix amd64 LINT kernel build
On amd64 XENHVM depends on the xentimer device for PVH early startup,
so both should be added or removed together (like the current
dependency with xenpci). Fix this by adding xentimer to NOTES and
updating the comments on the config files. Note that on i386 there's
no such dependency between xentimer and XENHVM, since there's no PVH
support.

While there also fix the MINIMAL i386 build to include the xentimer,
so it keeps the same functionality as before xentimer was split from
XENHVM.

Reported by: lwhsu
PR: 257549
Fixes: ae59812748 ('xen/timer: make xen timer optional')
2021-08-02 10:33:35 +02:00
Hans Petter Selasky
9340ebd404 Add missing file to sys/conf/files after 469884cf04 .
Found by:	vishwin@
Differential Revision:	https://reviews.freebsd.org/D29921
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-08-02 08:25:04 +02:00
Alexander Motin
8bb173fb5b sched_ule(4): Use trylock when stealing load.
On some load patterns it is possible for several CPUs to try steal
thread from the same CPU despite randomization introduced.  It may
cause significant lock contention when holding one queue lock idle
thread tries to acquire another one.  Use of trylock on the remote
queue allows both reduce the contention and handle lock ordering
easier.  If we can't get lock inside tdq_trysteal() we just return,
allowing tdq_idled() handle it.  If it happens in tdq_idled(), then
we repeat search for load skipping this CPU.

On 2-socket 80-thread Xeon system I am observing dramatic reduction
of the lock spinning time when doing random uncached 4KB reads from
12 ZVOLs, while IOPS increase from 327K to 403K.

MFC after:	1 month
2021-08-01 22:42:01 -04:00
Alexander Motin
2668bb2add sched_ule(4): Reduce duplicate search for load.
When sched_highest() called for some CPU group returns nothing, idle
thread calls it for the parent CPU group.  But the parent CPU group
also includes the CPU group we've just searched, and unless there is
a race going on, it is unlikely we find anything new this time.

Avoid the double search in case of parent group having only two sub-
groups (the most prominent case). Instead of escalating to the parent
group run the next search over the sibling subgroup and escalate two
levels up after if that fail too.  In case of more than two siblings
the difference is less significant, while searching the parent group
can result in better decision if we find several candidate CPUs.

On 2-socket 40-core Xeon system I am measuring ~25% reduction of CPU
time spent inside cpu_search_highest() in both SMT (2x20x2) and non-
SMT (2x20) cases.

MFC after:	1 month
2021-08-01 22:07:51 -04:00