Commit Graph

137242 Commits

Author SHA1 Message Date
Fedor Uporov
1484574843 Fix inode birthtime updating logic.
The birthtime field of struct vattr is not checked
for VNOVAL in ext2_setattr(), which produces incorrect
inode birthtime values.

Found using pjdfstest:
    pjdfstest/tests/utimensat/03.t

Reviewed by:    pfg
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D29929
2021-05-07 10:08:20 +03:00
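A minimal sketch of the sentinel check behind commit 1484574843, modeled in Python rather than kernel C. The dict-based inode and the `apply_setattr` helper are hypothetical stand-ins, not the ext2 code; the only thing taken from the commit is that a field left at VNOVAL must not be written to the inode.

```python
# Hypothetical model: VNOVAL marks "the caller did not supply this field".
VNOVAL = -1

def apply_setattr(inode, vattr):
    """Return a copy of inode with only the explicitly set fields updated."""
    updated = dict(inode)
    for field in ("atime", "mtime", "birthtime"):
        value = vattr.get(field, VNOVAL)
        if value != VNOVAL:      # skip fields left at the sentinel
            updated[field] = value
    return updated

inode = {"atime": 100, "mtime": 200, "birthtime": 50}
result = apply_setattr(inode, {"mtime": 300, "birthtime": VNOVAL})
print(result)
```

Without the sentinel test, the birthtime in this model would be overwritten with the placeholder value, which is the kind of incorrect birthtime the commit describes.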
Alfredo Dal'Ava Junior
fb53b42e36 virtio-modern: fix PCI common read/write functions on big endian targets
Virtio modern has the common data organized in little endian, but
on powerpc64 BE it was reading and writing in the wrong endian.

Submitted by:	Leonardo Bianconi <leonardo.bianconi@eldorado.org.br>
Reviewed by:	bryanv, alfredo
Sponsored by:	Eldorado Research Institute (eldorado.org.br)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28947
2021-05-07 02:40:35 -03:00
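The byte-order issue in commit fb53b42e36 can be illustrated with a small Python sketch. The helper names and the 8-byte buffer are assumptions made for illustration; the real driver uses kernel bus accessors, not `struct`.

```python
import struct

def read_le32(buf, offset):
    """Decode a 32-bit little-endian field regardless of host byte order."""
    return struct.unpack_from("<I", buf, offset)[0]

def write_le32(buf, offset, value):
    """Encode a 32-bit value as little-endian into buf at offset."""
    struct.pack_into("<I", buf, offset, value)

cfg = bytearray(8)               # stand-in for the common config area
write_le32(cfg, 4, 0x12345678)

# What a big-endian host sees if it reads the same bytes natively.
native_be = struct.unpack_from(">I", cfg, 4)[0]
print(hex(read_le32(cfg, 4)), hex(native_be))
```

The explicit `<` format pins the layout to little endian, so the result is the same on any host; the `>` read models the mismatch a big-endian target hits when it skips the conversion.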
Marcin Wojtas
d5b20eaafc sdhci_fsl_fdt: specify base clk divisor per SoC
Only LS1046A and LS1028A require the base clk to be divided by 2.
Implement that by moving the divider into SoC-specific data.
This commit fixes base clk setup for the entire SoC family,
including the already supported LS2160A.

Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30120
2021-05-07 03:48:54 +02:00
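As a rough illustration of the per-SoC divisor in commit d5b20eaafc, the sketch below keeps the divider in a lookup table. The table layout, the `base_clock_hz` helper, and the clock frequency are hypothetical; only the fact that LS1046A and LS1028A divide by 2 comes from the commit message.

```python
# Hypothetical per-SoC data; only the two SoCs named in the commit divide by 2.
SOC_DATA = {
    "LS1028A": {"baseclk_div": 2},
    "LS1046A": {"baseclk_div": 2},
}
DEFAULT_SOC_DATA = {"baseclk_div": 1}

def base_clock_hz(soc, platform_clk_hz):
    """Derive the controller base clock from the platform clock."""
    data = SOC_DATA.get(soc, DEFAULT_SOC_DATA)
    return platform_clk_hz // data["baseclk_div"]

print(base_clock_hz("LS1046A", 600_000_000))
print(base_clock_hz("LS2160A", 600_000_000))
```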
Marcin Wojtas
4dfb620ea4 Add LS1028A clockgen driver
The new driver provides probe and attach functions for the NXP LS1028A
clockgen and passes configuration information to QorIQ clockgen class.

Submitted by: Lukasz Hajec <lha@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30125
2021-05-07 03:48:53 +02:00
Warner Losh
42f3faa762 cdefs.h: Remove __GNUCLIKE___OFFSETOF, it's unused
__GNUCLIKE___OFFSETOF is unreferenced in the tree; remove it, as it is long
obsolete.

Sponsored by:		Netflix
2021-05-06 16:34:55 -06:00
Warner Losh
a709a4f0d4 headers: Implement _ISOC11_SOURCES macro when __POSIX_C_SOURCE defined
When _ISOC11_SOURCES is defined for glibc at the same time
__POSIX_C_SOURCE is defined, it extends the __POSIX_C_SOURCE definition
by exactly what C11 adds to the spec for each system header.  We follow
both OpenBSD's and glibc's convention by also doing so when C11 or higher
compilation mode is selected.

The Open Group is working on issuing a new version of the POSIX standard
that will realign the standard from C99 to a newer version of C. This
commit is a stop-gap measure for greater compatibility until that
environment has been standardized.

Reviewed by:		brooks@, arichards@, Olivier Certne
			(comments tweaked before commit)
PR:			255290
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D29902
2021-05-06 16:20:36 -06:00
Andriy Gapon
12588ce02d PCI hot-plug: use dedicated taskqueue for device attach / detach
Attaching and detaching devices can be heavy-weight and detaching can
sleep waiting for events.  For that reason using the system-wide
single-threaded taskqueue_thread is not really appropriate.
There is even a possibility for a deadlock if taskqueue_thread is used
for detaching.

In fact, there is an easy to reproduce deadlock involving nvme, pass
and a sudden removal of an NVMe device.
A pass peripheral would not release a reference on an nvme sim until
pass_shutdown_kqueue() is executed via taskqueue_thread.  But the
taskqueue's thread is blocked in nvme_detach() -> ... -> cam_sim_free()
because of the outstanding reference.

MFC after:	10 days
Sponsored by:	CyberSecure
Reviewed by:	mav, imp
Differential Revision:	https://reviews.freebsd.org/D30144
2021-05-06 21:49:37 +03:00
Alan Somers
420dbe763f gmultipath: make physpath distinct from the underlying providers'
zfsd uses a device's physical path attribute to automatically replace a
missing ZFS disk when a blank disk is inserted into the same physical
slot.  Currently gmultipath passes through its underlying providers'
physical path attribute.  That may cause zfsd to replace a missing
gmultipath provider with a newly arrived, single-path disk.  That would
be bad.

This commit fixes that problem by simply appending "/mp" to the
underlying providers' physical path, in a manner similar to what geli
already does.

Sponsored by:	Axcient
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D29941
2021-05-06 12:32:27 -06:00
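The effect of commit 420dbe763f can be sketched as follows. The slot-matching helper and the example path string are made up for illustration; the only detail taken from the commit is the "/mp" suffix.

```python
def multipath_physpath(underlying_physpath):
    """Derive a physical path that differs from the underlying provider's."""
    return underlying_physpath + "/mp"

def same_slot(missing_physpath, arrived_physpath):
    """Hypothetical stand-in for the physical path comparison zfsd makes."""
    return missing_physpath == arrived_physpath

disk_path = "enc@example/slot@3"        # made-up physical path
mp_path = multipath_physpath(disk_path)
print(same_slot(mp_path, disk_path))    # a single-path disk no longer matches
print(same_slot(mp_path, multipath_physpath(disk_path)))
```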
Gleb Smirnoff
be578b67b5 tcp_twcheck(): use correct unlock macro.
This crept in due to a conflict between the last two commits, 1db08fbe3f
and 9e644c2300.

Submitted by:	Peter Lei
2021-05-06 10:19:21 -07:00
Randall Stewart
5d8fd932e4 This brings into sync FreeBSD with the netflix versions of rack and bbr.
This fixes several reported breakages (panics) introduced since the
tcp_lro code was committed. Quite a few new features are
now in rack (perfecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
coming soon on rack.

Sponsored by:           Netflix
Reviewed by:		rscheff, mtuexen
Differential Revision:	https://reviews.freebsd.org/D30036
2021-05-06 11:22:26 -04:00
Andrew Turner
0ec3e99111 Use '.arch_extension crc' in the arm64 crc32 code
We don't care about the base architecture here, just that the crc
extension is enabled.

Sponsored by:	Innovate UK
2021-05-06 07:42:35 +00:00
Edward Tomasz Napierala
916f3dba45 linux(4): make arch_prctl(2) support GET_CET_STATUS, report unknown codes
This is largely a no-op, to make future debugging slightly easier.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30035
2021-05-06 09:33:42 +01:00
Justin Hibbits
49c894ddce powerpc64: Split out DMAP and non-DMAP implementations of some methods
Summary:
Some methods are split between DMAP and non-DMAP, conditional on
hw_direct_map variable.  Rather than checking this variable every time,
use it to install different functions via IFUNCs.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D30071
2021-05-05 20:57:33 -05:00
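The idea in commit 49c894ddce, choosing an implementation once instead of testing a variable on every call, can be modeled in Python as a resolver function. This is only an analogy for IFUNCs; `pick_copy_page` and both implementations are hypothetical names.

```python
def pick_copy_page(hw_direct_map):
    """Resolver: runs once and returns the implementation to install."""
    def copy_page_dmap(page):
        return ("dmap", page)

    def copy_page_nodmap(page):
        return ("nodmap", page)

    return copy_page_dmap if hw_direct_map else copy_page_nodmap

# Resolved once at startup; callers never test hw_direct_map again.
copy_page = pick_copy_page(hw_direct_map=True)
print(copy_page(7))
```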
Michael Tuexen
d1cb8d11b0 sctp: improve consistency when handling chunks of wrong size
MFC after:	3 days
2021-05-06 01:02:41 +02:00
Warner Losh
097e8701c9 fix style nit: space after if
2021-05-05 15:26:09 -06:00
Mark Johnston
6c34dde83e igmp: Avoid an out-of-bounds access when zeroing counters
When verifying, byte-by-byte, that the user-supplied counters are
zero-filled, sysctl_igmp_stat() would check for zero before checking the
loop bound.  Perform the checks in the correct order.

Reported by:	KASAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-05 17:12:51 -04:00
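The ordering bug fixed in commit 6c34dde83e can be reproduced in miniature. This is a Python model, not the sysctl_igmp_stat() code: Python raises IndexError where C would silently read one byte past the buffer.

```python
def is_zero_filled_buggy(buf, length):
    i = 0
    # Bug pattern: buf[i] is read before i is checked against the bound.
    while buf[i] == 0 and i < length:
        i += 1
    return i == length

def is_zero_filled(buf, length):
    i = 0
    # Bound first: short-circuit evaluation prevents the read past the end.
    while i < length and buf[i] == 0:
        i += 1
    return i == length

print(is_zero_filled(bytes(4), 4))
try:
    is_zero_filled_buggy(bytes(4), 4)
except IndexError as exc:
    print("out-of-bounds read:", exc)
```

The buggy variant only misbehaves when the whole buffer is zero, since that is the only case where the loop reaches index `length`.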
Mark Johnston
9a7c2de364 realloc: Fix KASAN(9) shadow map updates
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA.  This value is "alloc".  Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map.  Do so using the correct size.

Reported by:	kp
Tested by:	kp
Sponsored by:	The FreeBSD Foundation
2021-05-05 17:12:51 -04:00
John Baldwin
9c87db4b3c Group all compat shim structures together to consolidate #ifdef's.
Reviewed by:	brooks, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D29894
2021-05-05 13:59:09 -07:00
John Baldwin
01e9cbc4c5 Use thunks for compat ioctls using struct ifgroupreq.
Reviewed by:	brooks, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D29893
2021-05-05 13:59:00 -07:00
John Baldwin
d61d98f4ed Add freebsd32 compat shims for SIOC[GS]DRVSPEC.
Reviewed by:	brooks, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D29892
2021-05-05 13:58:50 -07:00
John Baldwin
d17e0940f7 Rework compat shims in ifioctl().
Centralize logic for handling compat ioctls into two blocks of code at
the start and end of the ioctl routine.  This avoids the conversion
logic being spread out both in multiple blocks in ifioctl as well as
various helper functions.

Reviewed by:	brooks, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D29891
2021-05-05 13:58:23 -07:00
Warner Losh
a512d0ab00 kern: clarify boot time
In FreeBSD, the current time is computed from uptime + boottime. Uptime
is a continuous, smooth function that's monotonically increasing. To
effect changes to the current time, boottime is adjusted.  boottime is
mutable and shouldn't be cached against future need. Document the
current implementation, with the caveat that we may stop stepping
boottime on resume in the future and will step uptime instead (noted in
the commit message, but not in the code).

Sponsored by:		Netflix
Reviewed by:		phk, rpokala
Differential Revision:	https://reviews.freebsd.org/D30116
2021-05-05 12:32:13 -06:00
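The relationship described in commit a512d0ab00 (current time = uptime + boottime, with boottime adjusted to step the clock) can be sketched with a toy model. The `Clock` class is hypothetical and ignores everything the kernel timekeeping code actually has to deal with.

```python
class Clock:
    def __init__(self, boottime, uptime=0):
        self.boottime = boottime
        self.uptime = uptime

    def now(self):
        return self.boottime + self.uptime

    def tick(self, seconds):
        self.uptime += seconds           # uptime only ever increases

    def step_to(self, new_now):
        # Stepping the wall clock adjusts boottime, never uptime.
        self.boottime = new_now - self.uptime

clock = Clock(boottime=1000)
clock.tick(50)
cached_boottime = clock.boottime         # caching this is the mistake
clock.step_to(2000)
print(clock.now(), clock.boottime, cached_boottime)
```

After the step, the cached value is stale, which is why the commit warns that boottime is mutable and should not be cached.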
Warner Losh
cb58805943 cam: Add doxygen docs to cam_sim_alloc
Add a description of each parameter to the cam_sim_alloc
call. Add some additional context for the mtx and queue parameters to
explain what the special values passed in mean.

MFC After:		3 days
Reviewed by:		mav@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30115
2021-05-05 11:44:39 -06:00
Ka Ho Ng
4e1e1d667f virtio_blk: Fix issuing T_GET_ID before DRIVER_OK status
DRIVER_OK status is set after device_attach() succeeds. For now postpone
disk_create to attach_completed() method.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Reviewed by:	grehan
Approved by:	lwhsu (mentor)
Differential Revision:	https://reviews.freebsd.org/D30049
2021-05-05 23:22:16 +08:00
Edward Tomasz Napierala
5e8caee259 linux: remove redundant SDT tracepoints
Remove all the 'entry' and 'return' probes; they clutter up the source
and are redundant to FBT.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30040
2021-05-05 13:59:00 +01:00
Marko Zec
2aca58e16f Introduce DXR as an IPv4 longest prefix matching / FIB module
DXR maintains compressed lookup structures with a trivial search
procedure.  A two-stage trie is indexed by the more significant bits of
the search key (IPv4 address), while the remaining bits are used for
finding the next hop in a sorted array.  The tradeoff between memory
footprint and search speed depends on the split between the trie and
the remaining binary search.  The default of 20 bits of the key being
used for trie indexing yields good performance (see below) with
footprints of around 2.5 Bytes per prefix with current BGP snapshots.

Rebuilding lookup structures takes some time, which is compensated for by
batching several RIB change requests into a single FIB update, i.e. FIB
synchronization with the RIB may be delayed for a fraction of a second.
RIB to FIB synchronization, next-hop table housekeeping, and lockless
lookup capability are provided by the FIB_ALGO infrastructure.

DXR works well on modern CPUs with several MBytes of caches, especially
in VMs, where it outperforms other currently available IPv4 FIB
algorithms by a large margin.

Synthetic single-thread LPM throughput test method:

kldload test_lookup; kldload dpdk_lpm4; kldload fib_dxr
sysctl net.route.test.run_lps_rnd=N
sysctl net.route.test.run_lps_seq=N

where N is the number of randomly generated keys (IPv4 addresses) which
should be chosen so that each test iteration runs for several seconds.

Each reported score represents the best of three runs, in million
lookups per second (MLPS), for two benchmarks (RND & SEQ) with two FIBs:

host: single interface address, local subnet route + default route
BGP: snapshot from linx.routeviews.org, 887957 prefixes, 496 next hops

Bhyve VM on an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60 GHz:
inet.algo         host, RND    host, SEQ    BGP, RND    BGP, SEQ
bsearch4             40.6         20.2         N/A         N/A
radix4                7.8          3.8         1.2         0.6
radix4_lockless      18.0          9.0         1.6         0.8
dpdk_lpm4            14.4          5.0        14.6         5.0
dxr                  70.3         34.7        43.0        19.5

Intel(R) Core(TM) i5-5300U CPU @ 2.30 GHz:
inet.algo         host, RND    host, SEQ    BGP, RND    BGP, SEQ
bsearch4             47.0         23.1         N/A         N/A
radix4                8.5          4.2         1.9         1.0
radix4_lockless      19.2          9.5         2.5         1.2
dpdk_lpm4            31.2          9.4        31.6         9.3
dxr                  84.9         41.4        51.7        23.6

Intel(R) Core(TM) i7-4771 CPU @ 3.50 GHz:
inet.algo         host, RND    host, SEQ    BGP, RND    BGP, SEQ
bsearch4             59.5         29.4         N/A         N/A
radix4               10.8          5.5         2.5         1.3
radix4_lockless      24.7         12.0         3.1         1.6
dpdk_lpm4            29.1          9.0        30.2         9.1
dxr                 101.3         49.9        69.8        32.5

AMD Ryzen 7 3700X 8-Core Processor @ 3.60 GHz:
inet.algo         host, RND    host, SEQ    BGP, RND    BGP, SEQ
bsearch4             70.8         35.4         N/A         N/A
radix4               14.4          7.2         2.8         1.4
radix4_lockless      30.2         15.1         3.7         1.8
dpdk_lpm4            29.9          9.0        30.0         8.9
dxr                 163.3         81.5        99.5        44.4

AMD Ryzen 5 5600X 6-Core Processor @ 3.70 GHz:
inet.algo         host, RND    host, SEQ    BGP, RND    BGP, SEQ
bsearch4             93.6         46.7         N/A         N/A
radix4               18.9          9.3         4.3         2.1
radix4_lockless      37.2         18.6         5.3         2.7
dpdk_lpm4            51.8         15.1        51.6        14.9
dxr                 218.2        103.3       114.0        49.0

Reviewed by:	melifaro
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D29821
2021-05-05 13:45:52 +02:00
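The lookup scheme described in commit 2aca58e16f (index by the high bits of the address, then search a sorted array with the rest) can be sketched roughly as below. This is a simplified model, not the fib_dxr code: it uses a single direct table instead of a two-stage trie, no compression, and 16 index bits instead of the default 20 to keep the table small. All function names and the example routes are assumptions.

```python
from bisect import bisect_right

DIRECT_BITS = 16                     # the commit's default is 20
LOW_BITS = 32 - DIRECT_BITS

def lpm_reference(routes, addr):
    """Slow longest-prefix match, used to build and to check the table."""
    best = None
    for prefix, plen, nexthop in routes:
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        if addr & mask == prefix and (best is None or plen > best[0]):
            best = (plen, nexthop)
    return best[1] if best else None

def build(routes):
    # Breakpoints: the only addresses where the next hop can change.
    points = {0}
    for prefix, plen, _ in routes:
        points.add(prefix)
        end = prefix + (1 << (32 - plen))
        if end <= 0xFFFFFFFF:
            points.add(end)
    starts = sorted(points)
    hops = [lpm_reference(routes, s) for s in starts]
    # For each chunk, record the slice of `starts` that can apply to it.
    direct = []
    for chunk in range(1 << DIRECT_BITS):
        base = chunk << LOW_BITS
        lo = bisect_right(starts, base) - 1
        hi = bisect_right(starts, base | ((1 << LOW_BITS) - 1))
        direct.append((lo, hi))
    return starts, hops, direct

def lookup(table, addr):
    starts, hops, direct = table
    lo, hi = direct[addr >> LOW_BITS]              # stage 1: high bits
    i = bisect_right(starts, addr, lo, hi) - 1     # stage 2: binary search
    return hops[i]

routes = [(0x00000000, 0, "default"), (0x0A000000, 8, "A"),
          (0x0A010000, 16, "B"), (0x0A010200, 24, "C")]
table = build(routes)
print(lookup(table, 0x0A010203))     # 10.1.2.3
```

Raising `DIRECT_BITS` shortens the binary search at the cost of a larger direct table, which is the memory versus speed tradeoff the commit message mentions.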
Marko Zec
a43104ebe7 Revise FIB lookups per second benchmarking routines.
Add a LPS benchmark variant which introduces artificial dependencies
between successive lookups. While here, instead of writing the results
from the lookups to a huge array, add them to an accumulator, in a more
lightweight attempt at preventing the CPU's OOO machinery from
discarding the lookup results if they would be completely unused.

net.route.test.run_lps_rnd measures LPS throughput with independent
uniformly random keys

net.route.test.run_lps_seq measures LPS throughput with uniformly
random keys with artificial interdependencies

Reviewed by:	melifaro
MFC after:	7 days
Differential Revision: https://reviews.freebsd.org/D30096
2021-05-05 12:28:17 +02:00
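The two benchmark variants in commit a43104ebe7 differ in whether one lookup's input depends on the previous lookup's result. The sketch below shows one way such a dependency could be introduced; the mixing step and the toy lookup are assumptions, not the actual test_lookup code.

```python
def run_lps_rnd(lookup, keys):
    """Independent lookups; results are summed instead of stored in an array."""
    acc = 0
    for key in keys:
        acc += lookup(key)
    return acc

def run_lps_seq(lookup, keys):
    """Each key is mixed with the running sum, so lookups cannot overlap."""
    acc = 0
    for key in keys:
        acc += lookup((key + acc) & 0xFFFFFFFF)
    return acc

def toy_lookup(key):
    # Stand-in for a FIB lookup that returns a next-hop index.
    return key % 7

keys = [1, 2, 3]
print(run_lps_rnd(toy_lookup, keys), run_lps_seq(toy_lookup, keys))
```

Returning the accumulator keeps every lookup result live, which is the lightweight way of stopping the results from being discarded that the commit describes.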
Warner Losh
122a8c7eb1 param.h: Fix typos
Submitted by:		rpokala@
Sponsored by:		Netflix
2021-05-05 00:50:35 -06:00
Warner Losh
9e0ba9536b param.h: Document __FreeBSD_version better
Document what __FreeBSD_version means a bit better by documenting the
sorts of events it should be bumped for. Also include a handy shorthand
for what it means. Add some advice for how frequently to change this
as well.

Added a note about the approved way to parse this from the param.h file,
though that was not in the review. All in-tree users have been updated
to this method prior to this commit. Move and reword the comment that
was on the same line.

Suggestions by:		greg@unrelenting, arch@
Reviewed by:		rgrimes@ (earlier version).
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D29850
2021-05-05 00:33:56 -06:00
Navdeep Parhar
f4ba035bca cxgbe(4): Use ifaddr_event_ext instead of ifaddr_event for CLIP management.
The _ext event notification includes the address being added/removed and
that gives the driver an easy way to ignore non-IPv6 addresses.  Remove
'tom' from the handler's name while here, it was moved out of t4_tom a
long time ago.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-05-04 20:16:25 -07:00
Lutz Donnerhacke
b1bd44732d netgraph/ng_bridge: learn MACs via control message
Add a new control message to move ethernet addresses to a given link
in ng_bridge(4). Send this message instead of doing the work directly.
This decouples the read-only activity from the modification under a
more strict writer lock.

Decoupling the work is a prerequisite for multithreaded operation.

Approved by:	manpages (bcr), kp (earlier version)
MFC:		3 weeks
Differential Revision:	https://reviews.freebsd.org/D28516
2021-05-04 22:14:59 +02:00
Michael Tuexen
b621fbb1bf sctp: drop packet with SHUTDOWN-ACK chunks with wrong vtags
MFC after:	3 days
2021-05-04 18:43:31 +02:00
Edward Tomasz Napierala
023bff7990 linux(4): fix ptrace(2) to properly handle orig_rax
This fixes strace(1) erroneously reporting return values
as "Function not implemented", combined with reporting the binary
ABI as X32.

Very similar code in linux_ptrace_getregs() is left as it is - it's
probably wrong too, but I don't have a way to test it.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29927
2021-05-04 15:21:06 +01:00
Mark Johnston
8bde6d15d1 nfsclient: Copy only initialized fields in nfs_getattr()
When loading attributes from the cache, the NFS client is careful to
copy only the fields that it initialized.  After fetching attributes
from the server, however, it would copy the entire vattr structure
initialized from the RPC response, so uninitialized stack bytes would
end up being copied to userspace.  In particular, va_birthtime (v2 and
v3) and va_gen (v3) had this problem.

Use a common subroutine to copy fields provided by the NFS client, and
ensure that we provide a dummy va_gen for the v3 case.

Reviewed by:	rmacklem
Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30090
2021-05-04 08:53:57 -04:00
Edward Tomasz Napierala
ee384b229d linux(4): make linkat(2) handle AT_EMPTY_PATH
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29974
2021-05-04 13:09:46 +01:00
Rick Macklem
0755df1eee nfscl: fix typo in a comment
MFC after:	2 weeks
2021-05-03 18:29:27 -07:00
Sai Rajesh Tallamraju
64881da478 ixgbe: Restore AIM support
AIM (adaptive interrupt moderation) was part of the BSD11 driver. It was
lost in the IFLIB migration. Reintroduce AIM into the IFLIB-based IXGBE
driver.

One caveat is that in the BSD11 driver, a queue comprises both an Rx and a
Tx ring. Starting from BSD12, Rx and Tx have their own queues and rings.
Also, the IRQ is now only configured for the Rx side. So, when AIM is
re-enabled, we should now consider only Rx stats when configuring the EITR
register, in contrast to BSD11 where both Rx and Tx stats were considered
when manipulating the EITR register.

Reviewed by:	gallatin, markj
Sponsored by:	NetApp, Inc.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D27344
2021-05-03 13:47:14 -04:00
Mark Johnston
f161d294b9 Add missing sockaddr length and family validation to various protocols
Several protocol methods take a sockaddr as input.  In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur.  Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.

Reported by:	syzkaller+KASAN
Reviewed by:	tuexen, melifaro
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29519
2021-05-03 13:35:19 -04:00
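The kind of check commit f161d294b9 adds can be sketched for the IPv4 case as follows. This models a BSD-style sockaddr as raw bytes (sa_len in byte 0, sa_family in byte 1); the `validate_sockaddr_in` helper is hypothetical and is not one of the kernel entry points the commit touches.

```python
AF_INET = 2
SOCKADDR_IN_LEN = 16      # size of struct sockaddr_in

def validate_sockaddr_in(raw):
    """Validate length and family before any other field is read."""
    if len(raw) < 2:                  # too short to even hold the header
        return False
    sa_len, sa_family = raw[0], raw[1]
    if sa_family != AF_INET:
        return False
    if sa_len != SOCKADDR_IN_LEN or len(raw) < sa_len:
        return False
    return True

good = bytes([16, AF_INET]) + bytes(14)
truncated = bytes([16, AF_INET]) + bytes(2)
print(validate_sockaddr_in(good), validate_sockaddr_in(truncated))
```

The point of the ordering is that the port and address bytes are only safe to read once both checks have passed.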
Elliott Mitchell
a3c7da3d08 kern/intr: declare interrupt vectors unsigned
These should never get values large enough for sign to matter, but one
of them becoming negative could cause problems.

MFC after:	1 week
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D29327
2021-05-03 13:24:30 -04:00
Mark Johnston
243b324f96 devfs: Avoid comparison with an uninitialized var in devfs_fp_check()
devvn_refthread() will initialize *devp only if it succeeds, so check for
success before comparing with fp->f_data.  Other devvn_refthread()
callers are careful to do this.

Reported by:	KMSAN
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30068
2021-05-03 13:24:30 -04:00
Mark Johnston
2b2d77e720 VOP_STAT: Provide a default value for va_gen
Some filesystems, e.g., pseudofs and the NFSv3 client, do not provide
one.

Reviewed by:	kib
Reported by:	KMSAN
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30091
2021-05-03 13:24:30 -04:00
Mark Johnston
cdfcfc607a smp: Initialize arg->cpus sooner in smp_rendezvous_cpus_retry()
Otherwise, if !smp_started is true, then smp_rendezvous_cpus_done() will
harmlessly perform an atomic RMW on an uninitialized variable.

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-03 13:24:30 -04:00
Konstantin Belousov
7cb40543e9 filt_timerexpire: do not iterate over the interval
User-supplied data might make this loop too time-consuming. Divide
directly, and handle both the possibility that we were woken up earlier,
and arithmetic overflows/underflows from the calculation.

Reported and tested by:	pho (previous version)
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30069
2021-05-03 19:49:54 +03:00
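The change in commit 7cb40543e9, replacing a loop over timer intervals with a division, can be illustrated with a small model. The function names and the exact boundary condition are assumptions; Python integers also do not overflow, so the overflow handling the commit mentions is not represented here.

```python
def advance_by_loop(expire, interval, now):
    """Step the expiry forward one interval at a time (slow for tiny intervals)."""
    count = 0
    while expire <= now:
        expire += interval
        count += 1
    return count, expire

def advance_by_division(expire, interval, now):
    """Compute the same result in constant time."""
    if now < expire:                  # woken up early: nothing expired yet
        return 0, expire
    count = (now - expire) // interval + 1
    return count, expire + count * interval

print(advance_by_loop(10, 3, 20), advance_by_division(10, 3, 20))
```

With a user-supplied interval of 1 and a large gap between `expire` and `now`, the loop runs once per elapsed unit, while the division does not depend on the gap at all.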
Konstantin Belousov
87a64872cd Add ptrace(PT_COREDUMP)
It writes the core of live stopped process to the file descriptor
provided as an argument.

Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:18:26 +03:00
Konstantin Belousov
68d311b666 ptracestop: mark threads suspended there with the new TDB_SSWITCH flag
This way threads in ptracestop can be discovered by the debugger.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:18:25 +03:00
Konstantin Belousov
9ebf9100ba ptrace: do not allow for parallel ptrace requests
Set a new P2_PTRACEREQ flag around the request.  Wait for the target
process P2_PTRACEREQ flag to clear before setting ours.

Otherwise, we rely on the moment that the process lock is not dropped
until the stopped target state is important.  This is going to be no
longer true after some future change.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:16:30 +03:00
Konstantin Belousov
54c8baa021 kern_ptrace(): extract code to determine ptrace eligibility into helper
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:13:48 +03:00
Konstantin Belousov
2bd0506c8d kern_ptrace: change type of proctree_locked to bool
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:13:48 +03:00
Konstantin Belousov
af928fded0 Add thread_run_flash() helper
It unsuspends a single suspended thread, passed as the argument.
It is up to the caller to arrange for the target thread to suspend later,
since the state of the process is not changed from stopped.  In particular,
the unsuspended thread must not return to userspace, since the boundary code
is not prepared for this situation.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:13:47 +03:00
Konstantin Belousov
15465a2c25 Add sleepq_remove_nested()
The helper removes the thread from a sleep queue, assuming that it would
need to sleep. The sleepq_remove_nested() function is intended for a quite
special case, where a suspended thread from a traced stopped process is
temporarily unsuspended to do some work on behalf of the debugger in the
target context, and this work might require sleep.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:13:47 +03:00
Konstantin Belousov
86ffb3d1a0 ELF coredump: define several useful flags for the coredump operations
- SVC_ALL requests dumping all map entries, including those marked as
  non-dumpable
- SVC_NOCOMPRESS disallows compressing the dump regardless of the sysctl
  policy
- SVC_PC_COREDUMP is provided for future use by userspace core dump
  request

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:13:47 +03:00
Konstantin Belousov
5bc3c61780 imgact_elf: consistently pass flags from coredump down to helper functions
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:13:47 +03:00
Edward Tomasz Napierala
7818653fd6 cam: fix integer overflow during inquiry
From my understanding this could happen with iSCSI LUNs with
unusually long names.  The bug would make CAM fail to retrieve
the full inquiry data.  Instead of bumping the size of the local
variable, just use a macro.

Reviewed By:	imp, mav
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
X-NetApp-PR:	#50
Differential Revision:	https://reviews.freebsd.org/D29991
2021-05-03 15:20:17 +01:00
Jose Luis Duran
8f1562430f Add Apollo Lake SIO/LPSS UARTs PCI IDs
Add PCI IDs for Intel Apollo Lake Series HSUARTs:

    # pciconf -ll
    drv   selector      class    rev  hdr  vendor device subven subdev
    uart0@pci0:0:24:0:  118000   0b   00   8086   5abc   8086   7270
    uart1@pci0:0:24:1:  118000   0b   00   8086   5abe   8086   7270
    uart2@pci0:0:24:2:  118000   0b   00   8086   5ac0   8086   7270
    uart3@pci0:0:24:3:  118000   0b   00   8086   5aee   8086   7270

NB (Intel Document Number 336256-004US):
1. The E3900 and A3900 Series Processors support four LPSS_UART ports,
   while the N- and J- Series Processors support only LPSS_UART [2:1]
   ports.
2. The LPSS_UART1 port is dedicated for discrete Global Navigation
   Satellite System (GNSS).  This port can be used for generic UART
   functionality if GNSS is not used.
3. The LPSS_UART2 port is dedicated for host OS debug.
4. The LPSS_UART0 and LPSS_UART3 ports are for generic UART functionality.
5. Only UART [1:0] ports support DMA.

PR:	255556
Submitted by:	Jose Luis Duran <jlduran@gmail.com>
MFC after:	1 week
2021-05-03 14:38:52 +03:00
Jose Luis Duran
5b8b6b26e4 uart_bus_pci.c: Style
Wrap long lines, use tab instead of spaces.

PR:	255556
Submitted by:	Jose Luis Duran <jlduran@gmail.com>
MFC after:	1 week
2021-05-03 14:38:52 +03:00
Jose Luis Duran
0ea8a7f36d ifconfig: Minor documentation fix
Fix what appears to have been a small copy/paste typo in ifconfig(8)'s
documentation (man page and header file).

Not that it matters anymore.

Reference: Table I-2 in IEEE Std 802.1Q-2014.

PR:	255557
Submitted by:	Jose Luis Duran <jlduran@gmail.com>
MFC after:	1 week
2021-05-03 14:38:52 +03:00
Andrew Turner
0ec205197b Also enable IPIs on 32-bit arm
This was missed in 2420f6a

Reported by:	tuexen, imp
2021-05-03 08:36:57 +00:00
Michael Tuexen
8b3d0f6439 sctp: improve address list scanning
If the alternate address has to be removed, force the stack to
find a new one, if it is still needed.

MFC after:	3 days
2021-05-03 02:50:05 +02:00
Michael Tuexen
a89481d328 sctp: improve restart handling
This fixes in particular a possible use-after-free bug reported by
Anatoly Korniltsev and Taylor Brandstetter for the userland stack.

MFC after:	3 days
2021-05-03 02:20:24 +02:00
Alexander Motin
655c200cc8 Fix build after 5f2e183505.
2021-05-02 20:07:38 -04:00
Alexander Motin
2760658b21 Improve UMA cache reclamation.
When estimating working set size, measure only allocation batches, not free
batches.  Allocation and free patterns can be very different.  For example,
on a vm_lowmem event ZFS can free a few gigabytes of memory to UMA in one
call, but that does not mean it will request the same amount back just as
quickly; in fact, it won't.

Update working set size on every reclamation call, shrinking caches faster
under pressure.  Without this, repeated vm_lowmem events squeezed more and
more memory out of real consumers only to leave it stuck in UMA caches.  I
saw ZFS cut its ARC size in half before the previous algorithm, after a
periodic WSS update, decided to reclaim UMA caches.

Introduce voluntary reclamation of UMA caches not used for a long time. For
each zdom, track a long-term minimal cache size watermark, freeing some
unused items every UMA_TIMEOUT after the first 15 minutes without cache
misses. Freed memory can be put to better use by other consumers.  For
example, ZFS won't grow its ARC unless it sees free memory, since it does
not know that the cached memory is not really used.  And even if the memory
is not really needed, periodic freeing during inactivity periods should
reduce its fragmentation.

Reviewed by:	markj, jeff (previous version)
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D29790
2021-05-02 19:45:23 -04:00
Michael Tuexen
5f2e183505 sctp: improve error handling in INIT/INIT-ACK processing
When processing INIT and INIT-ACK information, also during
COOKIE processing, delete the current association, when it
would end up in an inconsistent state.

MFC after:	3 days
2021-05-02 22:41:35 +02:00
Rick Macklem
4f592683c3 copy_file_range(2): improve copying of a large hole to EOF
PR#255523 reported that a file copy for a file with a large hole
to EOF on ZFS ran slowly over NFSv4.2.
The problem was that vn_generic_copy_file_range() would
loop around reading the hole's data and then see it is all
0s. It was coded this way since UFS always allocates a data
block near the end of the file, such that a hole to EOF never exists.

This patch modifies vn_generic_copy_file_range() to check for an
ENXIO returned from VOP_IOCTL(..FIOSEEKDATA..) and handle that
case as a hole to EOF. asomers@ confirms that it works for his
ZFS test case.

PR:	255523
Tested by:	asomers
Reviewed by:	asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30076
2021-05-02 16:04:27 -07:00
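The hole-to-EOF handling in commit 4f592683c3 can be modeled with an in-memory sparse file. The extent list and all three helpers are hypothetical; the only behavior taken from the commit is that ENXIO from a seek-data request means the rest of the file is a hole.

```python
import errno

def seek_data(extents, offset):
    """Model of a seek-data request: next offset >= `offset` that holds data."""
    for start, end in extents:
        if offset < end:
            return max(offset, start)
    raise OSError(errno.ENXIO, "no data at or after offset")

def seek_hole(extents, offset):
    """Model of a seek-hole request: end of the extent containing `offset`."""
    for start, end in extents:
        if start <= offset < end:
            return end
    return offset

def plan_copy(extents, size):
    """Return the data ranges to copy; holes are skipped, not read."""
    ranges, offset = [], 0
    while offset < size:
        try:
            start = seek_data(extents, offset)
        except OSError as exc:
            if exc.errno != errno.ENXIO:
                raise
            break                     # hole runs to EOF: nothing left to read
        end = min(seek_hole(extents, start), size)
        ranges.append((start, end))
        offset = end
    return ranges

# 4 KiB of data followed by a hole up to 1 TiB.
print(plan_copy([(0, 4096)], 1 << 40))
```

In this model the loop finishes after two seek calls, instead of reading the whole hole only to find that it is all zeros.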
Andrew Turner
2420f6aed9 Enable IPIs on CPU 0 on arm and arm64
Not all interrupt controllers enable IPIs by default, as the Arm
GIC specs make it an implementation-defined option. At least two
hypervisors have also previously masked the IPIs on boot.

As we already enable these IPIs on the non-boot CPUs, it is expected
that this is a safe operation.

Differential Revision:	https://reviews.freebsd.org/D26975
2021-05-02 07:43:34 +00:00
Andrew Turner
fe38224977 Implement bus_map_resource on arm64
This will allow us to allocate an unmapped memory resource, then
later map it with a specific memory attribute.

This is also needed for virtio with the modern PCI attachment.

Reviewed by:	kib (via D29723)
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D29694
2021-05-02 07:35:16 +00:00
Justin Hibbits
be48fe6000 powerpc/xive: Remove POWER9 DD1 IRQ bits
The OPAL_XIVE_*_VIA_IFW flags are used only for POWER9 DD1, which we
don't support.

Noticed while perusing Linux and skiboot git logs.
2021-05-01 16:18:02 -05:00
Andrew Turner
c78ad207ba Switch the EFI virtual address to a uint64_t
It is defined as a uint64_t in the UEFI spec. As it's not used as a
pointer by the kernel, follow this and define it the same way in the
kernel.

Reviewed by:	kib, manu, imp
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D29759
2021-05-01 06:01:20 +00:00
Andrew Turner
2abd4f8581 Add a way to map arm64 non-posted device memory
On arm64 we currently use a non-posted write for device memory; however,
we should move to posted writes. This is expected to work on most
hardware, but we will need to support a non-posted option for some
broken hardware.

Reviewed by:	imp, manu, bcr (manpage)
Differential Revision:	https://reviews.freebsd.org/D29722
2021-05-01 06:01:20 +00:00
Justin Hibbits
a6ca7519f8 powerpc64: Optimize radix trap handling a little more
Summary:
Since PCPU can live in a GPR for a while longer, let it, rather than
re-getting it in yet another register.  MFSPR is an expensive operation,
12 clock latency on POWER9, so the fewer operations we need, the better.

Since the check is tightly coupled to the fetch, by reducing the number
of fetch+check, we reduce the stalls, and improve the performance
marginally.  Buildworld was measured at a ~5-7% improvement on a single
run.

Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D30003
2021-04-30 19:58:11 -05:00
Marcin Wojtas
e245ee2774 gicv3_its: Flush cache after allocating ITT memory
The ITT memory has to be zeroed before committing it to the device.
We do that by allocating it with M_ZERO, but there was no
memory barrier or cache flush to ensure the ITS sees it zeroed.
This fixes MSI-X on the LS1028A SoC.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30033
2021-05-01 00:58:26 +02:00
Eric van Gyzen
2f32a971b7 Wait longer for a previous IPI to be sent
When sending an IPI, if a previous IPI is still pending delivery,
native_lapic_ipi_vectored() waits for the previous IPI to be sent.
We've seen a few inexplicable panics with the current timeout of 50 ms.
Increase the timeout to 1 second and make it tunable.

No hardware specification mentions a timeout in this case; I checked
the Intel SDM, Intel MP spec, and Intel x2APIC spec.  Linux and illumos
wait forever.  In Linux, see __default_send_IPI_shortcut() in
arch/x86/kernel/apic/ipi.c.  In illumos, see apic_send_ipi() in
usr/src/uts/i86pc/io/pcplusmp/apic_common.c.  However, misbehaving hardware
could hang the system if we wait forever.

Reviewed by:	mav kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29942
2021-04-30 13:32:29 -05:00
Konstantin Belousov
619fe09586 ioccom: define ioctl cmd value that can never be valid
It is meant for cases where some filler is needed for cmd, or where we
need an indication that no cmd was supplied, and so on.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29935
2021-04-30 17:43:45 +03:00
Konstantin Belousov
2082565798 O_PATH: disable kqfilter for fifos
A filter on a fifo is a real filter for the object, not a filesystem
events filter like EVFILT_VNODE.

Reported by:	markj using syzkaller
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-30 17:43:45 +03:00
Konstantin Belousov
72a42ec63b amd64: disable LA57 by default for now
Testing on real hardware uncovered an issue, and since I do not have
access to the machine, disable LA57 until the bug can be fixed.

Reported by:	"Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-30 17:43:45 +03:00
Konstantin Belousov
21fc6a2a10 amd64: invalidate TLB between page table update and access
When setting up the trampoline mapping for the LA57 switcher, it is
possible that the TLB still has some random mapping at that address.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-30 17:43:45 +03:00
Michael Tuexen
e010d20032 sctp: update the vtag for INIT and INIT-ACK chunks
This is needed in case of responding with an ABORT to an INIT-ACK.
2021-04-30 13:33:16 +02:00
Marcin Wojtas
cd945dc08a iflib: Take iri_pad into account when processing small frames
Drivers can specify padding of received frames with the iri_pad field.
This can be used to enforce IP alignment by hardware.
Iflib ignored that padding when processing small frames,
which rendered this feature inoperable.
I found it while writing a driver for a NIC that can IP-align
received packets. Note that this doesn't change the behavior of existing
drivers, as they all set iri_pad to 0.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: gallatin
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30009
2021-04-30 12:46:17 +02:00
Michael Tuexen
eb79855920 sctp: fix SCTP_PEER_ADDR_PARAMS socket option
Ignore spp_pathmtu if it is 0, when setting the IPPROTO_SCTP level
socket option SCTP_PEER_ADDR_PARAMS as required by RFC 6458.

MFC after:	1 week
2021-04-30 12:31:09 +02:00
Kristof Provost
055c55abef pf: Fix IP checksum on reassembly
If we reassemble a packet we modify the IP header (to set the length and
remove the fragment offset information), but we failed to update the
checksum. On certain setups (mostly where we did not re-fragment again
afterwards) this could lead to us sending out packets with incorrect
checksums.
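
For illustration, the kind of fix-up described above can be sketched in
portable C. This is not the pf code; it is a minimal sketch of an
RFC 1071 header checksum recomputed after the length and fragment
fields change, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over an IPv4 header of "len" bytes (always
 * even for IPv4).  When computing a new checksum, bytes 10-11 must be
 * zero.  Running it over a header with a valid checksum yields 0.
 */
static uint16_t
ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * Hypothetical reassembly fix-up: set the total length, clear the
 * fragment information, then recompute the header checksum.
 */
static void
ipv4_finish_reassembly(uint8_t *hdr, uint16_t total_len)
{
	size_t hlen = (size_t)(hdr[0] & 0x0f) * 4;
	uint16_t ck;

	hdr[2] = (uint8_t)(total_len >> 8);
	hdr[3] = (uint8_t)(total_len & 0xff);
	hdr[6] &= 0x40;		/* keep DF only; clear MF and offset bits */
	hdr[7] = 0;		/* clear the low offset bits */
	hdr[10] = hdr[11] = 0;	/* zero the checksum before summing */
	ck = ipv4_hdr_cksum(hdr, hlen);
	hdr[10] = (uint8_t)(ck >> 8);
	hdr[11] = (uint8_t)(ck & 0xff);
}
```

Skipping the final recomputation is exactly the failure mode the commit
describes: the header fields change but the stored checksum does not.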

PR:		255432
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30026
2021-04-30 08:19:46 +02:00
Michael Tuexen
eecdf5220b sctp: use RTO.Initial of 1 second as specified in RFC 4960bis
2021-04-30 00:45:56 +02:00
Michael Tuexen
9de7354bb8 sctp: improve consistency in handling chunks with wrong size
Just skip the chunk, if no other handling is required by the
specification.
2021-04-28 18:11:06 +02:00
Mark Johnston
20e3b9d8bd kasan: Use vm_offset_t for the first parameter to kasan_shadow_map()
No functional change intended.

Sponsored by:	The FreeBSD Foundation
2021-04-29 11:39:02 -04:00
Yinlong Lu
ee8b757a94 ipmi: support getting address from EFI
The original implementation only supports getting the address from legacy
BIOS (by searching for the SMBIOS_SIG pattern in a fixed address space).

Try to get the SMBIOS table from EFI through efirt (EFI Runtime Services)
first.  Continue to search in the legacy BIOS if a NULL address is
returned from EFI.

This way the ipmi driver supports both legacy BIOS and UEFI systems.

Reviewed by:	dab, vangyzen
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D30007
2021-04-29 05:20:58 -05:00
Kristof Provost
eaabed8ac4 pf: Trivial typo fix
PV -> PF

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-04-29 15:25:07 +02:00
Navdeep Parhar
b9820bca18 cxgbe(4): Do not panic when tx is called with invalid checksum requests.
There is no need to panic in if_transmit if the checksums requested are
inconsistent with the frame being transmitted.  This typically indicates
that the kernel and driver were built with different INET/INET6 options,
or there is some other kernel bug.  The driver should just throw away
the requests that it doesn't understand and move on.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-04-28 14:04:53 -07:00
Alexander V. Chernikov
41ce0e34ea [fib algo] Update fib_gen counter under FIB_MOD_LOCK.
MFC after:	3 days
2021-04-28 20:23:03 +00:00
Mateusz Guzik
074abaccfa cache: remove incomplete lockless lockout support during resize
This is already properly handled thanks to 2 step hash replacement.
2021-04-28 19:53:25 +00:00
Kevin Bowling
fdbcd35a75 ixgbe: Improve device name strings
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts.

Approved by:	erj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D29876
2021-04-28 10:29:59 -07:00
Kristof Provost
6b146f3b9b pf: Error tracing SDTs
Add additional DTrace static trace points to facilitate debugging
failing pf ioctl calls.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-04-28 17:19:10 +02:00
Neel Chauhan
341da0077e Bump __FreeBSD_version for commits efe7f12 and 9781105
These commits have added new APIs to linuxkpi.
2021-04-28 08:07:05 -07:00
Neel Chauhan
9781105bea linuxkpi: Introduce tasklet_disable_nosync()
This is needed for the drm-kmod 5.5 update.

Reviewed by:		hselasky (src)
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D30024
2021-04-28 08:05:57 -07:00
Neel Chauhan
efe7f12cd3 linuxkpi: Implement rcu_replace_pointer() macro
This is needed for the drm-kmod 5.5 update.

Reviewed by:		hselasky (src)
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D30025
2021-04-28 08:04:52 -07:00
Mark Johnston
d1e9441583 pipe: Avoid calling selrecord() on a closing pipe
pipe_poll() may add the calling thread to the selinfo lists of both ends
of a pipe.  It is ok to do this for the local end, since we know we hold
a reference on the file and so the local end is not closed.  It is not
ok to do this for the remote end, which may already be closed and have
called seldrain().  In this scenario, when the polling thread wakes up,
it may end up referencing a freed selinfo.

Guard the selrecord() call appropriately.

Reviewed by:	kib
Reported by:	syzkaller+KASAN
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30016
2021-04-28 10:43:29 -04:00
Richard Scheffenegger
48be5b976e tcp: stop spurious rescue retransmissions and potential asserts
Reported by: pho@
MFC after: 3 days
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29970
2021-04-28 15:01:10 +02:00
Thomas Munro
3aaaa2efde poll(2): Add POLLRDHUP.
Teach poll(2) to support Linux-style POLLRDHUP events for sockets, if
requested.  Triggered when the remote peer shuts down writing or closes
its end.
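
A minimal usage sketch, assuming a system whose <poll.h> defines
POLLRDHUP; the helper name peer_stopped_writing() is hypothetical.
The event has to be requested explicitly in the events field.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes POLLRDHUP on glibc; harmless elsewhere */
#endif
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if the peer of socket "fd" has shut
 * down its writing side (or closed), 0 if not, -1 if poll(2) fails.
 * The zero timeout makes this a non-blocking check.
 */
static int
peer_stopped_writing(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLRDHUP };

	if (poll(&pfd, 1, 0) < 0)
		return (-1);
	return ((pfd.revents & POLLRDHUP) != 0);
}
```

With a socketpair(2), the check reports 0 until the other end calls
shutdown(fd, SHUT_WR) or close(2).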

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D29757
2021-04-28 23:00:31 +12:00
Alexander V. Chernikov
f9668e42b4 Add rib_walk_from() wrapper for selective rib tree traversal.
Provide wrapper for the rnh_walktree_from() rib callback.
As currently `struct rib_head` is considered internal to the
 routing subsystem, this wrapper is necessary to maintain isolation
 from the external code.

Differential Revision: https://reviews.freebsd.org/D29971
MFC after:	1 week
2021-04-28 08:09:45 +00:00
Navdeep Parhar
83b5cda106 cxgbe(4): Add support for NIC suspend/resume and live reset.
Add suspend/resume callbacks to the driver and a live reset built around
them.  This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip.  Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports.  It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC.  A reset will
look like a link bounce to the networking stack.

Here are some ways to exercise this functionality:

 /* Manual suspend and resume. */
 # devctl suspend t6nex0
 # devctl resume t6nex0

 /* Manual reset. */
 # devctl reset t6nex0

 /* Manual reset with driver sysctl. */
 # sysctl dev.t6nex.0.reset=1

 /* Automatic adapter reset on any fatal error. */
 # sysctl hw.cxgbe.reset_on_fatal_err=1

Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver.  All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO.  All ifnets report
link-down while the adapter is suspended.

Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets.  The ifnets are able to send
and receive traffic as soon as the link comes back up.

Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
2021-04-27 22:48:51 -07:00
Rick Macklem
f6fec55fe3 nfscl: add check for NULL clp and forced dismounts to nfscl_delegreturnvp()
Commit aad780464f added a function called nfscl_delegreturnvp()
to return delegations during the NFS VOP_RECLAIM().
The function erroneously assumed that nm_clp would
be non-NULL. It will be NULL for NFSV4.0 mounts until
a regular file is opened. It will also be NULL during
vflush() in nfs_unmount() for a forced dismount.

This patch adds a check for clp == NULL to fix this.

Also, since it makes no sense to call nfscl_delegreturnvp()
during a forced dismount, the patch adds a check for that
case and does not do the call during forced dismounts.

PR:	255436
Reported by:	ish@amail.plala.or.jp
MFC after:	2 weeks
2021-04-27 17:30:16 -07:00
Rick Macklem
db8c27f499 nfsd: fix a NFSv4.1 Linux client mount stuck in CLOSE_WAIT
It was reported that a NFSv4.1 Linux client mount against
a FreeBSD12 server was hung, with the TCP connection in
CLOSE_WAIT state on the server.
When a NFSv4.1/4.2 mount is done and the back channel is
bound to the TCP connection, the soclose() is delayed until
a new TCP connection is bound to the back channel, due to
a reference count being held on the SVCXPRT structure in
the krpc for the socket. Without the soclose() call, the socket
will remain in CLOSE_WAIT and this somehow caused the Linux
client to hang.

This patch adds calls to soshutdown(.., SHUT_WR) that
are performed when the server side krpc sees that the
socket is no longer usable.  Since this can be done
before the back channel is bound to a new TCP connection,
it allows the TCP connection to proceed to CLOSED state.

PR:	254590
Reported by:	jbreitman@tildenparkcapital.com
Reviewed by:	tuexen
Comments by:	kevans
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29526
2021-04-27 15:32:35 -07:00
Kevin Bowling
eea55de7b1 e1000: Rework em_msi_link interrupt filter
* Fix 82574 Link Status Changes, carrying the OTHER mask bit around as
  needed.
* Move igb-class LSC re-arming out of FAST back into the handler.
* Clarify spurious/other interrupt re-arms in FAST.

In MSI-X mode, 82574 and igb-class devices use an interrupt filter to
handle Link Status Changes. We want to do LSC re-arms in the handler
to take advantage of autoclear (EIAC) single shot behavior.

82574 uses 'Other' in ICR and IMS for LSC interrupt types when in MSI-X
mode, so we need to set and re-arm the 'Other' bit during attach and
after ICR reads in the FAST handler if not an LSC or after handling on
LSC due to autoclearing.

This work was primarily done to address the referenced PR, but inspired
some clarification and improvement for igb-class devices once the
intentions of previous bug fix attempts became clearer.

PR:		211219
Reported by:	Alexey <aserp3@gmail.com>
Tested by:	kbowling (I210 lagg), markj (I210)
Approved by:	markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D29943
2021-04-27 15:29:39 -07:00
Alexander V. Chernikov
8a0d57baec [fib algo] Delay algo init at fib growth to allow reliable use of the rib KPI.
Currently, most of the rib(9) KPI does not use rnh pointers, using
 fibnum and family parameters to determine the rib pointer instead.
This works well except for the case when we initialize new rib pointers
 during fib growth.
In that case, there is no mapping between fib/family and the new rib,
 as an entirely new rib pointer array is populated.

Address this by delaying fib algo initialization till after switching
 to the new pointer array and updating the number of fibs.
Set datapath pointer to the dummy function, so the potential callers
 won't crash the kernel in the brief moment when the rib exists, but
 no fib algo is attached.

This change avoids creating duplicates of existing rib functions
 with altered signatures.

Differential Revision: https://reviews.freebsd.org/D29969
MFC after:	1 week
2021-04-27 22:10:08 +00:00
Brandon Bergren
6e1abda231 riscv: Remove old qemu compatibility code
During early qemu development, the /soc node was marked as compatible
with "riscv-virtio-soc" instead of "simple-bus".

This was changed in qemu 53f54508dae6 in Sep 2018, and predates the
baseline required qemu version (5.0) for riscv by a wide margin.

The generic simplebus code handles attachment in all cases nowadays.

Sponsored by:	Tag1 Consulting, Inc.
Reviewed by:	jrtc27, mhorne
Differential Revision:	https://reviews.freebsd.org/D30011
2021-04-27 16:22:04 -05:00
Ruslan Bukin
f17c4e38f5 Move IOMMU code to a separate pmap module and switch ARM System MMU
driver to use it.

Add ARM Mali Txxx (Midgard), Gxx (Bifrost) GPU page management code.

Sponsored by: UKRI
2021-04-27 19:16:09 +01:00
Emmanuel Vadot
f77d8d1011 dwc: Use mii_fdt function
Use the helper function to get phy mode and configure dwc accordingly.

Reviewed by:	ian
2021-04-27 19:07:33 +02:00
Emmanuel Vadot
80020d7888 mmccam: probe*: Style(9)
2021-04-27 19:03:16 +02:00
Emmanuel Vadot
e017c1c92c mmcprobe_done: Style(9)
2021-04-27 19:03:09 +02:00
Emmanuel Vadot
7cbdf8a05d dwmmc: Add \n to a debug printf
2021-04-27 19:01:09 +02:00
Emmanuel Vadot
f1cc48e5da mmc: dwmmc: Convert driver to use the mmc_sim interface
A lot more generic cam related things are done in mmc_sim, so this
simplifies the driver a lot.

Differential Revision:	https://reviews.freebsd.org/D27487
Reviewed by:	kibab
2021-04-27 19:00:47 +02:00
Emmanuel Vadot
2671bdb540 allwinner: aw_mmc: Convert driver to use the mmc_sim interface
A lot more generic cam related things are done in mmc_sim, so this
simplifies the driver a lot.

Differential Revision:	https://reviews.freebsd.org/D27486
Reviewed by:	imp
2021-04-27 19:00:42 +02:00
Emmanuel Vadot
47bde7925b mmccam: Add mmc_sim, a generic sim for mmc driver to use
This adds a generic sim that abstracts a lot of what needs to be
implemented in a driver for mmccam support.
A new interface with three methods is added:

 - mmc_sim_get_tran_settings: Used to get what the controller supports in
   terms of capabilities, frequency, etc.
 - mmc_sim_set_tran_settings: Used to change the speed, frequency, etc. of
   the sdcard host controller
 - mmc_sim_cam_request: Used for MMCIO requests

Differential Revision:	https://reviews.freebsd.org/D27485
Reviewed by:	kibab
2021-04-27 19:00:38 +02:00
Ruslan Bukin
4c1ecf5502 Consider the broken card detect flag that comes from 'broken-cd;'
dts property.

This fixes operation on Intel Stratix 10 devices.

Tested on Terasic DE10-Pro.

Reviewed by: manu
Sponsored by: UKRI
Differential revision: https://reviews.freebsd.org/D29999
2021-04-27 12:19:05 +01:00
Michael Tuexen
059ec2225c sctp: cleanup verification of INIT and INIT-ACK chunks
2021-04-27 12:45:43 +02:00
Alexander V. Chernikov
439d087d0b [fib algo] always commit static routes synchronously.
Modular fib lookup framework features logic that allows
 route update batching for the algorithms that cannot easily
 apply the routing change without rebuilding. As a result,
 dataplane lookups may return old data until the sync
 takes place. With the default sync timeout of 50ms, it is
 possible that a new binary such as ping(8) executed right after
 route(8) will still use the old fib data.

To address some aspects of the problem, framework executes
 all rtable changes without RTF_GATEWAY synchronously.

To fix the aforementioned problem, this diff extends sync
 execution to all RTF_STATIC routes (e.g. ones maintained by
 route(8)).
This fixes a bunch of tests in the networking space.
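
The resulting rule can be sketched as a small predicate. This is an
illustration of the decision described above, not the framework's code:
the function name is hypothetical, and the flag values are assumed to
match RTF_GATEWAY and RTF_STATIC in <net/route.h>, redefined under
sketch-only names so the block stands alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror RTF_GATEWAY and RTF_STATIC from <net/route.h>. */
#define SK_RTF_GATEWAY	0x2
#define SK_RTF_STATIC	0x800

/*
 * Hypothetical predicate: a routing change is applied synchronously
 * when the route has no gateway, or when it is a static route.
 * Everything else may be batched until the next sync.
 */
static bool
sk_needs_sync_commit(int rt_flags)
{
	if ((rt_flags & SK_RTF_GATEWAY) == 0)
		return (true);
	return ((rt_flags & SK_RTF_STATIC) != 0);
}
```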

Reported by:	ci, arichardson
MFC after:	2 weeks
2021-04-27 08:31:40 +00:00
Alexander V. Chernikov
25682e6a49 Fix rtsock sockaddr alignment.
b31fbebeb3 introduced alloc_sockaddr_aligned() which, in fact,
 failed to produce aligned addresses.

Reported by:	Oskar Holmlund <oskar.holmlund at yahoo.com>
MFC after:	immediately
2021-04-27 08:04:19 +00:00
Alexander V. Chernikov
bc5ef45aec Fix dtrace CTF for the rib_head.
33cb3cb2e3 introduced a `rib_head` structure field under the
FIB_ALGO define. This may be problematic for the CTF, as some
 of the files including `route_var.h` do not have `FIB_ALGO`
 defined.

Make dtrace happy by making the field unconditional.

Suggested by:	markj
2021-04-27 07:47:53 +00:00
Rick Macklem
f5ff282bc0 nfscl: fix the handling of NFSERR_DELAY for Open/LayoutGet RPCs
For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that
have both Open and LayoutGet operations in them.
If the pNFS server were to reply NFSERR_DELAY for one of these
compounds, the retry after a delay cannot be handled by
newnfs_request(), since there is a reference held on the open
state for the Open operation in them.

Fix this by adding these RPCs to the "don't do delay here"
list in newnfs_request().

This patch is only needed if the mount is using pNFS (the "pnfs"
mount option) and probably only matters if the MDS server
is issuing delegations as well as pNFS layouts.

Found by code inspection.

MFC after:	2 weeks
2021-04-26 17:48:21 -07:00
Rick Macklem
61aea7fa3c param.h: bump __FreeBSD_version for commit 8759773148
Commit 8759773148 changed the internal KPI between the
nfsd and nfscommon modules, so both need to be rebuilt
from sources.
2021-04-26 16:35:18 -07:00
Rick Macklem
8759773148 nfsd: fix the slot sequence# when a callback fails
Commit 4281bfec36 patched the server so that the
callback session slot would be freed for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot to be advanced such that the client
end will reply NFSERR_SEQMISORDERED.

To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case.  For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.

Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15 second delays if
delegations are enabled.  This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15 seconds to recover
the back channel after doing so.

MFC after:	2 weeks
2021-04-26 16:24:10 -07:00
Navdeep Parhar
43bbae1948 cxgbe(4): Separate the sw- and hw-specific parts of resource allocations
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC.  This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.

This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):

/* Idempotent */
alloc_foo
{
	if (!SW_ALLOCATED)
		init_iq/init_eq/init_fl		no-fail sw init
		alloc_iq_fl/alloc_eq/alloc_wrq	may-fail sw alloc
		add_foo_sysctls, etc.		no-fail post-alloc items
	if (!HW_ALLOCATED)
		alloc_iq_fl_hwq/alloc_eq_hwq	hw resource allocation
}

/* Idempotent */
free_foo
{
	if (HW_ALLOCATED)
		free_iq_fl_hwq/free_eq_hwq	release hw resources
	if (SW_ALLOCATED)
		free_iq_fl/free_eq/free_wrq	release sw resources
}

The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent.  The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
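
The flag discipline behind these idempotent routines can be sketched in
plain C. This is an illustration only: the structure, flag names, and
counters are hypothetical stand-ins, not the driver's real data layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits and queue state for illustration. */
#define FOO_SW_ALLOCATED	0x1
#define FOO_HW_ALLOCATED	0x2

struct foo_queue {
	unsigned int flags;
	int sw_allocs;	/* live software allocations (0 or 1) */
	int hw_allocs;	/* live hardware allocations (0 or 1) */
};

/* Idempotent: each half runs only if its flag is not already set. */
static void
alloc_foo(struct foo_queue *q, bool hw_available)
{
	if (!(q->flags & FOO_SW_ALLOCATED)) {
		q->sw_allocs++;
		q->flags |= FOO_SW_ALLOCATED;
	}
	if (hw_available && !(q->flags & FOO_HW_ALLOCATED)) {
		q->hw_allocs++;
		q->flags |= FOO_HW_ALLOCATED;
	}
}

/* Idempotent: release hardware first, then software, only if held. */
static void
free_foo(struct foo_queue *q)
{
	if (q->flags & FOO_HW_ALLOCATED) {
		q->hw_allocs--;
		q->flags &= ~FOO_HW_ALLOCATED;
	}
	if (q->flags & FOO_SW_ALLOCATED) {
		q->sw_allocs--;
		q->flags &= ~FOO_SW_ALLOCATED;
	}
}
```

Because each half is guarded by its own flag, calling either routine
twice in a row leaves the state unchanged after the first call.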

MFC after:	1 month
Sponsored by:	Chelsio Communications
2021-04-26 14:09:59 -07:00
Michael Tuexen
c70d1ef15d sctp: improve handling of illegal packets containing INIT chunks
Stop further processing of a packet when detecting that it
contains an INIT chunk, which is too small or is not the only
chunk in the packet. Still allow the processing of chunks that
precede the INIT chunk to finish.

Thanks to Antoly Korniltsev and Taylor Brandstetter for reporting
an issue with the userland stack, which made me aware of this
issue.

MFC after:	3 days
2021-04-26 10:43:58 +02:00
Martin Matuska
9f1dc86c46 zfs: restore copyright disclaimer change from 4b84b4cca
The change will be pull-requested to upstream.

X-MFC-with:	4b84b4cca4
2021-04-26 22:16:50 +02:00
Mark Johnston
409ab7e109 imgact_elf: Ensure that the return value in parse_notes is initialized
parse_notes relies on the caller-supplied callback to initialize "res".
Two callbacks are used in practice, brandnote_cb and note_fctl_cb, and
the latter fails to initialize res.  Fix it.

In the worst case, the bug would cause the inner loop of check_note to
examine more program headers than necessary, and the note header usually
comes last anyway.

Reviewed by:	kib
Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29986
2021-04-26 14:53:16 -04:00
Warner Losh
099919b76d newbus: remove support for SINGLETON
Revert rest of de8dd262c4 since it's now unused.

jhibbits@ introduced this to give powerpc MMU functions IFUNC like
performance while retaining the kobj interface, speeding up operations
10-20%. Since there was only ever one instance of the mmu interface
active at any given time, we could cache the looked up results more
aggressively.

powerpc migrated to using IFUNCs to get an even larger performance boost
in 45b69dd63e, deleting the two files it was added to in de8dd262c4.

However, there are few, if any, other potential applications of this in
the tree today. It's now unused and undocumented. Retire it to eliminate
this wart and to preclude the need to document it. Should a similar
case arise in the future, the code is in git...

Discussed with:	jhibbits@
Reviewed by:		jhb@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D29997
2021-04-26 11:41:08 -06:00
Kevin Bowling
ba7b31b3e9 e1000: Fix register name in reg_dump sysctl
The correct name of this register is CTRL_EXT.

Approved by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29967
2021-04-26 09:30:54 -07:00
Kristof Provost
402dfb0a8d pf: Fix parsing of long table names
When parsing the nvlist for a struct pf_addr_wrap we unconditionally
tried to parse "ifname". This broke for PF_ADDR_TABLE when the table
name was longer than IFNAMSIZ. PF_TABLE_NAME_SIZE is longer than
IFNAMSIZ, so this is a valid configuration.

Only parse (or return) ifname or tblname for the corresponding
pf_addr_wrap type.
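
A simplified sketch of type-dependent parsing follows. All names are
hypothetical stand-ins for the pf structures, and the buffer sizes are
assumed to mirror IFNAMSIZ (16) and PF_TABLE_NAME_SIZE (32); the real
code reads these fields from an nvlist.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_IFNAMSIZ		16	/* assumed to mirror IFNAMSIZ */
#define SK_TABLE_NAME_SIZE	32	/* assumed to mirror PF_TABLE_NAME_SIZE */

enum sk_addr_type { SK_ADDR_ADDRMASK, SK_ADDR_DYNIFTL, SK_ADDR_TABLE };

struct sk_addr_wrap {
	enum sk_addr_type type;
	union {
		char ifname[SK_IFNAMSIZ];
		char tblname[SK_TABLE_NAME_SIZE];
	} v;
};

/* Copy "name" into "dst" if it fits with its NUL; 0 on success, -1 if not. */
static int
sk_copy_name(char *dst, size_t dstsz, const char *name)
{
	size_t len = strlen(name);

	if (len >= dstsz)
		return (-1);
	memcpy(dst, name, len + 1);
	return (0);
}

/* Only touch the name field that belongs to the given address type. */
static int
sk_parse_addr_wrap(struct sk_addr_wrap *w, enum sk_addr_type type,
    const char *name)
{
	w->type = type;
	switch (type) {
	case SK_ADDR_DYNIFTL:
		return (sk_copy_name(w->v.ifname, sizeof(w->v.ifname), name));
	case SK_ADDR_TABLE:
		return (sk_copy_name(w->v.tblname, sizeof(w->v.tblname), name));
	default:
		return (0);	/* no name to parse for other types */
	}
}
```

A 20-character table name is accepted for the table type but would be
rejected if it were checked against the interface-name limit, which is
the failure described above.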

This manifested as a failure to set rules such as these, where the pfctl
optimiser generated an automatic table:

	pass in proto tcp to 192.168.0.1 port ssh
	pass in proto tcp to 192.168.0.2 port ssh
	pass in proto tcp to 192.168.0.3 port ssh
	pass in proto tcp to 192.168.0.4 port ssh
	pass in proto tcp to 192.168.0.5 port ssh
	pass in proto tcp to 192.168.0.6 port ssh
	pass in proto tcp to 192.168.0.7 port ssh

Reported by:	Florian Smeets
Tested by:	Florian Smeets
Reviewed by:	donner
X-MFC-With:	5c11c5a365
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29962
2021-04-26 18:08:15 +02:00
Neel Chauhan
e657f3de6d linuxkpi: Remove unneeded {} in atomic_dec_and_lock_irqsave()
2021-04-26 08:25:33 -07:00
Neel Chauhan
c8de6e2015 linuxkpi: Eliminate brackets on return in spinlock.h
2021-04-26 08:16:48 -07:00
Neel Chauhan
ce65353ac1 linuxkpi: Implement atomic_dec_and_lock_irqsave()
This is needed by the drm-kmod 5.5 update.

Reviewed by:		hselasky, manu
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D29988
2021-04-26 08:15:49 -07:00
Neel Chauhan
057f145aae linuxkpi: Implement the wait_event_interruptible macro
This is needed by the drm-kmod 5.5 update and is similar in logic to the
existing wait_event_killable macro.

Reviewed by:		hselasky, manu
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D29987
2021-04-26 08:12:18 -07:00
Kristof Provost
5f5bf88949 pfsync: Expose PFSYNCF_OK flag to userspace
Add 'syncok' field to ifconfig's pfsync interface output. This allows
userspace to figure out when pfsync has completed the initial bulk
import.

Reviewed by:	donner
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29948
2021-04-26 14:31:17 +02:00
Kristof Provost
6fcc8e042a pf: Allow multiple labels to be set on a rule
Allow up to 5 labels to be set on each rule.
This offers more flexibility in using labels. For example, it replaces
the custom 'schedule' keyword used by pfSense to terminate states
according to a schedule.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29936
2021-04-26 14:14:21 +02:00
Michael Tuexen
163153c2a0 sctp: small cleanup, no functional change
MFC:		3 days
2021-04-26 02:56:48 +02:00
Kevin Bowling
0f6bea61ed e1000: Improve device name strings
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)

Reviewed by:	erj
Approved by:	markj
Sponsored by:	Pink Floyd - Any Colour You Like (in kind)
Differential Revision:	https://reviews.freebsd.org/D29872
2021-04-25 22:08:54 -07:00
Patrick Kelsey
ca7005f189 iflib: Improve mapping of TX/RX queues to CPUs
iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache.  The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs.  When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs.  See the comment on get_cpuid_for_queue() for the
entire matrix.

The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset       (existing)
dev.<device>.<unit>.iflib.separate_txrx     (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)

The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu

When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.

Reviewed by:	kbowling
Tested by:	olivier, pkelsey
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D24094
2021-04-26 01:06:34 -04:00
Martin Matuska
4b84b4cca4 zfs: fix non-functional mismerges from vendor/openzfs
- fix copyright in module/os/freebsd/spl/spl_acl.c
- fix mismerge in non-processed module/os/linux/zfs/zfs_uio.c

MFC after:      3 days
Obtained from:  OpenZFS
2021-04-26 03:05:13 +02:00
Rick Macklem
aad780464f nfscl: return delegations in the NFS VOP_RECLAIM()
After a vnode is recycled it can no longer be
acquired via vfs_hash_get() and, as such,
a delegation for the vnode cannot be recalled.

In the unlikely event that a delegation still
exists when the vnode is being recycled, return
the delegation since it will no longer be
recallable.

Until you have this patch in your NFSv4 client,
you should consider avoiding the use of delegations.

MFC after:	2 weeks
2021-04-25 17:57:55 -07:00
Rick Macklem
02695ea890 nfscl: fix delegation recall when the file is not open
Without this patch, if an NFSv4 server recalled a
delegation when the file is not open, the renew
thread would block in the NFS VOP_INACTIVE()
trying to acquire the client state lock that it
already holds.

This patch fixes the problem by delaying the
vrele() call until after the client state
lock is released.
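
The ordering change described above can be sketched in plain C. This is only an illustration and not the nfscl code: `state_lock`, `node`, `node_release()` and `recall_delegation()` are hypothetical stand-ins for the client state lock, the vnode, vrele() and the recall path, and the lock is modelled as a non-recursive flag so that a self-deadlock would show up as a failed assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, non-recursive lock: acquiring it twice is a bug. */
struct state_lock { bool held; };
/* Hypothetical reference-counted object standing in for a vnode. */
struct node { int refs; };

static struct state_lock cl_lock;

static void
lock_acquire(struct state_lock *l)
{
	assert(!l->held);	/* would be a self-deadlock in a real kernel */
	l->held = true;
}

static void
lock_release(struct state_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Dropping the last reference runs an "inactive" step that needs the lock. */
static void
node_release(struct node *n)
{
	if (--n->refs == 0) {
		lock_acquire(&cl_lock);
		lock_release(&cl_lock);
	}
}

static void
recall_delegation(struct node *n)
{
	struct node *deferred;

	lock_acquire(&cl_lock);
	/* ... delegation is handed back here, under the lock ... */
	deferred = n;		/* remember it instead of releasing now */
	lock_release(&cl_lock);

	/* Safe: the lock is no longer held when the release runs. */
	if (deferred != NULL)
		node_release(deferred);
}
```

Calling `node_release()` before `lock_release()` in `recall_delegation()` would trip the assertion in `lock_acquire()`, which is the analogue of the hang the message describes.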

This bug has been in the NFSv4 client for
a long time, but since it only affects
delegation when recalled due to another
client opening the file, it got missed
during previous testing.

Until you have this patch in your client,
you should avoid the use of delegations.

MFC after:	2 weeks
2021-04-25 12:55:00 -07:00
Alexander V. Chernikov
7d222ce3c1 Fix NOINET[6],!VIMAGE builds after FIB_ALGO addition to GENERIC
Reported by:	jbeich
PR:		255390
2021-04-21 05:53:42 +01:00
Edward Tomasz Napierala
5d1d844a77 kern_linkat: modify to accept AT_ flags instead of FOLLOW/NOFOLLOW
This makes this API match other kern_xxxat() functions.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29776
2021-04-25 14:13:12 +01:00
Alexander V. Chernikov
67372fb3e0 Fix NOINET[6] build after enabling FIB_ALGO in GENERIC.
Submitted by:	jbeich
PR:		255389
2021-04-21 02:49:18 +01:00
Alexander V. Chernikov
c23385612d [fib algo] Do not print algo attach/detach message on boot
MFC after:	1 day
2021-04-25 08:58:06 +00:00
Alexander V. Chernikov
a81e2e7890 Make gcc happy by initializing error in rib_handle_ifaddr_info().
2021-04-25 08:44:59 +00:00
Stefan Eßer
6409e59427 Fix build with gcc
Correctly declare function without arguments as f(void) instead of f().
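
A minimal illustration of the difference, using a made-up function name (`get_answer` is not from the commit). Before C23, `f()` declares a function with unspecified parameters, which gcc reports under -Wstrict-prototypes; `f(void)` is a real prototype that takes no arguments.

```c
/*
 * Old-style declaration: says nothing about the parameters.
 *
 *	int get_answer();
 */

/* Prototype form: explicitly takes no arguments. */
int get_answer(void);

int
get_answer(void)
{
	return (42);
}
```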
2021-04-25 10:15:17 +02:00
Alexander V. Chernikov
6993187a8c Add FIB_ALGO to GENERIC on amd64/arm64.
Option `FIB_ALGO` gates new modular fib lookup functionality,
 enabling more performant routing table lookups and improving
 control plane convergence under load.

Detailed feature description is available in D27401.

Reviewed By: olivier, gnn
Differential Revision: https://reviews.freebsd.org/D28434
2021-04-24 23:22:58 +00:00
Alexander V. Chernikov
5d1403a79a [rtsock] Enforce netmask/RTF_HOST consistency.
Traditionally we had 2 sources of information on whether an
 add/delete route request targets a network or a host route:
netmask (RTA_NETMASK) and RTF_HOST flag.

The former one is tricky: netmask can be empty or can explicitly
 specify the host netmask. Parsing netmask sockaddr requires per-family
 parsing and that's what rtsock code traditionally avoided. As a result,
 consistency was not enforced and it was possible to specify network with
 the RTF_HOST flag and vice versa.

Continue normalization efforts from D29826 and D29826 and ensure that
 RTF_HOST flag always reflects host/network data from netmask field.
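
A rough sketch of the normalization idea, restricted to IPv4 and not taken from the rtsock code. `SK_RTF_HOST` is a hypothetical stand-in for RTF_HOST, `normalize_host_flag()` is an invented helper, and treating an absent netmask as a host route is an assumption based on the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_RTF_HOST	0x4	/* hypothetical stand-in for RTF_HOST */

/*
 * Return flags with the host bit forced to agree with the netmask.
 * A NULL netmask or an all-ones IPv4 mask is treated as a host route.
 */
static int
normalize_host_flag(int flags, const uint32_t *netmask)
{
	bool is_host = (netmask == NULL || *netmask == UINT32_MAX);

	if (is_host)
		flags |= SK_RTF_HOST;
	else
		flags &= ~SK_RTF_HOST;
	return (flags);
}
```

With this shape, a request that passes a /24 mask together with the host flag comes out with the flag cleared, and a request with no netmask comes out with it set.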

Differential Revision: https://reviews.freebsd.org/D29958
MFC after:	2 days
2021-04-24 22:41:27 +00:00
Robert Watson
af14713d49 Support run-time configuration of the PIPE_MINDIRECT threshold.
PIPE_MINDIRECT determines at what (blocking) write size one-copy
optimizations are applied in pipe(2) I/O.  That threshold hasn't
been tuned since the 1990s when this code was originally
committed, and allowing run-time reconfiguration will make it
easier to assess whether contemporary microarchitectures would
prefer a different threshold.

(On our local RPi4 boards, the 8k default would ideally be at least
32k, but it's not clear how generalizable that observation is.)
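
The decision the threshold controls can be sketched as follows. This is not the sys_pipe.c code: the variable `pipe_mindirect`, the helper `use_direct_write()` and the `>=` comparison are assumptions made for illustration, with the 8k default taken from the message above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical run-time tunable replacing a compile-time constant. */
static size_t pipe_mindirect = 8192;

/*
 * Decide whether a write is large enough for the one-copy path.
 * Only blocking writes are considered, as in the description above.
 */
static bool
use_direct_write(size_t write_size, bool blocking)
{
	return (blocking && write_size >= pipe_mindirect);
}
```

Raising `pipe_mindirect` to 32768 at run time would move an 8k blocking write back onto the ordinary buffered path without rebuilding anything.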

MFC after:	3 weeks
Reviewers:	jrtc27, arichardson
Differential Revision: https://reviews.freebsd.org/D29819
2021-04-24 20:04:28 +01:00
Vladimir Kondratyev
e68d76c054 hkbd: Fix typo which disables keyboard input in kdb
Reported by:	Greg V
MFC after:	1 week
2021-04-24 22:01:14 +03:00
Edward Tomasz Napierala
77651151f3 linux: make ptrace(2) return EIO when trying to peek invalid address
Previously we returned the error from native ptrace(2), ENOMEM.
This confused Linux strace(1).
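
The translation amounts to something like the sketch below. `peek_error_for_linux()` is a hypothetical helper, not the linuxulator function, and it ignores the separate step of converting native errno values to Linux numbering.

```c
#include <errno.h>

/*
 * Hypothetical helper: map the native peek failure to the error
 * that Linux callers expect, and pass everything else through.
 */
static int
peek_error_for_linux(int native_error)
{
	if (native_error == ENOMEM)
		return (EIO);
	return (native_error);
}
```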

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29925
2021-04-24 11:37:50 +01:00
Hans Petter Selasky
a9b66dbd91 Allow the tcp_lro_flush_all() function to be called when the control
structure is zeroed, by setting the VNET after checking the mbuf count
for zero. It appears there are some cases with early interrupts on some
network devices which still trigger page-faults on accessing a NULL "ifp"
pointer before the TCP LRO control structure has been initialized.
This basically preserves the old behaviour, prior to
9ca874cf74 .
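
The check ordering can be sketched as below. The structures and `flush_all()` are simplified, hypothetical stand-ins for the LRO control structure and tcp_lro_flush_all(); `current_vnet` models setting the VNET context from the interface pointer.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ifnet_sk { int vnet_id; };
struct lro_ctrl_sk {
	struct ifnet_sk	*ifp;		/* NULL until initialized */
	unsigned	 mbuf_count;	/* queued packets */
};

static int current_vnet = -1;

/* Returns 1 if anything was flushed, 0 if there was nothing to do. */
static int
flush_all(struct lro_ctrl_sk *lc)
{
	/* Check the count first: a zeroed structure has nothing queued. */
	if (lc->mbuf_count == 0)
		return (0);

	/* Only now is it safe to dereference ifp. */
	current_vnet = lc->ifp->vnet_id;
	lc->mbuf_count = 0;
	return (1);
}
```

Swapping the two steps would dereference a NULL `ifp` on a zeroed structure, which corresponds to the page fault described above.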

No functional change.

Reported by:	rscheff@
Differential Revision:	https://reviews.freebsd.org/D29564
MFC after:	2 weeks
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-04-24 12:23:42 +02:00
Alexander Motin
b99419aee4 mpr/mps(4): Make device mapping somewhat more robust.
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.

If we cannot process the DPM due to corruption, wipe it and restart
from scratch.  Otherwise I don't see a way to recover persistence if
something goes wrong and there is no BIOS to recover it for us.

Together this solves a problem that appeared when a 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table.  It made
the HBA completely unusable, since the overflowed and conflicting
mapping table was unable to map any enclosures, and therefore any
devices.

While there, also make some enclosure mapping errors more informative.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2021-04-23 23:36:51 -04:00