Commit Graph

138112 Commits

Author SHA1 Message Date
Hans Petter Selasky
8fc2a3c417 Factor out repeated code in the USB controller drivers to avoid bugs
computing the same isochronous start frame number over and over again.

PR:		257082
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-07-10 20:59:00 +02:00
Hans Petter Selasky
3f5054862a Make sure the avr32dci_odevd structure is used.
This fixes a compilation error.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-07-10 19:57:52 +02:00
Hans Petter Selasky
d038463bd2 Make sure the XHCI driver obeys the isochronous scheduling threshold value
as given by the XHCI hardware parameters to avoid scheduling isochronous
transfers too early.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-07-10 19:57:52 +02:00
Hans Petter Selasky
e036ee6ce2 Let the xhci_hw_root structure span exactly XHCI_PAGE_SIZE bytes by increasing
the number of completion event TRBs. This avoids wasting memory.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-07-10 19:57:52 +02:00
David Chisnall
3a522ba1bc Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-10 17:19:52 +01:00
Martin Matuska
476ef25d32 zfs: update zfs_config.h to match current OpenZFS version (bdd11cbb9)
TBD: build with fetch(3) support for keylocation=http(s)://
2021-07-10 17:43:16 +02:00
Konstantin Belousov
fdc71fa112 amd64 pmap: unexpand the NBPDR macro definition
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:54 +03:00
Konstantin Belousov
71463a34ab amd64 mpboot.S: fix typo in comment
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:54 +03:00
Konstantin Belousov
63664df720 amd64 locore.S: add FF copyright for LA57 work
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:53 +03:00
Konstantin Belousov
9dc715230c amd64 locore.S: trim .globl list from symbols gone for long time
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:53 +03:00
Konstantin Belousov
55e63ed307 x86: use ANSI C definition style for trap_fatal
PR:	257062
Submitted by:	Vijay Sharma <vijaysh312@gmail.com>
MFC after:	1 week
2021-07-10 14:46:53 +03:00
Stefan Eßer
58080fbca0 libalias: fix divide by zero causing panic
The packet_limit can fall to 0, leading to a divide by zero abort in
the "packets % packet_limit".

A possible solution would be to apply a lower limit of 1 after the
calculation of packet_limit, but since any number modulo 1 gives 0,
the more efficient solution is to skip the modulo operation for
packet_limit <= 1.

Since this is a fix for a panic observed in stable/12, merging this
fix to stable/12 and stable/13 before expiry of the 3 day waiting
period might be justified, if it works for the reporter of the issue.

Reported by:	Karl Denninger <karl@denninger.net>
MFC after:	3 days
2021-07-10 13:08:18 +02:00
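
Illustration for 58080fbca0 (not taken from the commit): a minimal sketch of
skipping the modulo when the limit is 0 or 1. The function name and the
meaning of the return value are assumptions made for this example; only the
"packet_limit <= 1" guard comes from the message above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: report whether the periodic action should run for
 * this packet count.  Not the libalias code, just the guard it describes.
 */
bool
sketch_limit_reached(unsigned int packets, unsigned int packet_limit)
{
	/*
	 * x % 0 is undefined and x % 1 is always 0, so both cases can
	 * skip the modulo and report "reached" directly.
	 */
	if (packet_limit <= 1)
		return (true);
	return (packets % packet_limit == 0);
}
```

With this guard a limit that has fallen to 0 behaves the same as a limit
of 1 instead of trapping.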
Alexander Motin
63ca9ea4f3 Use sleepq_signal(SLEEPQ_DROP) in cv_signal().
As with the earlier wakeup_one()/wakeup_any() commit, this reduces the lock
hold time and so contention.

MFC after:	1 week
2021-07-09 20:57:58 -04:00
Mark Johnston
588c7a06df KASAN: Implement __asan_unregister_globals()
It will be called during KLD unload to unpoison the redzones following
global variables.  Otherwise, virtual address ranges previously used for
a KLD may be left tainted, triggering false positives when they are
recycled.

Reported by:	pho
Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Mark Johnston
b0dfc48684 uma: Fix a few problems with KASAN integration
- Ensure that all items returned by UMA are aligned to
  KASAN_SHADOW_SCALE (8).  This was true in practice since smaller
  alignments are not used by any consumers, but we should enforce it
  anyway.
- Use a non-zero code for marking redzones that appear naturally in
  items that are not a multiple of the scale factor in size.  Currently
  we do not modify keg layouts to force the creation of redzones.
- Use a non-zero code for marking freed per-CPU items, otherwise
  accesses of freed per-CPU items are not detected by the runtime.

Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Mark Johnston
36226163fa x86: Mark the trapframe as initialized in ipi_bitmap_handler()
Otherwise KASAN may generate false positives if the trapframe was
written into a poisoned region of the stack.

Reported by:	pho
Reported by:	syzbot+ee60455cd58e6eed20c9@syzkaller.appspotmail.com
Reported by:	syzbot+be5f9df26426ace3a00c@syzkaller.appspotmail.com
Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Mark Johnston
5d243d41b1 hwpmc: Disable KASAN in pmc_save_kernel_callchain()
As in commit 831850d8b0, this routine can trigger false positives, so
exclude it from instrumentation.

Reported by:	pho
Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Mark Johnston
f08f0ae524 amd64: Mark the trapframe as initialized in trap()
Otherwise KASAN may generate false positives if the trapframe was
written into a poisoned region of the stack.

Reported by:	pho
Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Michael Tuexen
105b68b42d sctp: Fix errno in case of association setup failures
Do not report always ETIMEDOUT, but only when appropriate. In
other cases report ECONNABORTED.

MFC after:	3 days
2021-07-09 23:19:25 +02:00
Vladimir Kondratyev
82626fef62 iichid(4): Perform bus_teardown_intr/bus_setup_intr to disable interrupts
during suspend/resume cycle. Previously used bus_generic_suspend_intr and
bus_generic_resume_intr may cause interrupt storm because of missed
interrupt acknowledges caused by blocking of intr handler.

Reported by:	J.R. Oldroyd <jr_AT_opal_DOT_com>
MFC after:	1 week
2021-07-09 22:32:59 +03:00
MIHIRA Sanpei Yoshiro
9e3761d126 arm: remove fslsdma from GENERIC
The fslsdma device requires sdma_fw, but that's not included in
GENERIC. That firmware is not in the FreeBSD tree at the moment, but
could easily be.

The license for the firmware can be found in the linux firmware repo:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=3123d78e09d2f815de4d94aa35c07b3c0469c80e
and looks to be a BSD license + no reverse engineer.

We can add this back after the firmware is imported, made into a port, or
can be loaded automatically.

Reviewed by:		imp (with ian finding the license)
PR:			237466
MFC after:		1 week
2021-07-09 11:21:40 -06:00
Warner Losh
72821668b0 stand/kmem_zalloc: panic when a M_WAITOK allocation fails
Malloc() might return NULL, in which case we will panic with a NULL
pointer deref. Make it panic when the allocation fails to preserve the
postcondition that we never return a NULL value.

Reviewed by:		tsoome
PR:			249859
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31106
2021-07-09 11:21:18 -06:00
Andrew Turner
1472117a1e Support fixed size, variable location acpi resources
These have been found in some Arm ACPI tables generated by edk2, e.g.
when describing the pl011 uart on the Arm AEMv8 model.

Reviewed by:	imp, jkim
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31110
2021-07-09 01:31:19 +00:00
Michael Tuexen
ce64352a70 sctp: provide consistent stream information in case of early errors
While there, make sure the function is called correctly.

MFC after:	3 days
2021-07-09 14:16:59 +02:00
Michael Tuexen
84992a3251 sctp: provide sac_error also for ABORT chunk being sent
Thanks to Florent Castelli for bringing this issue up for the
userland stack and providing an initial patch.

MFC:		3 days
2021-07-09 13:46:27 +02:00
Kristof Provost
3fc12ae042 pf: bound DIOCGETSTATESV2 memory use
Rather than allocating however much memory userspace asks for we only
allocate enough for a handful of states, and copy to userspace for each
completed row.
We start out with enough space for 16 states (per row), but grow that as
required. In most configurations we expect at most a handful of states
per row (more than that would have other negative effects on packet
processing performance).

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31111
2021-07-09 10:30:02 +02:00
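
Illustration for 3fc12ae042 (not taken from the commit): a minimal userland
sketch of a scratch buffer that starts with room for 16 entries and grows
only when a row needs more. The struct, the function name, and the doubling
policy are assumptions for this example; the message only states the
starting size of 16 and that the buffer grows as required.

```c
#include <stdlib.h>

/* Hypothetical per-request scratch buffer, reused for every row. */
struct sketch_buf {
	int	*items;
	size_t	 cap;	/* capacity in items, 0 before first use */
};

/*
 * Make sure the buffer can hold `need` items.  Starts at 16 and doubles
 * until large enough.  Returns 0 on success, -1 if allocation fails (the
 * old buffer is left intact in that case).
 */
int
sketch_buf_reserve(struct sketch_buf *b, size_t need)
{
	size_t cap = (b->cap != 0) ? b->cap : 16;
	int *p;

	while (cap < need)
		cap *= 2;
	if (cap == b->cap)
		return (0);
	p = realloc(b->items, cap * sizeof(*p));
	if (p == NULL)
		return (-1);
	b->items = p;
	b->cap = cap;
	return (0);
}
```

The point of the design is that memory use depends on the longest row seen,
not on the size userspace asked for.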
Kristof Provost
c6bf20a2a4 pf: add DIOCGETSTATESV2
Add a new version of the DIOCGETSTATES call, which extends the struct to
include the original interface information.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31097
2021-07-09 10:29:53 +02:00
Rick Macklem
1e0a518d65 nfscl: Add a Linux compatible "nconnect" mount option
Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to be created for a mount,
instead of just one TCP connection.

A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated together with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.

One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write).  The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30970
2021-07-08 17:39:04 -07:00
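
Illustration for 1e0a518d65 (not taken from the commit): a minimal sketch of
the connection selection policy described above, where small RPCs stay on
the first connection and Read/Readdir/Write rotate over the additional ones.
All names are hypothetical, and the fallback to connection 0 when only one
connection exists is an assumption.

```c
#include <stdbool.h>

/* Hypothetical per-mount state. */
struct sketch_mount {
	int	nconnect;	/* total TCP connections, 1..16 */
	int	next;		/* round-robin cursor over the extra ones */
};

/*
 * Return the index of the connection to use for an RPC.  Index 0 is the
 * first connection; indices 1..nconnect-1 are the additional ones.
 */
int
sketch_pick_conn(struct sketch_mount *m, bool large_rpc)
{
	int idx;

	if (!large_rpc || m->nconnect <= 1)
		return (0);
	idx = 1 + m->next;
	m->next = (m->next + 1) % (m->nconnect - 1);
	return (idx);
}
```

With nconnect set to 3, successive large RPCs would use connections 1, 2,
1, 2 and so on, while small RPCs always use connection 0.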
Bjoern A. Zeeb
bf3a385217 fan53555: attach to build and switch from syr827
Rather than extending syr827 for syr828 (as initially done in D31103)
switch to the Fairchild Semiconductor Corporation fan53555 implementation
which is in-tree but was not attached to the build.  The fan53555
implementation also supports syr827/syr828 already. [1]
Update NOTES and the arm64 GENERIC configuration for the switch.
syr827 for now stays in the tree but is not used by any
kernel configuration.

Suggested by:	mmel [1]
Reviewed by:	mmel, manu
Differential Revision: https://reviews.freebsd.org/D31112
2021-07-08 20:17:45 +00:00
Luiz Otavio O Souza
c5dd8bac0b dummynet: reduce console spam
Only print this warning when boot verbose is enabled.
This can get pretty annoying (and useless) in some systems.

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-08 20:25:05 +02:00
Kristof Provost
3464105282 pf: pf_killstates() never fails, so remove the return value
Suggested by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-08 18:56:29 +02:00
Mateusz Guzik
19d6e29b87 pf: add pf_find_state_all_exists
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-08 14:00:55 +00:00
Andrew Turner
c0edde3021 Fix the name of the arm64 SCTLR_E0E register
The character between the E's was the letter O, however in the Arm
Documentation and XML the character is the number 0 (zero).

Sponsored by:	The FreeBSD Foundation
2021-07-07 23:18:04 +00:00
Randall Stewart
7312e4e5cf tcp: Fix 32 bit platform breakage
This fixes the incorrect use of a sysctl add to u64. It
was for a useconds time, but on 32 bit platforms it's
not a u64. Instead use the long directive.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31107
2021-07-08 08:16:45 -04:00
Kristof Provost
fa96701c8a pf: Handle errors returned by pf_killstates()
Happily this wasn't a real bug, because pf_killstates() never fails, but
we should check the return value anyway, in case it does ever start
returning errors.

Reported by:	clang --analyze
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-08 10:32:42 +02:00
Kristof Provost
8cceacc0f1 pf: Remove unneeded NULL check
pidx is never NULL, and is used unconditionally later on in the
function.
Add an assertion, as documentation for the requirement to provide an idx
pointer.

Reported by:	clang --analyze
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-08 10:32:33 +02:00
Kristof Provost
211cddf9e3 pf: rename pf_state to pf_kstate
Indicate that this is a kernel-only structure, and make it easier to
distinguish from others used to communicate with userspace.

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31096
2021-07-08 10:31:43 +02:00
Michal Meloun
e88c3b1b02 intrng: remove now redundant shadow variable.
Should not be a functional change.

Submitted by: 	ehem_freebsd@m5p.com
Discussed in:	https://reviews.freebsd.org/D29310
MFC after:	4 weeks
2021-07-08 08:46:41 +02:00
Michal Meloun
a49f208d94 intrng: Releasing interrupt source should clear interrupt table full state.
The first release of an interrupt in a situation where the interrupt table
is full should schedule a full table check the next time an interrupt is
allocated. A full check is necessary to ensure maximum separation between
the order of allocation and the order of release.

Submitted by:	ehem_freebsd@m5p.com (initial version)
Discussed in:	https://reviews.freebsd.org/D29310
MFC after:	4 weeks
2021-07-08 08:16:46 +02:00
Martin Matuska
7cd22ac434 zfs: merge openzfs/zfs@bdd11cbb9 (master) into main
Notable upstream pull request merges:
  #12274 Optimize txg_kick() process
  #12281 Move gethrtime() calls out of vdev queue lock
  #12287 Remove refcount from spa_config_*(
  #12289 Compact dbuf/buf hashes and lock arrays
  #12290 Remove avl_size field from struct avl_tree
  #12294 Upstream: dmu_zfetch_stream_fini leaks refcount
  #12295 Fix abd leak, kmem_free correct size of abd_t
  #12328 FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE

Obtained from:	OpenZFS
OpenZFS commit:	bdd11cbb90
2021-07-07 23:31:52 +02:00
Andrew Gallatin
0756bdf19c ktls: make ktls_disable_ifnet() shim static
A user reported that when compiling without KERN_TLS, and
with -O0, the kernel failed to link due to ktls_disable_ifnet()
being undefined.   Making the shim static works around this issue.

Reported by: Gary Jennejohn
Sponsored by: Netflix
2021-07-07 15:08:13 -04:00
Alan Cox
0add3c9945 arm64: Simplify fcmpset failure in pmap_promote_l2()
When the initial fcmpset in pmap_promote_l2() fails, there is no need
to repeat the check for the physical address being 2MB aligned or for
the accessed bit being set.  While the pmap is locked the hardware can
only transition the accessed bit from 0 to 1, and we have already
determined that it is 1 when the fcmpset fails.

MFC after:	1 week
2021-07-07 13:34:11 -05:00
Andrew Gallatin
b1e806c0ed tcp: fix alternate stack build with LINT-NO{INET,INET6,IP}
When fixing another bug, I noticed that the alternate
TCP stacks do not build when various combinations of
ipv4 and ipv6 are disabled.

Reviewed by:	rrs, tuexen
Differential Revision:	https://reviews.freebsd.org/D31094
Sponsored by: Netflix
2021-07-07 13:02:08 -04:00
Andrew Gallatin
4150a5a87e ktls: fix NOINET build
Reported by: mjguzik
Sponsored by: Netflix
2021-07-07 10:40:02 -04:00
Randall Stewart
d7955cc0ff tcp: HPTS performance enhancements
HPTS drives both rack and bbr, and yet there have been many complaints
about performance. This bit of work restructures hpts to help reduce CPU
overhead. Instead of relying on the timer/callout, hpts is now driven by
user return from a system call as well as by lro flushes. The timer
becomes a backstop that dynamically adjusts
based on how "late" we are.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31083
2021-07-07 07:22:35 -04:00
Konstantin Belousov
747a6b7ace cloudabi and linux ABIs: do not call umtx_thread_cleanup() from thr_exit syscall
These ABIs do not use umtx at all, so there is nothing to clean.
Cloudabi references to umtx keys do not require any cleanups anyway.

Requested by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:14 +03:00
Konstantin Belousov
28a66fc3da Do not call FreeBSD-ABI specific code for all ABIs
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI.  Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.

Requested by:	dchagin
Reviewed by:	dchagin, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Konstantin Belousov
55976ce11a Move sv_onexit() sysentvec hook slightly later
after itimers are stopped.  This makes it more usable for e.g. native FreeBSD
ABI sysentvecs.

Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Konstantin Belousov
71ab344524 Add sv_onexec_old() sysent hook for exec event
Unlike sv_onexec(), it is called from the old (pre-exec) sysentvec structure.
The old vmspace for the process is still intact during the call.

Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Mateusz Guzik
edcf1054d3 cxgb: use m_gethdr_raw
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-07 11:05:46 +00:00
Mateusz Guzik
a56888534d iflib: use m_gethdr_raw
Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31081
2021-07-07 11:05:46 +00:00
Mateusz Guzik
c2c34ee540 mbuf: add m_get_raw and m_gethdr_raw
The intent is to eliminate the MT_NOINIT flag and consequently a branch
from the constructor.

Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31080
2021-07-07 11:05:46 +00:00
Mateusz Guzik
0a718a6e6e mbuf: replace all direct uma_zfree(zone_mbuf) calls with m_free_raw
Reviewed by:	donner
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31082
2021-07-07 11:05:46 +00:00
Andrew Turner
a7b05eb16c Sync the arm64 special registers with the Armv8.5 XML
Add the missing macros and decode all the fields as described in the
Arm Architecture System Registers XML corresponding to Armv8.5.

Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30983
2021-07-06 20:46:55 +00:00
Edward Tomasz Napierala
6f147a0734 cam: enable kern.cam.ada.enable_uma_ccbs by default
This makes the ada(4) driver use UMA for its CCBs.  While its
da(4) counterpart needs some more testing, this one seems to be
safe now.

Please let me know via email if you notice any suspicious kernel
messages.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30567
2021-07-07 09:40:34 +01:00
Bjoern A. Zeeb
da2f833f7a MMCCAM: fix a panic after cam_sim_alloc_dev() removal in sdhci.c
During the removal of cam_sim_alloc_dev() in
aeb04e88f5 for sdhci.c and the
follow-up build-fix in a72af82e31
slot->dev and slot->bus got mixed up for MMCCAM;  slot->dev is
only used in the !MMCCAM case so is uninitialised here leading to
a panic;  switch back to slot->bus to return to the status quo.

Reviewed by:	imp (ack on arm@)
X-Differential Revision:	https://reviews.freebsd.org/D30857
2021-07-07 00:37:45 +00:00
Randall Stewart
e834f9a44a tcp: Address goodput and TLP edge cases.
There are several cases where we make a goodput measurement and we are running
out of data when we decide to make the measurement. In reality we should not make
such a measurement if there is no chance we can have "enough" data. There are also
some corner case TLPs that end up not registering as a TLP like they should; we
fix this by pushing the doing_tlp setup to the actual timeout that knows it did
a TLP. This makes it so we always have the appropriate flag on the sendmap
indicating a TLP being done as well as count correctly so we make no more
than two TLPs.

In addressing the goodput let's also add a "quality" metric that can be viewed via
blackbox logs so that a casual observer does not have to figure out how good
of a measurement it is. This is needed due to the fact that we may still make
a measurement that is of a poorer quality as we run out of data but still have
a minimal amount of data to make a measurement.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31076
2021-07-06 15:26:37 -04:00
Mateusz Guzik
2a69eb8c87 cxgb: switch bare zone_mbuf use to m_free_raw
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-06 19:05:11 +00:00
Alexander Motin
d0732fa819 Add ocs_gendump.c to the build, missed in 29e2dbd42c.
2021-07-06 15:03:06 -04:00
Ram Kishore Vegesna
29e2dbd42c ocs_fc: Add gendump and dump_to_host ioctl command support.
Support to generate firmware dump.

Approved by: mav(mentor)
2021-07-06 21:08:11 +05:30
Andrew Gallatin
28d0a740dd ktls: auto-disable ifnet (inline hw) kTLS
Ifnet (inline) hw kTLS NICs typically keep state within
a TLS record, so that when transmitting in-order,
they can continue encryption on each segment sent without
DMA'ing extra state from the host.

This breaks down when transmits are out of order (eg,
TCP retransmits).  In this case, the NIC must re-DMA
the entire TLS record up to and including the segment
being retransmitted.  This means that when re-transmitting
the last 1448 byte segment of a TLS record, the NIC will
have to re-DMA the entire 16KB TLS record. This can lead
to the NIC running out of PCIe bus bandwidth well before
it saturates the network link if a lot of TCP connections have
a high retransmit rate.

This change introduces a new sysctl (kern.ipc.tls.ifnet_max_rexmit_pct),
where TCP connections with higher retransmit rate will be
switched to SW kTLS so as to conserve PCIe bandwidth.

Reviewed by:	hselasky, markj, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30908
2021-07-06 10:28:32 -04:00
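
Illustration for 28d0a740dd (not taken from the commit): re-DMAing a full
16KB record to retransmit one 1448 byte segment moves roughly 11 times the
retransmitted payload over PCIe. The sketch below shows one way a percentage
threshold such as kern.ipc.tls.ifnet_max_rexmit_pct could be compared
against a connection's retransmit rate. The function name, the byte-based
counters, and the strict "greater than" comparison are all assumptions; the
message does not say how the kernel measures the rate.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: should this connection fall back to software kTLS?
 * bytes_sent and bytes_rexmit are assumed per-connection counters.
 */
bool
sketch_should_use_sw_ktls(uint64_t bytes_sent, uint64_t bytes_rexmit,
    unsigned int max_rexmit_pct)
{
	/* No traffic yet: nothing to judge the connection by. */
	if (bytes_sent == 0)
		return (false);
	return (bytes_rexmit * 100 / bytes_sent > max_rexmit_pct);
}
```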
Edward Tomasz Napierala
a081a943a0 cam: drop unused 'saved_ccb' field from softcs
No functional changes.  Do not MFC this, it changes kernel ABI.

Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30698
2021-07-06 10:04:38 +01:00
Edward Tomasz Napierala
13aa56fcd5 cam(4): preserve alloc_flags when copying CCBs
Before UMA CCBs, all CCBs were of the same size, and could
be trivially copied using bcopy(9).  Now we have to preserve
alloc_flags, otherwise we might end up attempting to free
stack-allocated CCB to UMA; we also need to take CCB size
into account.

This fixes kernel panic which would occur when trying to access
a stopped (as in, SCSI START STOP, also "ctladm stop") SCSI device.

Reported By:	Gary Jennejohn <gljennjohn@gmail.com>
Tested By:	Gary Jennejohn <gljennjohn@gmail.com>
Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31054
2021-07-06 09:27:22 +01:00
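
Illustration for 13aa56fcd5 (not taken from the commit): a minimal sketch of
copying a structure while keeping the destination's own allocation flags, so
that the copy is later freed the way it was actually allocated. The struct
layout and names are hypothetical, and the sketch ignores the differing CCB
sizes that the message also mentions.

```c
#include <string.h>

/* Hypothetical stand-in for a CCB: allocation metadata plus payload. */
struct sketch_ccb {
	unsigned int	alloc_flags;	/* how THIS object was allocated */
	int		payload[4];
};

/*
 * Copy src into dst, but keep dst's alloc_flags: they describe where dst
 * itself lives (stack, malloc, UMA), not where src came from.
 */
void
sketch_ccb_copy(struct sketch_ccb *dst, const struct sketch_ccb *src)
{
	unsigned int saved = dst->alloc_flags;

	memcpy(dst, src, sizeof(*dst));
	dst->alloc_flags = saved;
}
```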
Wojciech Macek
382376f398 enetc: Add support for 2.5G fixed-link speed
With the v5.13 device-tree update speed of the CPU switch port was
changed to 2.5G. Reflect that in the driver.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
2021-07-06 09:01:30 +02:00
Alexander Motin
e3bcd07d83 nvme(4): Report NPWA before NPWG as stripesize.
New Samsung 980 SSDs report Namespace Preferred Write Alignment of
8 (4KB) and Namespace Preferred Write Granularity of 32 (16KB).
My quick tests show that 16KB is a minimal sequential write size
when the SSD reaches peak IOPS, so writing much less is very slow.
But writing slightly less or slightly more does not change much,
so it seems not so much a size granularity as minimum I/O size.

Thinking about different stripesize consumers:
 - Partition alignment should be based on NPWA by definition.
 - ZFS ashift in part of forcing alignment of all I/Os should also
be based on NPWA.  In part of forcing size granularity, if really
needed, it may be set to NPWG, but too big value can make ZFS too
space-inefficient, and the 16KB is actually the biggest supported
value there now.
 - ZFS recordsize/volblocksize could potentially be tuned up toward
NPWG to work as I/O size granularity, but enabled compression makes
it too fuzzy.  And those are normally user-configurable things.
 - ZFS I/O aggregation code could definitely use Optimal Write Size
value and may be NPWG, but we don't have fields in GEOM now to report
the minimal and optimal I/O sizes, and even maximal is not reported
outside GEOM DISK to be used by ZFS.

MFC after:	1 week
2021-07-05 23:13:15 -04:00
Alan Cox
e41fde3ed7 On a failed fcmpset don't pointlessly repeat tests
In a few places, on a failed compare-and-set, both the amd64 pmap and
the arm64 pmap repeat tests on bits that won't change state while the
pmap is locked.  Eliminate some of these unnecessary tests.

Reviewed by:	andrew, kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31014
2021-07-05 21:07:40 -05:00
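
Illustration for e41fde3ed7 (not taken from the commit): a userland sketch,
using C11 atomics in place of the kernel's fcmpset, of checking the stable
bits once and retrying only on the bits that can still change. The bit
names, their values, and the function are hypothetical, and the assumption
that the valid and accessed bits cannot change during the call stands in for
"while the pmap is locked".

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_VALID	0x1u
#define SK_RW		0x2u
#define SK_ACCESSED	0x4u
#define SK_DIRTY	0x8u

/*
 * If the entry is valid and accessed, clear its writable bit unless it is
 * already dirty.  Stores the final value in *result and returns true, or
 * returns false if the entry does not qualify.
 */
bool
sketch_prepare_entry(_Atomic uint64_t *pte, uint64_t *result)
{
	uint64_t old = atomic_load(pte);

	/* Checked once: assumed not to change back while we hold the lock. */
	if ((old & (SK_VALID | SK_ACCESSED)) != (SK_VALID | SK_ACCESSED))
		return (false);

	/* Retry loop: only the dirty bit is re-examined after a failure. */
	while ((old & (SK_DIRTY | SK_RW)) == SK_RW) {
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(pte, &old,
		    old & ~(uint64_t)SK_RW)) {
			old &= ~(uint64_t)SK_RW;
			break;
		}
	}
	*result = old;
	return (true);
}
```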
Jessica Clarke
348c41d181 riscv: Implement non-stub __vdso_gettc and __vdso_gettimekeep
PR:	256905
Reviewed by:	arichardson, mhorne
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30963
2021-07-05 16:16:53 +01:00
Jessica Clarke
af433832f7 geom_label: Remove an old sysinstall(8) workaround
We removed sysinstall(8) back in 2011, so this workaround should be long
since unnecessary. This workaround can end up breaking cases that are
hit in the real world, such as dd'ing a small pre-built disk image to a
large partition that you intend to grow on first boot and uses a UFS
disk label for / in its /etc/fstab (as the only reliable thing a raw UFS
image can reference).

Reviewed by:	imp, mckusick
Differential Revision:	https://reviews.freebsd.org/D30825
2021-07-05 16:15:32 +01:00
Jessica Clarke
55c57a7811 rman: Remove an outdated comment that no longer applies
Since commit 2dd1bdf183 in 2016 the r_start and r_end fields have been
rman_res_t, which was briefly unsigned long, but commit da1b038af9
changed the typedef to be uintmax_t instead. C99 is also something we
assume these days.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D30808
2021-07-05 16:15:03 +01:00
Mateusz Guzik
f649cff587 pf: padalign global locks found in pf.c
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 09:56:54 +00:00
Emmanuel Vadot
48687f733f armv7: allwinner: Add aw_r_intc driver
This is also needed after the 5.13 dts update.

Sponsored by:	Diablotin Systems
Reported by:	Mark Millard <marklmi@yahoo.com>
2021-07-05 11:38:23 +02:00
Mateusz Guzik
dc1ab04e4c pf: allow table stats clearing and reading with ruleset rlock
Instead serialize against these operations with a dedicated lock.

Prior to the change, when pushing 17 mln pps of traffic, calling
DIOCRGETTSTATS in a loop would restrict throughput to about 7 mln.  With
the change there is no slowdown.

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 10:42:01 +02:00
Mateusz Guzik
f92c21a28c pf: depessimize table handling
Creating tables and zeroing their counters induces excessive IPIs (14
per table), which in turn kills single- and multi-threaded performance.

Work around the problem by extending per-CPU counters with a general
counter populated on "zeroing" requests -- it stores the currently found
sum. Requests to report the current value then return the sum of the
per-CPU counters minus the saved value.

Sample timings when loading a config with 100k tables on a 104-way box:

stock:

pfctl -f tables100000.conf  0.39s user 69.37s system 99% cpu 1:09.76 total
pfctl -f tables100000.conf  0.40s user 68.14s system 99% cpu 1:08.54 total

patched:

pfctl -f tables100000.conf  0.35s user 6.41s system 99% cpu 6.771 total
pfctl -f tables100000.conf  0.48s user 6.47s system 99% cpu 6.949 total

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 10:42:01 +02:00
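
Illustration for f92c21a28c (not taken from the commit): a single-threaded
sketch of "zeroing" a per-CPU counter by remembering the current sum instead
of writing to every CPU's slot. The struct, the function names, and the
fixed CPU count are assumptions for this example; the kernel's counter(9)
machinery and its synchronization are not modeled.

```c
#include <stdint.h>

#define SKETCH_NCPU	4	/* hypothetical CPU count */

struct sketch_counter {
	uint64_t	percpu[SKETCH_NCPU];	/* only ever incremented */
	uint64_t	zero_base;		/* sum saved at last "zero" */
};

/* Sum of all per-CPU slots. */
uint64_t
sketch_counter_sum(const struct sketch_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < SKETCH_NCPU; i++)
		sum += c->percpu[i];
	return (sum);
}

/* Add n on the given CPU's slot. */
void
sketch_counter_add(struct sketch_counter *c, int cpu, uint64_t n)
{
	c->percpu[cpu] += n;
}

/* "Zero" without touching the per-CPU slots: remember the current sum. */
void
sketch_counter_zero(struct sketch_counter *c)
{
	c->zero_base = sketch_counter_sum(c);
}

/* Reported value: current sum minus the sum saved at the last zeroing. */
uint64_t
sketch_counter_fetch(const struct sketch_counter *c)
{
	return (sketch_counter_sum(c) - c->zero_base);
}
```

Because zeroing only reads the slots, it needs no cross-CPU write, which is
the kind of work that would otherwise require IPIs.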
Vladimir Kondratyev
5fa1eb1cd9 Bump __FreeBSD_version to 1400025 for LinuxKPI change.
2021-07-05 03:22:19 +03:00
Vladimir Kondratyev
8b33cb8303 LinuxKPI: Implement sequence counters and sequential locks
as a thin wrapper around native version found in sys/seqc.h.
This replaces out-of-base GPLv2-licensed code used by drm-kmod.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D31006
2021-07-05 03:20:55 +03:00
Vladimir Kondratyev
019391bf85 LinuxKPI: Implement strscpy
strscpy copies the src string, or as much of it as fits, into the dst
buffer.  The dst buffer is always NUL terminated, unless it's zero-sized.
strscpy returns the number of characters copied (not including the
trailing NUL) or -E2BIG if len is 0 or src was truncated.

Currently drm-kmod replaces strscpy with strncpy, which is not quite
correct as strncpy does not NUL-terminate truncated strings and returns
different values on exit.

Reviewed by:	hselasky, imp, manu
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31005
2021-07-05 03:20:42 +03:00
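
Illustration for 019391bf85 (not taken from the commit): a userland sketch
of the strscpy semantics described above. The function name is hypothetical
and this is not the LinuxKPI implementation; it only follows the stated
contract (always NUL-terminate unless the buffer is zero-sized, return the
number of characters copied or -E2BIG).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

ssize_t
sketch_strscpy(char *dst, const char *src, size_t len)
{
	size_t i;

	/* A zero-sized buffer cannot even hold the terminator. */
	if (len == 0)
		return (-E2BIG);

	/* Copy at most len - 1 characters, leaving room for the NUL. */
	for (i = 0; i < len - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If src has more characters left, the copy was truncated. */
	if (src[i] != '\0')
		return (-E2BIG);
	return ((ssize_t)i);
}
```

Unlike strncpy, the destination is NUL-terminated even when the source does
not fit, and truncation is reported through the return value.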
Vladimir Kondratyev
98a6984a9e LinuxKPI: Use macro for implementation of some dma_map_* functions
This allows removing the unimplemented attrs parameter, whose type differs
between Linux kernel versions, and compiling both drm-kmod and ofed
callers unmodified.
Also convert it to 'unsigned long' type to match modern Linuxes.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D30932
2021-07-05 03:20:23 +03:00
Vladimir Kondratyev
864b11007a LinuxKPI: Implement irq_work_sync() routine.
irq_work_sync() performs draining of irq_work task.
Required by drm-kmod.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30818
2021-07-05 03:20:06 +03:00
Vladimir Kondratyev
1ab61a1932 LinuxKPI: Do not wait for a grace period in rcu_barrier()
Linux docs explicitly state that this is not required [1]:

"Important note: The rcu_barrier() function is not, repeat, not,
obligated to wait for a grace period.  It is instead only required to
wait for RCU callbacks that have already been posted.  Therefore, if
there are no RCU callbacks posted anywhere in the system, rcu_barrier()
is within its rights to return immediately.  Even if there are
callbacks posted, rcu_barrier() does not necessarily need to wait for
a grace period."

[1] https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

Reviewed by:	emaste, hselasky, manu
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30809
2021-07-05 03:19:50 +03:00
Vladimir Kondratyev
c0862b2b1f LinuxKPI: Add compiler barriers to list_for_each_entry_lockless macro
so this list-traversal primitive may safely run concurrently with the
_rcu list-mutation primitives such as list_add_rcu() as long as the
traversal is guarded by rcu_read_lock().

Do it by reusing the "list_for_each_entry_rcu" macro which does the same.
On Linux it implements some additional lockdep stuff which we skip.

Also move the macro to linux/rculist.h where it resides on Linux.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D30795
2021-07-05 03:19:35 +03:00
Vladimir Kondratyev
c77ec79b57 LinuxKPI: Change flags parameter type of atomic_dec_and_lock_irqsave
On Linux atomic_dec_and_lock_irqsave is a wrapper macro which passes
a reference to the third parameter, rather than the parameter value itself,
to the implementation routine called _atomic_dec_and_lock_irqsave [1].

While here, implement a fast path.

[1] https://github.com/torvalds/linux/blob/master/include/linux/spinlock.h#L476

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D30781
2021-07-05 03:19:01 +03:00
Vladimir Kondratyev
78a02d8b33 LinuxKPI: Add #defines required by drm-kmod v5.5
Reviewed by:	hselasky, manu
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30767
2021-07-05 03:18:47 +03:00
Vladimir Kondratyev
a2b83b59db LinuxKPI: Allow kmem_cache_free() to be called from critical sections
as it is required by i915kms driver from Linux kernel v 5.5.
This is done with asynchronous freeing of requested memory areas from
taskqueue thread. As memory to be freed is reused to store linked list
entry, backing UMA zone item size is rounded up to pointer size.
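
The deferred-free idea above can be sketched in portable C. This is an illustration only, not the LinuxKPI code: the names `deferred_item`, `deferred_list`, `item_size_roundup`, `deferred_free`, and `deferred_drain` are hypothetical, `free()` stands in for returning an item to its UMA zone, and the list push is not interrupt-safe as real code would need to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list link written into the first bytes of a freed item. */
struct deferred_item {
	struct deferred_item *next;
};

/* Hypothetical list of items waiting to be freed by a worker. */
struct deferred_list {
	struct deferred_item *head;
};

/*
 * Round a non-zero item size up to a multiple of the pointer size so that
 * every item is large enough to hold the list link once it is freed.
 */
static size_t
item_size_roundup(size_t size)
{
	size_t align = sizeof(void *);

	assert(size > 0);
	return ((size + align - 1) / align * align);
}

/* Called where freeing directly is not allowed: only queue the item. */
static void
deferred_free(struct deferred_list *list, void *item)
{
	struct deferred_item *di = item;

	assert(item != NULL);
	di->next = list->head;	/* the freed memory itself stores the link */
	list->head = di;
}

/* Called later from a worker context: actually release the memory. */
static size_t
deferred_drain(struct deferred_list *list)
{
	struct deferred_item *di;
	size_t n = 0;

	while ((di = list->head) != NULL) {
		list->head = di->next;
		free(di);
		n++;
	}
	return (n);
}
```

Because the link lives inside the freed item, queuing needs no extra allocation, which is what makes it usable from a context that must not sleep.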

While here, make struct linux_kmem_cache private to LKPI to reduce the number
of BSD headers included by linux/slab.h and switch the RCU code to use
LKPI's linux_irq_work_tq taskqueue to avoid injection of current into
system-wide taskqueue_fast thread context.

Submitted by:	nc (initial version for drm-kmod)
Reviewed by:	manu, nc
Differential revision:	https://reviews.freebsd.org/D30760
2021-07-05 03:18:14 +03:00
Lutz Donnerhacke
4060e77f49 libalias: Remove a stray directive
Removal of a preprocessor line was missed during development.
Do it now and MFC it together with the other patches.

MFC after:	2 days
2021-07-04 17:54:45 +02:00
Lutz Donnerhacke
2f4d91f9cb libalias: Rewrite HISTORY
Fix the history entry (wrong year) and add the missing recent work.
MFC together with the other patches.

MFC after:	2 days
2021-07-04 17:46:47 +02:00
Lutz Donnerhacke
f284553444 libalias: Fix API bug on initialization
The kernel part of ipfw(8) initializes LibAlias unconditionally
with a zeroed port range (allowed ports from 0 to 0).  During the
restructuring of libalias, port ranges came to be used every time and are
therefore initialized with values other than zero.  The secondary
initialization from ipfw (and probably others) overrides the new
default values and leaves the instance in a non-functional state.  The
obvious solution is to detect such reinitializations and use the new
default value instead.

MFC after:	3 days
2021-07-03 23:03:07 +02:00
Pavel Balaev
d12d651f86 EFI RT: resurrect EFIIOC_GET_TABLE
Make it work, but change the interface to be safe for non-root users. In
particular, right now the interface only works for tables which can be
minimally parsed by the kernel to determine the table size. Userspace can
then query the table size, after which it provides a buffer of the needed
size and the kernel copies out just the table to userspace.

The main advantage is that users no longer need to be able to read /dev/mem;
the disadvantage is the need to have minimal parsers aware of the table
types.  Right now the parsers are implemented for ESRT and PROP tables.

A future extension of the present interface might be to return only
the table physical address, in case the kernel does not have a suitable
parser yet. Then, a privileged user could read the table from /dev/mem.
This extension, which is logically equivalent to the old (non-working)
EFIIOC_GET_TABLE variant, is not implemented until needed.

Submitted by:	Pavel Balaev <pavel.balaev@3mdeb.com>
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30104
2021-07-03 20:06:48 +03:00
Edward Tomasz Napierala
2f514e6f13 linux(4): implement PR_SET_NO_NEW_PRIVS
This makes prctl(2) support PR_SET_NO_NEW_PRIVS, by mapping it
to the native PROC_NO_NEW_PRIVS_CTL procctl(2).

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30973
2021-07-03 08:42:37 +01:00
Edward Tomasz Napierala
45d99014ca linux(4): implement coredumps on arm64
Previously they only worked on amd64.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30975
2021-07-03 08:06:31 +01:00
K Staring
ef790cc740 hdaa: update pin patch configurations
A number of structural changes:
  - Use decimal nid numbers instead of hex
  - updated the branch to incorporate the suggestions made in the
    ALC280 pull request github thread
  - Convert magic pin values into strings.
  - Also update hdaa_patches to use clearer enums.
  - made pin patch type enum clearer, add macro for 'string' type
    patches
  - Added pin_patch structures to separate data from logic.
  - Integrated Realtek patches into new structure.

These incorporate fixes for ALC255, ALC256, ALC260, ALC262, ALC268,
ALC269, ALC280, ALC282, ALC283, ALC286, ALC290, ALC293, ALC296, ALC2880

And have definitions for a number of Dell and HP laptops.

Much of this data has been mined from the tables in the Linux driver.

imp squashed these into one commit because the changes from the github
pull requests no longer cleanly apply individually and made light style
changes after feedback from jhb.

Pull Request:		https://github.com/freebsd/freebsd-src/pull/139
Pull Request:		https://github.com/freebsd/freebsd-src/pull/140
Pull Request:		https://github.com/freebsd/freebsd-src/pull/141
Pull Request:		https://github.com/freebsd/freebsd-src/pull/142
Pull Request:		https://github.com/freebsd/freebsd-src/pull/143
Pull Request:		https://github.com/freebsd/freebsd-src/pull/144
Pull Request:		https://github.com/freebsd/freebsd-src/pull/145
Pull Request:		https://github.com/freebsd/freebsd-src/pull/146
Pull Request:		https://github.com/freebsd/freebsd-src/pull/147
Pull Request:		https://github.com/freebsd/freebsd-src/pull/148
Pull Request:		https://github.com/freebsd/freebsd-src/pull/149
Pull Request:		https://github.com/freebsd/freebsd-src/pull/150
Differential Revision:	https://reviews.freebsd.org/D30619
2021-07-03 00:15:49 -06:00
Lutz Donnerhacke
b50a4dce18 libalias: Avoid uninitialized expiration
The expiration time of direct address mappings is explicitly
uninitialized.  Expire times are always compared during housekeeping.
Although the uninitialized value does no harm, it's simpler to just
set it to a reasonable default.  This was detected while running the
test suite under valgrind.

MFC after:	3 days
2021-07-03 01:09:18 +02:00
Lutz Donnerhacke
25392fac94 libalias: Fix splay comparison bug
Comparing elements in a tree requires transitivity.  If a < b and b < c
then a must be smaller than c.  This way the tree elements are always
pairwise comparable.

Tristate comparison functions, returning values lower than, equal to, or
greater than zero, are usually implemented by a simple subtraction of
the operands.  If the operands are the same size as the result,
modular integer arithmetic kicks in and violates
transitivity.

Example:
Take the byte values 0, 120, and 240 and compute the differences as signed bytes:
  120 -   0 = 120
  240 - 120 = 120
  240 -   0 = -16
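
The example can be reproduced with a small C sketch. The function names are hypothetical and this is not the libalias code; it only contrasts a subtraction-based comparison with a relational one. The narrowing conversion to `int8_t` wraps on common two's complement platforms, which is assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy tristate comparison: the difference of two bytes is squeezed back
 * into a signed byte, so large differences wrap around to negative values.
 */
static int
cmp_by_subtraction(uint8_t a, uint8_t b)
{
	return ((int8_t)(a - b));
}

/* Safe tristate comparison: cannot overflow, returns -1, 0 or 1. */
static int
cmp_by_relation(uint8_t a, uint8_t b)
{
	return ((a > b) - (a < b));
}

/*
 * Check the triple 0 < 120 < 240: returns non-zero only if the comparison
 * function also reports 0 < 240, as transitivity requires.
 */
static int
orders_example_consistently(int (*cmp)(uint8_t, uint8_t))
{
	assert(cmp != NULL);
	return (cmp(120, 0) > 0 && cmp(240, 120) > 0 && cmp(240, 0) > 0);
}
```

With the subtraction variant the third comparison yields -16, so a tree keyed on it would see 240 as smaller than 0 even though 0 < 120 < 240.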

MFC after:	3 days
2021-07-03 00:31:53 +02:00
Warner Losh
aa0ab681ae nvme: coherently read status of completion records
Coherently read the phase bit of the status completion record. We loop
over the completion record array, looking for all the transactions in
the same phase that have been completed. In doing that, we have to be
careful to read the status field first, and if it indicates a complete
record, we need to read and process that record. Otherwise, the host
might be overtaken by the device when reading this completion record,
leading to a mistaken belief that the record is in phase. This leads to
the code using old values and looking at an already completed entry, which
has no current tracker.

To work around this problem, we read the status and make sure it is in
phase, we then re-read the entire completion record guaranteeing it's
complete, valid, and consistent. In addition we resync the dmatag to
reflect changes since the prior loop for the bouncing dma case.
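
A rough userspace sketch of the "check the phase first, then re-read the record" pattern follows. It is not the nvme(4) code: `struct cpl`, `CPL_PHASE`, and `process_completions` are hypothetical names, the record is simplified, and a C11 acquire fence stands in for the kernel's own atomic and busdma primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified completion record; bit 0 of status is the phase. */
struct cpl {
	uint32_t cdw0;
	uint16_t cid;
	uint16_t status;
};

#define	CPL_PHASE(status)	((status) & 1u)

/*
 * Copy completed records into out[], starting at *head, until a record
 * with the wrong phase is found or max records were copied.
 */
static unsigned
process_completions(volatile struct cpl *ring, unsigned nentries,
    unsigned *head, unsigned *phase, struct cpl *out, unsigned max)
{
	unsigned done = 0;

	assert(nentries > 0 && *head < nentries);
	while (done < max) {
		/* Step 1: read only the status word and test the phase bit. */
		uint16_t status = ring[*head].status;

		if (CPL_PHASE(status) != *phase)
			break;		/* not written by the device yet */

		/* Keep the loads below from being performed before the check. */
		atomic_thread_fence(memory_order_acquire);

		/* Step 2: re-read the whole record now that it is in phase. */
		out[done].cdw0 = ring[*head].cdw0;
		out[done].cid = ring[*head].cid;
		out[done].status = ring[*head].status;
		done++;

		/* Advance; the expected phase flips each time the ring wraps. */
		if (++*head == nentries) {
			*head = 0;
			*phase ^= 1u;
		}
	}
	return (done);
}
```

Reading the status first and the rest afterwards means a record that the device is still writing is either skipped or seen in full, never as a mix of old and new fields.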

Reviewed by:		jrtc27@, chuck@
Found by:		jrtc27 (this fix is based in part on her D30995 fix)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31002
2021-07-02 16:05:19 -06:00
Warner Losh
fea3cf1d6d nvme: Fix alignment on nvme structures
Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.

Reviewed by:		jrtc27@, mav@, chuck@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31001
2021-07-02 16:05:19 -06:00
Warner Losh
80a75155e1 nvme: style nit
Put the { on the same line as the struct nvme_foo when we define these
structures. It's FreeBSD standard and these were inconsistent.

Sponsored by:		Netflix
2021-07-02 16:05:19 -06:00
Kristof Provost
a19ff8ce9b pf: getstates: avoid taking the hashrow lock if the row is empty
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30946
2021-07-02 14:47:54 +02:00
Kristof Provost
34285eefdd pf: Reduce the data returned in DIOCGETSTATESNV
This call is particularly slow due to the large amount of data it
returns. Remove all fields pfctl does not use. There is no functional
impact to pfctl, but it somewhat speeds up the call.

It might affect other (i.e. non-FreeBSD) code that uses the new
interface, but this call is very new, so there's unlikely to be any. No
releases contained the previous version, so we choose to live with the
ABI modification.

Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30944
2021-07-02 14:47:23 +02:00
Mateusz Guzik
48d5b86364 pf: make DIOCGETSTATESNV iterations killable
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-02 08:30:22 +00:00
Mateusz Guzik
904a08f342 ktls: switch bare zone_mbuf use to m_free_raw
Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30955
2021-07-02 08:30:22 +00:00
Mateusz Guzik
bad5f0b6c2 iflib: switch bare zone_mbuf use to m_free_raw
Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30961
2021-07-02 08:30:22 +00:00
Mateusz Guzik
05462babd4 mbuf: add m_free_raw to be used instead of directly calling uma_zfree
The intent is to remove all direct zone_mbuf consumers so that ctor/dtor
from that zone can be reimplemented as wrappers around uma, avoiding an
indirect function call.

Reviewed by:	kbowling
Discussed with:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30959
2021-07-02 08:30:22 +00:00
Mateusz Guzik
fb32c8dbeb iflib: retire MB_DTOR_SKIP
The flag was added in 2016 but remains unused.

Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30958
2021-07-02 08:30:22 +00:00
Ganbold Tsagaankhuu
3416513a41 dtb: rockchip: Add NanoPI-R4S and RockPI E to the build
2021-07-02 15:20:25 +08:00
Alexander Motin
fa3d57c256 mrsas(4): Report more correct maximum I/O size.
Subtract one SGE for the case of misaligned address.  Also take into
account maximum number of sectors reported by firmware, that gives
nicer 256KB limit instead of 276KB calculated from the SGE limit.
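
The arithmetic can be sketched as below. The function and parameter names are hypothetical, and the sample values in the usage note (70 SGEs, 4 KB pages, 512 sectors of 512 bytes) are assumptions picked only because they reproduce the 276KB and 256KB figures; they are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest I/O size in bytes: limited both by the scatter/gather list and by
 * the sector count the firmware reports.
 */
static uint32_t
max_io_size(uint32_t nsge, uint32_t page_size, uint32_t max_sectors,
    uint32_t sector_size)
{
	uint32_t by_sge, by_fw;

	assert(nsge > 1);
	/* A misaligned buffer touches one extra page, so reserve one SGE. */
	by_sge = (nsge - 1) * page_size;
	by_fw = max_sectors * sector_size;
	return (by_sge < by_fw ? by_sge : by_fw);
}
```

Under those assumed values, `max_io_size(70, 4096, 512, 512)` is bounded by the firmware limit of 512 * 512 = 262144 bytes (256KB), while the SGE limit alone would give 69 * 4096 = 282624 bytes (276KB).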

While there, remove a number of I/O size checks, duplicating what is
already checked by CAM and busdma(9).

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2021-07-01 15:37:01 -04:00
Kristof Provost
8f76eebce4 dummynet: fix sysctls
The sysctl nodes which use V_dn_cfg must be marked as CTLFLAG_VNET so
that we use the correct per-vnet offset.

PR:		256819
Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30974
2021-07-01 21:34:08 +02:00
Kornel Duleba
9428765626 ofw_pci: fix probing for non-DT cases
phandle_t is a uint32_t type, <= 0 comparison doesn't work with it as intended.
This caused the ofw_pci code to attach to PCI bus on ACPI based systems.
Since 3eae4e106a ("Fix error value returned by ofw_bus_gen_get_node().")
ofw subsystem can only return -1 for invalid nodes. Use that.
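
The pitfall is easy to show in isolation. In this sketch only the `phandle_t` typedef follows the commit message; the function names are hypothetical and are not the ofw_pci code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t phandle_t;	/* unsigned, as the commit message states */

/* Buggy: for an unsigned type "<= 0" only matches 0, so -1 slips through. */
static int
node_is_invalid_buggy(phandle_t node)
{
	return (node <= 0);
}

/* Fixed: compare against -1 converted to the unsigned type (0xffffffff). */
static int
node_is_invalid(phandle_t node)
{
	return (node == (phandle_t)-1);
}

/* Small self-check showing the difference for the "invalid" value. */
static void
check_invalid_node(void)
{
	phandle_t invalid = (phandle_t)-1;

	assert(!node_is_invalid_buggy(invalid));	/* the bug: not detected */
	assert(node_is_invalid(invalid));
}
```

Since -1 converts to the largest `uint32_t` value, the old test treated an invalid node as valid, which matches the symptom of the code attaching on ACPI based systems.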

MFC after: 4 weeks
Reviewed by: mw
Differential revision: https://reviews.freebsd.org/D30953
2021-07-01 20:35:23 +02:00
Kornel Duleba
ddb928096b dts: fsl-ls1028a: Correct ECAM PCIE window ranges
Currently all PCIE windows point to bus address 0x0, which does not match
the values obtained from hardware during EA.
Replace those values with CPU addresses, since in reality we
have a 1:1 mapping between the two.

This patch is queued for Linux v5.14 in linux-next tree:
6bee93d93111 "arm64: dts: fsl-ls1028a: Correct ECAM PCIE window ranges"
2021-07-01 20:23:40 +02:00
Ferhat Gecdogan
8df71ea1aa tegra_pcie: use switch instead of if in tegra_pcib_pex_ctrl
Simplify obtaining per-port data in tegra_pcib_pex_ctrl() routine.

Reviewed by:    imp, mw
Pull Request:   https://github.com/freebsd/freebsd-src/pull/481
2021-07-01 20:09:46 +02:00
Emmanuel Vadot
2ca21223c5 dts: Bump the freebsd branding version to 5.13
Sponsored by:	Diablotin Systems
2021-07-01 18:48:56 +02:00
Emmanuel Vadot
993e8236c3 arm64: allwinner: Add r_intc driver
The r_intc interrupt controller seems to do a lot of things:
- It can handle the NMI interrupt
- It has local interrupts for some devices that can also be muxed with the GIC
- It can serve as a forwarder for the GIC

It's mostly used for deepsleep/wakeup if I understood correctly and we do not
support this on arm64.

For now just forward everything to the GIC so interrupts work again for devices
which now have this interrupt controller set since dts v5.12.

Sponsored by:	Diablotin Systems
2021-07-01 18:46:38 +02:00
Emmanuel Vadot
2eb4d8dc72 Import device-tree files from Linux 5.13
Sponsored by:	Diablotin Systems
2021-07-01 17:51:01 +02:00
Emmanuel Vadot
82ea1a07b4 Import device-tree files from Linux 5.12
Sponsored by:	Diablotin Systems
2021-07-01 17:41:57 +02:00
Emmanuel Vadot
5def4c47d4 Import device-tree files from Linux 5.11
Sponsored by:	Diablotin Systems
2021-07-01 17:21:47 +02:00
Andrew Turner
244002b482 Switch the order of the ID_AA64PFR1_EL1 fields
This makes them consistent with the fields in other registers.

Sponsored by:	The FreeBSD Foundation
2021-07-01 00:48:56 +00:00
Edward Tomasz Napierala
db8d680ebe procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared.  The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they were not set.

The main purpose of the flag is implementation of Linux
PR_SET_NO_NEW_PRIVS prctl(2), and possibly also unprivileged
chroot.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30939
2021-07-01 09:42:07 +01:00
Edward Tomasz Napierala
93c3453f11 linux(4): revert arm64 part of 447636e43c
The arm64 part of the patch was incomplete and prevented
linux64.ko from loading due to missing symbol.

Sponsored By:	EPSRC
2021-07-01 08:29:23 +00:00
Rick Macklem
c5f4772c66 nfscl: Improve "Consider increasing kern.ipc.maxsockbuf" message
When the setting of kern.ipc.maxsockbuf is less than what is
desired for I/O based on vfs.maxbcachebuf and vfs.nfs.bufpackets,
a console message of "Consider increasing kern.ipc.maxsockbuf"
is printed.

This patch modifies the message to provide a suggested value
for kern.ipc.maxsockbuf.
Note that the setting is only needed when the NFS rsize/wsize
is set to vfs.maxbcachebuf.

While here, make nfs_bufpackets global, so that it can be used
by a future patch that adds a sysctl to set the NFS server's
maximum I/O size.  Also, remove "sizeof(u_int32_t)" from the maximum
packet length, since NFS_MAXXDR is already an "overestimate"
of the actual length.

MFC after:	2 weeks
2021-06-30 15:15:41 -07:00
Edward Tomasz Napierala
447636e43c linux(4): implement coredump support
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables.  Some bits are still missing.

This is based on a prototype by chuck@.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30019
2021-06-30 22:45:06 +01:00
Mitchell Horne
13f5a3076b hwpmc_arm64: add a PMCDBG to the interrupt handler
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-06-30 18:21:23 -03:00
Olivier Houchard
8c3bd133dd arm: Make sure we can handle a thumb entry point.
Similarly to what's been done on arm64 with commit
712c060c94, when executing a binary, if the
entry point is a thumb symbol, then make sure we set the PSL_T flag, otherwise
the CPU will interpret it in ARM mode, and that will likely lead to an
undefined instruction.

PR:	256899
MFC after: 	1 week
2021-06-30 22:56:50 +02:00
Mitchell Horne
8cc3815f02 hwpmc_arm64: accept raw event codes for PMC_OP_PMCALLOCATE
Make it possible to specify event codes without an offset of
PMC_EV_ARMV8_FIRST, by setting a machine-dependent flag. This is
required to make use of event definitions from pmu-events.

Reviewed by:	ray (slightly earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30602
2021-06-30 16:47:09 -03:00
Mitchell Horne
5867cccdc4 hwpmc_arm64: fill kern.hwpmc.cpuid
This will be used to detect supported pmu events. The expected format is
the MIDR register with the revision and variant fields masked. See also:
lib/libpmc/pmu-events/arch/arm64/mapfile.csv.
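
Masking the two fields can be sketched as follows. The bit positions are the architectural MIDR_EL1 layout (Revision in bits [3:0], Variant in bits [23:20]); the macro and function names are hypothetical and the way the result is formatted into kern.hwpmc.cpuid is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define	MIDR_REVISION_MASK	0x0000000fu	/* bits [3:0] */
#define	MIDR_VARIANT_MASK	0x00f00000u	/* bits [23:20] */

/* Clear the revision and variant fields, keeping implementer and part. */
static uint32_t
midr_to_cpuid(uint64_t midr)
{
	/* Bits [63:32] of MIDR_EL1 are reserved and read as zero. */
	assert((midr >> 32) == 0);
	return ((uint32_t)midr & ~(MIDR_REVISION_MASK | MIDR_VARIANT_MASK));
}
```

For example, the values 0x410fd083 and 0x411fd080 differ only in revision and variant, so both reduce to 0x410fd080 and would match the same event table entry.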

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30601
2021-06-30 16:26:07 -03:00
Mitchell Horne
2129c8f677 hwpmc_arm64.c: fix return style
In accordance with style(9).

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-06-30 16:26:07 -03:00
Alan Cox
1a8bcf30f9 amd64: a simplification to pmap_remove_{all,write}
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock.  (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30951
2021-06-30 13:12:25 -05:00
Olivier Houchard
712c060c94 arm64: Make sure COMPAT_FREEBSD32 handles thumb entry point.
If the entry point for the binary executed is a thumb 2 entry point, make
sure we set the PSR_T bit, or the CPU will interpret it as arm32 code and
bad things will happen.
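
A minimal sketch of the idea, assuming the usual ARM ELF convention that a Thumb function address has bit 0 set. The structure and names are hypothetical and do not mirror the kernel's trapframe; only the position of the T bit (bit 5 of the AArch32 program status register) is architectural. Clearing bit 0 of the program counter is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_PSR_T	0x00000020u	/* Thumb state bit, bit 5 */

/* Hypothetical register state for a new 32-bit thread. */
struct sketch_regs {
	uint32_t pc;
	uint32_t spsr;
};

/* Set up the entry point, selecting ARM or Thumb state from bit 0. */
static void
set_entry(struct sketch_regs *regs, uint32_t entry)
{
	assert(regs != NULL);
	regs->spsr &= ~SKETCH_PSR_T;
	if (entry & 1u) {
		/* Bit 0 marks a Thumb symbol: start in Thumb state. */
		regs->spsr |= SKETCH_PSR_T;
	}
	/* Assumption: strip the marker bit so pc is halfword aligned. */
	regs->pc = entry & ~1u;
}
```

Without the T bit, the first Thumb instruction would be decoded as a 32-bit ARM instruction, which is the failure the commit describes.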

PR: 256899
MFC after: 1 week
2021-06-30 14:55:18 +02:00
Olivier Houchard
c1da17a86c arm: Garbage collect _arm_memcpy/_arm_bzero.
Remove any attempt to use _arm_memcpy and _arm_bzero. It was used by some
xscale platforms to provide functions to use the DMA engine for big
zeroing/copying work, but those platforms are long gone, and it's unlikely
anything else will use those.
2021-06-30 14:53:57 +02:00
Martin Matuska
c0149d0300 zfs: update zfs_config.h (missing in 2617128a2)
2021-06-30 08:34:36 +02:00
Martin Matuska
2617128a21 zfs: merge openzfs/zfs@4694131a0 (master) into main
Notable upstream pull request merges:
  #12253 module/zfs: simplify ddt_stat_add() loop
  #12288 Avoid 64bit division in multilist index functions

Obtained from:	OpenZFS
OpenZFS commit:	4694131a0a
2021-06-30 08:02:44 +02:00
Dmitry Chagin
5ca9d41700 LinuxKPI: Rename a short description of the kmalloc type.
To avoid duplication in the vmstat -m output rename the kmalloc type short
description to 'lkpikmalloc' as the Linux emulation layer historically names
its linux malloc type as 'linux'.

Reviewed by:		hselasky, kib, emaste
Differential Revision:	https://reviews.freebsd.org/D30928
MFC after:		2 weeks
2021-06-29 20:20:01 +03:00
Dmitry Chagin
1fd26da926 LinuxKPI: Put compat code under appropriate condition.
Reviewed by:		hselasky, emaste, kib
Differential Revision:	https://reviews.freebsd.org/D30927
MFC after:		2 weeks
2021-06-29 20:19:17 +03:00
Dmitry Chagin
5d9f790191 Eliminate p_elf_machine from struct proc.
Instead of p_elf_machine, use the machine member of the Elf_Brandinfo, which
is now cached in struct proc in the p_elf_brandinfo member.

Note to MFC: D30918, KBI

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D30926
MFC after:		2 weeks
2021-06-29 20:18:29 +03:00
Dmitry Chagin
945accf502 LinuxKPI: Use the proper API to determine the ABI of the running process.
Reviewed by:		markj, hselasky, kib
Differential Revision:	https://reviews.freebsd.org/D30924
MFC after:		2 weeks
2021-06-29 20:17:16 +03:00
Dmitry Chagin
615f22b2fb Add a link to the Elf_Brandinfo into the struct proc.
To allow the ABI to make a decision based on the Brandinfo add a link
to the Elf_Brandinfo into the struct proc. Add a note that the high 8 bits
of Elf_Brandinfo flags are private to the ABI.

Note to MFC: it breaks KBI.

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D30918
MFC after:		2 weeks
2021-06-29 20:15:08 +03:00
Mateusz Guzik
f77697dd9f mac: cheaper check for ifnet_create_mbuf and ifnet_check_transmit
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 15:06:45 +02:00
Jason A. Harmening
372691a7ae unionfs: release parent vnodes in deferred context
Each unionfs node holds a reference to its parent directory vnode.
A single open file reference can therefore end up keeping an
arbitrarily deep vnode hierarchy in place.  When that reference is
released, the resulting VOP_RECLAIM call chain can then exhaust the
kernel stack.

This is easily reproducible by running the unionfs.sh stress2 test.
Fix it by deferring recursive unionfs vnode release to taskqueue
context.
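
Why deferral bounds the stack depth can be shown with a toy model. Everything here is hypothetical (`struct unode`, `unode_release`, `deferred_worker`, and the global list standing in for a taskqueue); it is not the unionfs code and ignores locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy node: each node holds one reference on its parent. */
struct unode {
	struct unode *parent;
	unsigned refs;
	struct unode *next_deferred;
};

static struct unode *deferred_head;	/* stand-in for a taskqueue's work list */

/* Create a node with one reference, taking a reference on the parent. */
static struct unode *
unode_create(struct unode *parent)
{
	struct unode *n = calloc(1, sizeof(*n));

	assert(n != NULL);
	n->parent = parent;
	n->refs = 1;
	if (parent != NULL)
		parent->refs++;
	return (n);
}

/* Drop one reference; on the last one, queue the node instead of recursing. */
static void
unode_release(struct unode *n)
{
	assert(n != NULL && n->refs > 0);
	if (--n->refs == 0) {
		n->next_deferred = deferred_head;
		deferred_head = n;
	}
}

/* Worker: frees queued nodes in a loop, at constant stack depth. */
static size_t
deferred_worker(void)
{
	struct unode *n, *parent;
	size_t freed = 0;

	while ((n = deferred_head) != NULL) {
		deferred_head = n->next_deferred;
		parent = n->parent;
		free(n);
		freed++;
		if (parent != NULL)
			unode_release(parent);	/* may queue the parent */
	}
	return (freed);
}
```

Releasing the leaf of a chain queues one node; the worker then walks up the hierarchy iteratively, so a chain of any depth is torn down without nested calls.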

PR: 238883
Reviewed By:	kib (earlier version), markj
Differential Revision: https://reviews.freebsd.org/D30748
2021-06-29 06:02:01 -07:00
Edward Tomasz Napierala
435754a59e Add infrastructure required for Linux coredump support
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values.  It also updates all
of the ABI definitions to preserve current behaviour.

This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication.  It will be used
for Linux coredumps.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30921
2021-06-29 08:49:12 +01:00
Mateusz Guzik
d26ef5c7ac pf: make sure the dtrace probe has safe access to state
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 07:24:53 +00:00
Mateusz Guzik
55cc305dfc pf: revert: Use counter(9) for pf_state byte/packet tracking
stats are not shared and consequently per-CPU counters only waste
memory.

No slowdown was measured when passing over 20M pps.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 07:24:53 +00:00
Mateusz Guzik
803dfe3da0 pf: deduplicate V_pf_state_z handling with pfsync
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 07:24:53 +00:00
Mateusz Guzik
7f025db57c pf: fix error-case leaks in pf_create_state
The hand-rolled clean up failed to free counters.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 07:24:52 +00:00
Alan Cox
26a357245f arm64: a few simplifications to pmap_remove_{all,write}
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock.  (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)

Avoid a pointer dereference by using an existing local variable that
already holds the desired value.

Eliminate some unnecessary repetition of code on a failed fcmpset.
Specifically, there is no point in retesting the DBM bit because it
cannot change state while the pmap lock is held.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30931
2021-06-28 22:21:24 -05:00
John-Mark Gurney
3d5104182c ued may be NULL here which will cause a panic... reproducible by
simply doing a usbconfig reset on a device which doesn't reset itself
properly...
2021-06-28 18:09:14 -07:00
Warner Losh
a72af82e31 cam: Fix GENERIC-MMCCAM build
Fix forgotten argument and type error. MMCCAM isn't enabled by default,
and I'd mistakenly thought it was, so these went undetected precommit.

Sponsored by:		Netflix
2021-06-28 17:22:35 -06:00
Warner Losh
9f0febd6a4 cam_sim: remove unused sim_doneq member
Its use was removed in 227d67aa54 by mav when locking was revamped.

Reviewed by:		scottl@, mav@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30890
2021-06-28 16:13:03 -06:00
Warner Losh
50aa1daf14 cam: change xpt_clone_path to return int
xpt_clone_path originally returned a cam_status, but it doesn't do I/O
and should return an errno instead. I added it last year and it's only
used in one place. It's not yet documented, so no doc changes are
needed.

Reviewed by:		scottl@, mav@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30884
2021-06-28 16:13:03 -06:00
Warner Losh
2b09870238 cam: Remove CAM_TRUE and CAM_FALSE, they are unused and duplicate bool
These were in the original CAM commit in 3.0, but were not used there,
nor have they been used since then. They also duplicate the now-standard
bool type. Remove them.

Reviewed by:		scottl@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30879
2021-06-28 16:13:03 -06:00
Warner Losh
30f8afd027 cam: fix xpt_bus_register and xpt_bus_deregister return errno
xpt_bus_register and xpt_bus_deregister return a hybrid error that's
neither a cam_status, nor an errno, but a mix of both.  Update
xpt_bus_register and xpt_bus_deregister to return an errno. The vast
majority of current users compare against zero, which can also be
spelled CAM_SUCCESS. Nobody uses CAM_FAILURE, so remove that symbol
to prevent confusion (nothing returns it either).

Where the return value is saved, ensure that the variable 'error' is
used to store an errno and 'status' is used to store a cam_status where
it makes the code clearer (usually just in functions that already mix
and match). Where the return value isn't used at all, avoid storing it
at all.

Reviewed by:		scottl@, mav@ (earlier version)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30860
2021-06-28 16:13:03 -06:00
Warner Losh
dcd5dea965 cam: delete cam_sim_alloc_dev
cam_sim_alloc_dev was only used internally by the MMC system. That has
been converted to using xpt_path_device() and has stopped using this
interface, so this can be retired.

Reviewed by:		scottl@, mav@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30858
2021-06-28 16:13:03 -06:00
Warner Losh
aeb04e88f5 sdhci: stop using cam_sim_alloc_dev
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30857
2021-06-28 16:13:02 -06:00
Warner Losh
bd69852be1 mmc_sim: stop using cam_sim_alloc_dev
Use the vanilla flavor of cam_sim_alloc. Now that sdiob has been
converted to get the device_t from the cam_path, this data is no longer
necessary.

Reviewed by:		scottl@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30856
2021-06-28 16:13:02 -06:00