139813 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
c91d0e59be linux: Make linux_ptrace.c portable
Make sys/amd64/linux/linux_ptrace.c machine-independent,
in preparation for moving it into sys/compat/linux/.
No functional changes.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32756
2021-11-03 08:54:35 +00:00
Edward Tomasz Napierala
4dfd612286 linux: mv sys/i386/linux/linux_ptrace{,_machdep}.c
In preparation for machine-independent sys/compat/linux/linux_ptrace.c,
rename the i386-specific Linux ptrace(2) implementation.  No functional
changes.

Sponsored By:	EPSRC
Differential Revision: https://reviews.freebsd.org/D32757
2021-11-03 08:50:17 +00:00
Edward Tomasz Napierala
91be6286e2 linprocfs: Fix formatting of Uid and Gid lines
The separator here should be tabs, not spaces.  This fixes a warning
from chromium-browser on Bionic:

[1022/162248.137612:ERROR:process_info_linux.cc(107)] format error: unrecognized Uid format

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32612
2021-11-03 08:40:55 +00:00
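
To illustrate the separator fix in 91be6286e2 above, here is a minimal sketch of a Linux-style `Uid:` line with tab-separated fields (real, effective, saved, and filesystem IDs). The helper name `format_uid_line` is hypothetical; this is not the linprocfs code itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not the linprocfs implementation): format a
 * Linux-style "Uid:" status line.  Every field, including the label,
 * is followed by a tab rather than a space.
 */
static int
format_uid_line(char *buf, size_t len, unsigned ruid, unsigned euid,
    unsigned svuid, unsigned fsuid)
{
	return (snprintf(buf, len, "Uid:\t%u\t%u\t%u\t%u\n",
	    ruid, euid, svuid, fsuid));
}
```

A parser that splits on tabs, as the Chromium error message suggests, would reject the same line if spaces were used instead.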
Kyle Evans
7771f2a0c9 kern: physmem: improve region coalescing logic
The existing logic didn't take into account newly inserted mappings
wholly contained by an existing region (or vice versa), nor did it
account for weird overlap scenarios.  The latter is probably unlikely
to happen, but the former may happen in UEFI: BootServicesData allocated
within a large chunk of ConventionalMemory.  This situation blows up vm
initialization.

While we're here, remove the "exact match" logic as it's likely wrong;
if an exact match exists with conflicting flags, for instance, then we
should probably be doing something else.  The new logic takes into
account exact matches as part of the overlapping efforts.

Reviewed by:	kib, mhorne (both earlier version)
Differential Revision:	https://reviews.freebsd.org/D32701
2021-11-03 02:32:46 -05:00
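
The coalescing cases described in 7771f2a0c9 above (containment, partial overlap, exact match) can be sketched with one interval-merge routine. This is a simplified illustration, not the physmem code: `struct region` and `region_insert` are hypothetical names, region flags are ignored, and the caller is assumed to provide room for one more entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Insert [start, end) into regs[], absorbing every existing region that
 * overlaps or touches it.  Assumes the existing regions are mutually
 * disjoint and non-adjacent, and that regs[] has room for n + 1 entries.
 * Returns the new region count.
 */
static size_t
region_insert(struct region *regs, size_t n, uint64_t start, uint64_t end)
{
	size_t i, out;

	assert(start < end);
	out = 0;
	for (i = 0; i < n; i++) {
		if (regs[i].end < start || regs[i].start > end) {
			/* Disjoint: keep it, compacting the array in place. */
			regs[out++] = regs[i];
			continue;
		}
		/* Overlap, containment, adjacency, or exact match: grow. */
		if (regs[i].start < start)
			start = regs[i].start;
		if (regs[i].end > end)
			end = regs[i].end;
	}
	regs[out].start = start;
	regs[out].end = end;
	return (out + 1);
}
```

Treating an exact match as just another overlap is the point of the sketch: no special case is needed for it.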
Rick Macklem
331883a2f2 nfscl: Check for a forced dismount in nfscl_getref()
The nfscl_getref() function is called within nfscl_doiods() when
the NFSv4.1/4.2 pNFS client is doing I/O on a DS.  As such,
nfscl_getref() needs to check for a forced dismount.
This patch adds that check.

Found during a recent IETF NFSv4 working group testing event.

MFC after:	2 weeks
2021-11-02 17:28:13 -07:00
Warner Losh
edfbbfd541 gpart: Move MBR efimedia reporting to a separate routine
Move the efimedia reporting to g_part_mbr_efimedia and use that from
g_part_mbr_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32781
2021-11-02 17:09:17 -06:00
Warner Losh
e3ab141fda gpart: Move GPT efimedia reporting to a separate routine
Move the efimedia reporting to g_part_gpt_efimedia and use that from
g_part_gpt_dumpconf to report it.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32780
2021-11-02 17:09:17 -06:00
Ruslan Bukin
4bb6991531 arm/pmu: add ACPI attachment.
This makes hwpmc(4) sampling work on ACPI-based AArch64 systems.
Tested on ARM Neoverse N1.

Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: jrtc27, mhorne
Differential Revision: https://reviews.freebsd.org/D24423
2021-11-02 19:35:29 +00:00
John Baldwin
4e057806cf crypto: Cleanup mtx_init() calls.
Don't pass the same name to multiple mutexes while using unique types
for WITNESS.  Just use the unique types as the mutex names.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32740
2021-11-02 12:18:05 -07:00
John Baldwin
7178578192 crypto: Use a single "crypto" kproc for all of the OCF kthreads.
Reported by:	julian
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32739
2021-11-02 12:18:05 -07:00
Bjoern A. Zeeb
1a8f198fa6 epair: remove "All rights reserved"
Remove "All rights reserved" from The FreeBSD Foundation owned
copyrights on epair code and documentation.

Approved by:	emaste (FreeBSD Foundation)
2021-11-02 16:50:26 +00:00
Hans Petter Selasky
2390a1441e LinuxKPI: Add sysctl(8) knob to control verbosity of WARN_ON's.
The purpose of this change is to reduce the amount of dmesg(8) noise when
VT switching after a panic.

Submitted by:	Greg V <greg@unrelenting.technology>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30174
Sponsored by:	NVIDIA Networking
2021-11-02 16:53:34 +01:00
Michal Meloun
a670e1c13a arm: Fix handling of undefined instruction aborts in THUMB2 mode.
Correctly recognize NEON/SIMD and VFP instructions in THUMB2 mode and pass
these to the appropriate handler. Note that it is not necessary to filter
every undefined instruction variant or register combination; that is the
job of the given handler.

Reported by:	Robert Clausecker <fuz@fuz.su>
PR:		259187
MFC after:	2 weeks
2021-11-02 11:11:44 +01:00
Bjoern A. Zeeb
3dd5760aa5 if_epair: rework
Rework if_epair(4) to no longer use netisr and dpcpu.
Instead use mbufq and swi_net.
This simplifies the code and seems to make it work better and
no longer hang.

Work largely by bz@, with minor tweaks by kp@.

Reviewed by:	bz, kp
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D31077
2021-11-02 09:23:46 +01:00
Rick Macklem
5a95a6e8e4 nfscl: Use a smaller initial delay time for NFSERR_DELAY
For NFS RPCs that receive a NFSERR_DELAY reply, the delay time
is initially 1sec and then increases exponentially to NFS_TRYLATERDEL.
It was found that this delay time is excessive for some NFSv4
servers, which work well with a 1msec delay.
A 1sec delay resulted in very slow performance for Remove and
Rename when delegations and pNFS were enabled.

This patch decreases the initial delay time to 1msec.

Found during a recent IETF NFSv4 working group testing event.

MFC after:	2 weeks
2021-11-01 17:21:31 -07:00
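
The retry behaviour described in 5a95a6e8e4 above can be sketched as a capped exponential backoff that starts at 1 msec. The doubling factor and the 15000 msec cap are assumptions made for illustration (the cap stands in for NFS_TRYLATERDEL, whose value is not given here), and `next_delay_ms` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define INITIAL_DELAY_MS	1	/* initial delay after the patch */
#define MAX_DELAY_MS		15000	/* assumed cap, stands in for NFS_TRYLATERDEL */

/*
 * Return the delay to use for the next NFSERR_DELAY retry.  Pass 0 for
 * the first retry.  The delay doubles (an assumption) until it reaches
 * the cap.
 */
static uint32_t
next_delay_ms(uint32_t cur_ms)
{
	if (cur_ms == 0)
		return (INITIAL_DELAY_MS);
	if (cur_ms >= MAX_DELAY_MS / 2)
		return (MAX_DELAY_MS);
	return (cur_ms * 2);
}
```

Under these assumptions a server that recovers quickly is retried within a few milliseconds, while a persistently busy server still ends up at the same maximum delay as before.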
Mateusz Guzik
8e27968786 inet: remove tcp_debug from netinet/tcp_debug.h
It was a hack only needed for trpt, which can just define it locally.

This makes it possible to fix up systat which also includes the file.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-01 23:10:30 +00:00
Mateusz Guzik
8f3d786cb3 pf: remove the flags argument from pf_unlink_state
All consumers call it with PF_ENTER_LOCKED.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-01 20:59:14 +01:00
Mateusz Guzik
edf6dd82e9 pf: fix use-after-free from pf_find_state_all
state was returned without any locks nor references held

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-01 20:59:05 +01:00
Marius Halden
1019354b54 carp: deal with negative net.inet.carp.demotion
Consider nodes 1 and 2, where node 1 has an advskew of 0 and node 2 has an
advskew of 100, making them master and backup respectively.

If net.inet.carp.demotion is set to a negative value on node 1, node 2
might become master while node 1 still retains its master status. Whether
or not node 2 becomes master seems to depend on the nodes' advskew and
what the demotion sysctl was set to on node 1.

The reason for node 2 becoming master seems to be that the calculated
advskew taking demotion into account is truncated to a single unsigned
byte when copied into the carp header for sending, and node 1 stays
master since it uses the whole non-truncated calculated advskew
when deciding whether to stay master.

PR:		259528
Reviewed by:	donner, glebius
MFC after:	3 weeks
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D32759
2021-11-01 17:08:23 +01:00
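
The truncation described in 1019354b54 above can be reproduced in a few lines. The sketch below shows how a negative sum wraps when narrowed to one unsigned byte, and one possible remedy (clamping before narrowing). The function names and the 0..255 clamp range are assumptions for illustration; the commit's actual fix may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Value as it appears in a one-byte header field: wraps modulo 256. */
static uint8_t
advskew_on_wire(int advskew, int demotion)
{
	return ((uint8_t)(advskew + demotion));
}

/* Hypothetical remedy: clamp to the byte range before narrowing. */
static uint8_t
advskew_clamped(int advskew, int demotion)
{
	int v = advskew + demotion;

	if (v < 0)
		v = 0;
	if (v > 255)
		v = 255;
	return ((uint8_t)v);
}
```

With an advskew of 0 and a demotion of -10, node 1 computes -10 locally but advertises 246. Node 2, with an advskew of 100, sees a worse peer and takes over, while node 1 still believes it is the better candidate.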
Mark Johnston
7585c5db25 uma: Fix handling of reserves in zone_import()
Kegs with no items reserved have uk_reserve = 0.  So the check
keg->uk_reserve >= dom->ud_free_items will be true once all slabs are
depleted.  Then, rather than go and allocate a fresh slab, we return to
the cache layer.

The intent was to do this only when the keg actually has a reserve, so
modify the check to verify this first.  Another approach would be to
make uk_reserve signed and set it to -1 until uma_zone_reserve() is
called, but this requires a few casts elsewhere.

Fixes:	1b2dcc8c54 ("uma: Avoid depleting keg reserves when filling a bucket")
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32516
2021-11-01 09:51:43 -04:00
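
The check discussed in 7585c5db25 above can be sketched as follows. The structs are simplified stand-ins that keep only the two fields named in the commit message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the keg and per-domain state. */
struct keg_sketch { uint32_t uk_reserve; };
struct dom_sketch { uint32_t ud_free_items; };

/* Old check: also true when there is no reserve and no free items. */
static bool
stop_for_reserve_old(const struct keg_sketch *keg,
    const struct dom_sketch *dom)
{
	return (keg->uk_reserve >= dom->ud_free_items);
}

/* New check: only applies when the keg actually has a reserve. */
static bool
stop_for_reserve_new(const struct keg_sketch *keg,
    const struct dom_sketch *dom)
{
	return (keg->uk_reserve != 0 &&
	    keg->uk_reserve >= dom->ud_free_items);
}
```

With `uk_reserve = 0` and no free items, the old predicate is true (0 >= 0), so the caller returns early instead of allocating a fresh slab. The new predicate is false in that case.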
Mark Johnston
fab343a716 uma: Improve M_USE_RESERVE handling in keg_fetch_slab()
M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
recursion when the direct map is not available, as is the case on 32-bit
platforms or when certain kernel sanitizers (KASAN and KMSAN) are
enabled.  For example, to allocate KVA, the kernel might allocate a
kernel map entry, which might require a new slab, which requires KVA.

For these zones, we use uma_prealloc() to populate a reserve of items,
and then in certain serialized contexts M_USE_RESERVE can be used to
guarantee a successful allocation.  uma_prealloc() allocates the
requested number of items, distributing them evenly among NUMA domains.
Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
might have to check the slab lists of other domains than the current one
to provide the semantics expected by consumers.

So, try harder to find an item if M_USE_RESERVE is specified and the keg
doesn't have anything for the current (first-touch) domain.  Specifically,
fall back to a round-robin slab allocation.  This change fixes boot-time
panics on NUMA systems with KASAN or KMSAN enabled.[1]

Alternately we could have uma_prealloc() allocate the requested number
of items for each domain, but for some existing consumers this would be
quite wasteful.  In general I think keg_fetch_slab() should try harder
to find free slabs in other domains before trying to allocate fresh
ones, but let's limit this to M_USE_RESERVE for now.

Also fix a separate problem that I noticed: in a non-round-robin slab
allocation with M_WAITOK, rather than sleeping after a failed slab
allocation we simply try again.  Call vm_wait_domain() before retrying.

Reported by:	mjg, tuexen [1]
Reviewed by:	alc
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32515
2021-11-01 09:51:18 -04:00
Andrew Turner
62cbc00d2f Print the correct register for the arm64 elr
In 7ec86b6609 ("Also print symbols when printing arm64 registers")
a new function was created to print most registers. Unfortunately the
Link Register (LR) was being printed when we should have printed the
Exception Link Register (ELR).

Fix this by adding the missing 'e'.

Sponsored by:	The FreeBSD Foundation
2021-11-01 11:19:57 +00:00
Philip Paeps
91feb4f420 riscv: add iicbus and iicoc to GENERIC
The iicoc driver supports the OpenCores I2C IP.  This is included in at
least the SiFive "Unleashed" and "Unmatched" cores and probably others.

Suggested by:	jrtc27
2021-11-01 13:19:55 +08:00
Thomas Skibo
99443830fa iicoc: support building as a module
Only build on RISC-V for now, since we're not aware of any other cores
with this IP supported by FreeBSD.

Reviewed by:	jrtc27, philip
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D32737
2021-11-01 12:33:39 +08:00
Thomas Skibo
2a36909a94 iicoc: fix repeated start
Reviewed by:	jrtc27, philip
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D32737
2021-11-01 12:29:29 +08:00
Thomas Skibo
e528757ca6 iicoc: add support for SiFive HiFive Unmatched
Reviewed by:	jrtc27, philip
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D32737
2021-11-01 12:26:49 +08:00
Rick Macklem
d5d2ce1c85 nfscl: Do pNFS layout return_on_close synchronously
For pNFS servers that specify that Layouts are to be returned
upon close, they may expect that LayoutReturn to happen before
the associated Close.

This patch modifies the NFSv4.1/4.2 pNFS client so that this
is done.  This only affects a pNFS mount against a non-FreeBSD
NFSv4.1/4.2 server that specifies return_on_close in LayoutGet
replies.

Found during a recent IETF NFSv4 working group testing event.

MFC after:	2 weeks
2021-10-31 16:31:31 -07:00
Mateusz Guzik
627d5d1966 geli: eli data -> eli_data for consistency with other geom classes
PR:	259392
Reported by:	dewayne@heuristicsystems.com.au
MFC after:	1 week
2021-10-31 20:36:51 +00:00
Bjoern A. Zeeb
917181dddf net80211: add a driver-private pointer to struct ieee80211_node
Add a void *ni_drv_data field to struct ieee80211_node that drivers
can use to backtrack to their internal state from a net80211 node.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential Revision: https://reviews.freebsd.org/D30654 (abandoned)
2021-10-31 19:08:28 +00:00
Xin LI
f38bef2ce4 Bump __FreeBSD_version following the libdialog shared library
version number bump.
2021-10-30 23:09:29 -07:00
Konstantin Belousov
e5248548f9 procfs: return right hardlink from /proc/curproc/file
Use proc_get_binpath() to get the hardlink right.

PR:	248184
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32738
2021-10-31 03:05:14 +02:00
Konstantin Belousov
f34fc6ba06 Extract proc_get_binpath() from sysctl_kern_proc_pathname()
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32738
2021-10-31 03:05:14 +02:00
Konstantin Belousov
b4c7d45c84 sys/proc.h: put proc_add_orphan() into proper place
Noted by:	markj
Reviewed by:	emaste, markjd
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32738
2021-10-31 03:05:14 +02:00
Rick Macklem
50dcff0816 nfscl: Add setting n_localmodtime to the Write RPC code
Similar to commit 2be417843a, I believe there could be a race between
the NFS client VOP_LOOKUP() and file Writing that could result in stale
file attributes being loaded into the NFS vnode by VOP_LOOKUP().

I have not been able to reproduce a failure due to this race, but
I believe that there are two possibilities:

The Lookup RPC happens while VOP_WRITE() is being executed and loads
stale file attributes after VOP_WRITE() returns when it has already
completed the Write/Commit RPC(s).
--> For this case, setting the local modify timestamp at the end of
  VOP_WRITE() should ensure that stale file attributes are not loaded.

The Lookup RPC occurs after VOP_WRITE() has returned, while
asynchronous Write/Commit RPCs are in progress and then is
blocked by the vnode held by VOP_OPEN/VOP_CLOSE/VOP_FSYNC which
will flush writes via ncl_flush() or ncl_vinvalbuf(), clearing the
NMODIFIED flag (which indicates Writes-in-progress). The VOP_LOOKUP()
then acquires the NFS vnode lock and fills in stale file attributes.
 --> Setting the local modify timestamp in ncl_flush() and ncl_vinvalbuf()
   when they clear NMODIFIED should ensure that stale file attributes
   are not loaded.

This patch does the above.

PR:	259071
Reviewed by:	asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D32677
2021-10-30 17:08:28 -07:00
Rick Macklem
ab87c39c25 nfscl: Set n_localmodtime in Deallocate
Commit 2be417843a added n_localmodtime, which is used by Lookup
and ReaddirPlus to check to see if the file attributes in an RPC
reply might be stale.  This patch sets n_localmodtime in Deallocate.
Done as a separate commit, since Deallocate is not in stable/13.

PR:	259071
Reviewed by:	asomers
Differential Revision:	https://reviews.freebsd.org/D32635
2021-10-30 16:46:14 -07:00
Rick Macklem
2be417843a PR#259071 provides a test program that fails for the NFS client.
Testing with it, there appears to be a race between Lookup
and VOPs like Setattr-of-size, where Lookup ends up loading
stale attributes (including what might be the wrong file size)
into the NFS vnode's attribute cache.

The race occurs when the modifying VOP (which holds a lock
on the vnode), blocks the acquisition of the vnode in Lookup,
after the RPC (with now potentially stale attributes).

Here's what seems to happen:
Child                                Parent

does stat(), which does
VOP_LOOKUP(), doing the Lookup
RPC with the directory vnode
locked, acquiring file attributes
valid at this point in time

blocks waiting for locked file       does ftruncate(), which
vnode                                does VOP_SETATTR() of Size,
                                     changing the file's size
                                     while holding an exclusive
                                     lock on the file's vnode
                                     releases the vnode lock
acquires file vnode and fills in
now stale attributes including
the old wrong Size
                                     does a read() which returns
                                     wrong data size

This patch fixes the problem by saving a timestamp in the NFS vnode
in the VOPs that modify the file (Setattr-of-size, Allocate).
Then lookup/readdirplus compares that timestamp with the time just
before starting the RPC after it has acquired the file's vnode.
If the modifying RPC occurred during the Lookup, the attributes
in the RPC reply are discarded, since they might be stale.

With this patch the test program works as expected.

Note that the test program does not fail on a July stable/12,
although this race is in the NFS client code.  I suspect a
fairly recent change to the name caching code exposed this
bug.

PR:	259071
Reviewed by:	asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D32635
2021-10-30 16:35:02 -07:00
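
The timestamp comparison described in 2be417843a above can be sketched as follows. This is a simplified model, not the NFS client code: the struct, the function names, and the use of plain integer timestamps are assumptions, as is treating equal timestamps as a race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the per-file NFS node state. */
struct nfsnode_sketch {
	int64_t	 localmodtime;	/* when a local modifying VOP last ran */
	uint64_t cached_size;	/* cached file size attribute */
};

/* Called by a modifying operation such as a truncate. */
static void
local_modify(struct nfsnode_sketch *np, uint64_t new_size, int64_t now)
{
	np->cached_size = new_size;
	np->localmodtime = now;
}

/*
 * Called by Lookup once it holds the file's vnode lock.  rpc_start is
 * the time sampled just before the Lookup RPC was sent.  Returns true
 * if the reply attributes were loaded, false if they were discarded.
 */
static bool
lookup_load_attrs(struct nfsnode_sketch *np, uint64_t reply_size,
    int64_t rpc_start)
{
	if (np->localmodtime >= rpc_start)
		return (false);	/* a local change raced with the RPC */
	np->cached_size = reply_size;
	return (true);
}
```

In the Child/Parent scenario above, the child's Lookup starts before the parent's truncate completes, so its reply is discarded and the cached size stays correct.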
Edward Tomasz Napierala
f0d9a6a781 linux: make PTRACE_SETREGS use a correct struct
Note that this is largely untested at this point, as was
the previous version; I'm committing this mostly to get
rid of `struct linux_pt_reg`.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32735
2021-10-30 10:13:37 +01:00
Edward Tomasz Napierala
8bbc0600cc linux: Add additional ptracestop only if the debugger is Linux
In 6e66030c4c, additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC.  Make it only apply to cases
where the debugger is a Linux process; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additional ptracestop.

Fixes:		6e66030c4c
Reported By:	kib
Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32726
2021-10-30 09:54:17 +01:00
Rick Macklem
dc6dd769de nfscl: Use NFSMNTP_DELEGISSUED in two more functions
Commit 5e5ca4c8fc added a NFSMNTP_DELEGISSUED flag to indicate when
a delegation has been issued to the mount.  For the common case
where an NFSv4 server is not issuing delegations, this flag
can be checked to avoid acquisition of the NFSCLSTATEMUTEX.

This patch adds checks for NFSMNTP_DELEGISSUED being set
to two more functions.

This change appears to be performance neutral for a small number
of opens, but should reduce lock contention for a large number of opens
for the common case where the server is not issuing delegations.

MFC after:	2 weeks
2021-10-29 20:35:02 -07:00
Randall Stewart
141a53cd58 tcp: Rack might retransmit forever.
If we get a Sacked peer with an MTU change we can retransmit forever if the
last bytes are sacked and the client goes away (think power off). Then we
never see the end condition and continually retransmit.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32671
2021-10-29 17:37:49 -04:00
Mark Johnston
26f76aea2d timecounter: Load the currently selected tc once in tc_windup()
Reported by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32729
2021-10-29 14:30:15 -04:00
Olivier Houchard
74e9b5f29a Merge commit 'ce929fe84f9c453263af379f3b255ff8eca01d48'
Import CK as of commit 2265c7846f4ce667f5216456afe2779b23c3e5f7.
2021-10-29 19:18:03 +02:00
Edward Tomasz Napierala
ad0379660d linux: make PTRACE_GETREGS return correct struct
Previously it returned a shorter struct.  I can't find any
modern software that uses it, but tests/ptrace from strace(1)
repo complained.

Differential Revision: https://reviews.freebsd.org/D32601
2021-10-29 16:18:28 +01:00
Edward Tomasz Napierala
f939dccfd7 linux: Make PTRACE_GETREGSET return proper buffer size
This fixes Chrome warning:

[1022/152319.328632:ERROR:ptracer.cc(476)] Unexpected registers size 0 != 216, 68

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision: https://reviews.freebsd.org/D32616
2021-10-29 15:31:33 +01:00
Edward Tomasz Napierala
c8c93b1516 linux: Also translate the signal if the code is CLD_KILLED
This fixes ./waitid.gen.test from the strace(1) test suite.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32617
2021-10-29 15:28:00 +01:00
Edward Tomasz Napierala
6547153e46 linux: Fix ptrace panic with ERESTART
Translate ERESTART into Linux "internal" errno ERESTARTSYS.
This fixes the erestartsys.gen.test from strace(1).

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32623
2021-10-29 14:55:59 +01:00
Wojciech Macek
680920237b Revert "qoriq_gpio: Implement interrupt controller functionality"
This reverts commit 027a58aab2.
2021-10-29 12:05:55 +02:00
Wojciech Macek
f5639a06b8 mvneta: fix encap property
Fix MVNETA encap property.
2021-10-29 10:56:57 +02:00
Kornel Duleba
027a58aab2 qoriq_gpio: Implement interrupt controller functionality
The pic_* interface was used.
Only edge interrupts are supported by this controller.
Driver mutex had to be converted to a spin lock so that it can
be used in the interrupt filter context.
Two types of intr_map_data are supported - INTR_MAP_DATA_GPIO and
INTR_MAP_DATA_FDT. This way interrupts can be allocated using the
userspace gpio interrupt allocation method, as well as directly from
simplebus. The latter can be used by devices that have their IRQ routed
to a GPIO pin.

Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32587
2021-10-29 10:08:26 +02:00
Kornel Duleba
d88aecce69 felix: Add a sysctl to control timer routine frequency
The driver polls the status of all PHYs connected to the switch at a
fixed interval.
Add a sysctl to control the polling frequency.
The value is expressed in ticks and defaults to "hz", or 1 second.

Obtained from: Semihalf
Sponsored by: Alstom Group
2021-10-29 10:08:26 +02:00
Kornel Duleba
8c5fead105 Remove enetc_mdio driver
It was previously used by felix(4) for PHY communication.
Since that is not the case anymore this driver is now left unused.

Obtained from: Semihalf
Sponsored by: Alstom Group
2021-10-29 10:08:26 +02:00
Kornel Duleba
29cf6a79ac felix: Use internal MDIO regs for PHY communication
Previously we would use an external MDIO device found on the PCI bus.
Switch to using MDIO mapped in a separate BAR of the switch device.
It is much easier this way since we don't have to depend on another
driver anymore.

Obtained from: Semihalf
Sponsored by: Alstom Group
2021-10-29 10:08:26 +02:00
Kornel Duleba
06e6ca6dd3 dmar: Disable protected memory regions after initialization
Some BIOSes protect the memory region they reside in by using DMAR to
prevent devices from doing any DMA transactions to that part of RAM.
AMI refers to this as "DMA Control Guarantee".
Disable the protection when address translation is enabled.
I stumbled upon this while investigating a failing coredump on a device
which has this feature enabled.

Sponsored by:		Stormshield
Obtained from:		Semihalf
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D32591
2021-10-29 10:08:25 +02:00
Kornel Duleba
3c02da8096 dmar: Don't try to reserve PCI regions for non-existing devices
In some cases we might have to create DMAR context before the
corresponding device has been enumerated by the PCI bus.
In that case we get called with a NULL dev, so trying
to reserve PCI regions causes a NULL pointer dereference in
pci_find_pcie_root_port.

Sponsored by:		Stormshield
Obtained from:		Semihalf
MFC after:		2 weeks
Reviewed by:		kib, rlibby
Differential revision:	https://reviews.freebsd.org/D32589
2021-10-29 10:08:25 +02:00
Wojciech Macek
ccfa9ac5ac NXP: Add ls1028a SPI clock driver
Provide driver for LS1028A and LX2160 SPI clock modules.

Obtained from:		Semihalf
Sponsored by:		Alstom
Differential revision:	https://reviews.freebsd.org/D32689
2021-10-29 09:52:20 +02:00
Randall Stewart
aeda852782 tcp: Rack at times can miscalculate the RTT from what it thinks is a persists probe response.
Turns out that if a peer sends in a window update right after rack fires off
a persists probe, we can mis-interpret the window update and calculate
a bogus RTT (very short). We still process the window update and send
the data but we incorrectly generate an RTT. We should be only doing
the RTT stuff if the rwnd is still small and has not changed.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32717
2021-10-29 03:17:43 -04:00
Gleb Smirnoff
92b3e07229 Enable net.inet.tcp.nolocaltimewait.
This feature has been used for many years at large sites and
didn't show any pitfalls.
2021-10-28 15:34:00 -07:00
Sebastian Huber
ae750fbac7 kern_tc.c: Scaling/large delta recalculation
This change is a slight performance optimization for systems with a slow
64-bit division.

The th->th_scale and th->th_large_delta values only depend on the
timecounter frequency and the th->th_adjustment. The timecounter
frequency of a timehand only changes when a new timecounter is activated
for the timehand. The th->th_adjustment is only changed by the NTP
second update. The NTP second update is not done for every call of
tc_windup().

Move the code block to recalculate the scaling factor and
the large delta of a timehand to the new helper function
recalculate_scaling_factor_and_large_delta().

Call recalculate_scaling_factor_and_large_delta() when a new timecounter
is activated and a NTP second update occurred.

MFC after:	1 week
2021-10-29 00:31:14 +03:00
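
The idea behind ae750fbac7 above, doing the 64-bit division only when its inputs may have changed, can be sketched like this. The struct and function names are hypothetical, and the scale formula is simplified to roughly 2^64 / frequency; the th_adjustment term and the large delta are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct timehand_sketch {
	uint64_t frequency;	/* timecounter frequency in Hz */
	uint64_t scale;		/* approximately 2^64 / frequency */
	unsigned recalcs;	/* how many times the division was done */
};

static void
recalculate_scale(struct timehand_sketch *th)
{
	/* 2^64 does not fit in 64 bits, so divide 2^63 and double. */
	th->scale = (((uint64_t)1 << 63) / th->frequency) * 2;
	th->recalcs++;
}

static void
windup_sketch(struct timehand_sketch *th, bool tc_changed, bool ntp_second)
{
	/* Only pay for the division when an input may have changed. */
	if (tc_changed || ntp_second)
		recalculate_scale(th);
	/* ... the rest of the per-call work would go here ... */
}
```

Most calls take neither branch, which is where the saving on systems with slow 64-bit division comes from.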
Konstantin Belousov
1c69690319 Unmap shared page manually before doing vm_map_remove() on exit or exec
This allows the pmap_remove(min, max) call to see empty pmap and exploit
empty pmap optimization.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32569
2021-10-28 22:01:59 +03:00
Konstantin Belousov
0b3bc72889 amd64 pmap: adjust the empty pmap optimization in pmap_remove()
to match the added accounting of the top-level page table pages.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32569
2021-10-28 22:01:58 +03:00
Konstantin Belousov
e93b5adb6b amd64 pmap: account for the top-level pages
both for kernel and user page tables, the latter exist in the PTI case.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32569
2021-10-28 22:01:58 +03:00
Konstantin Belousov
4d675b80f0 i386: fix struct proc layout asserts after 351d5f7fc5
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-28 21:56:21 +03:00
Konstantin Belousov
ee92c8a842 sysctl kern.proc.procname: report right hardlink name
PR:	248184
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:50:02 +03:00
Konstantin Belousov
351d5f7fc5 exec: store parent directory and hardlink name of the binary in struct proc
While doing it, also move all the code to resolve pathnames and obtain
text vp and dvp, into single place.   Besides simplifying the code, it
avoids spurious vnode relocks and validates the explanation why
a transient text reference on the script vnode is not harmful.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:56 +03:00
Konstantin Belousov
0c10648fbb exec: provide right hardlink name in AT_EXECPATH
For this, use vn_fullpath_hardlink() to resolve executable name for
execve(2).

This should provide the right hardlink name, used for execution, instead
of random hardlink pointing to this binary.  Also this should make the
AT_EXECNAME reliable for execve(2), since kernel only needs to resolve
parent directory path, which should always succeed (except pathological
cases like unlinking a directory).

PR:	248184
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:31 +03:00
Konstantin Belousov
9a0bee9f6a Make vn_fullpath_hardlink() externally callable
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:26 +03:00
Konstantin Belousov
15bf81f354 struct image_params: use bool type for boolean members
Also re-align comments, and group booleans and char members together.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:21 +03:00
Konstantin Belousov
9d58243fbc do_execve(): switch boolean locals to use bool type
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:16 +03:00
Konstantin Belousov
143dba3a91 kern_exec.c: style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:10 +03:00
Kristof Provost
e5c4987e3f pf: fix dummynet + NAT
Dummynet differs from ALTQ in that ALTQ schedules packets after they
leave pf. Dummynet schedules them after they leave pf, but then
re-injects them.
We currently deal with this by ensuring we don't re-schedule a packet we
get from dummynet, but this produces unexpected results when combined
with NAT, as dummynet processing is done after the NAT transformation.
In other words, the second time the packet is handed to pf it may have a
different source and destination address.

Simplify this by moving dummynet processing to after all other pf
processing, and not re-processing (but always passing) packets from
dummynet.

This fixes NAT of dummynet delayed packets, and also reduces processing
overhead (because we only do state/rule lookup for each dummynet packet
once, rather than twice).

MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32665
2021-10-28 10:41:17 +02:00
Kristof Provost
7fe0c3f8d3 mbuf: PACKET_TAG_PF should not be persistent
We should clear firewall tags on loopback, icmp reflection, or if_epair
transmission. Left over tags can produce unexpected behaviour,
especially on if_epair where a and b interfaces can be in different
vnets, and have different firewall policies set.

MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32664
2021-10-28 10:41:17 +02:00
Kristof Provost
62d2dcafb7 if_epair: delete mbuf tags
Remove all (non-persistent) tags when we transmit a packet. Real network
interfaces do not carry any tags either, and leaving tags attached can
produce unexpected results.

Reviewed by:	bz, glebius
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32663
2021-10-28 10:41:16 +02:00
Wojciech Macek
8a727c3df8 mroute: add missing WUNLOCK
Add missing WUNLOCK as in all other error cases.

Reported by:		Stormshield
Obtained from:		Semihalf
2021-10-28 07:12:23 +02:00
Wojciech Macek
fb3854845f mroute: fix memory leak
Add MFC to linked list to store incoming packets
before MCAST JOIN was captured.

Sponsored by:		Stormshield
Obtained from:		Semihalf
MFC after:		2 weeks
2021-10-28 07:12:16 +02:00
Gleb Smirnoff
840680e601 Wrap mutex(9), rwlock(9) and sx(9) macros into __extension__ ({})
instead of do {} while (0).

This makes them real void expressions, and they can be used anywhere
where a void function call can be used, for example in a conditional
operator.
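
As an illustration only (toy names, not the actual mutex(9) macros), the
difference between the two wrapper styles can be sketched as follows.
Statement expressions are a GNU C extension accepted by GCC and Clang:

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock standing in for a real mutex; not the kernel's struct mtx. */
struct toy_lock {
	int held;
};

/* Classic statement-style wrapper: usable only where a statement fits. */
#define	TOY_LOCK_STMT(l) do {						\
	(l)->held = 1;							\
} while (0)

/*
 * Statement-expression wrappers.  The trailing (void)0 makes the whole
 * construct a void expression.
 */
#define	TOY_LOCK_EXPR(l) __extension__ ({				\
	(l)->held = 1;							\
	(void)0;							\
})

#define	TOY_UNLOCK_EXPR(l) __extension__ ({				\
	(l)->held = 0;							\
	(void)0;							\
})

/* Lock or unlock depending on a flag, using the conditional operator. */
static void
toy_set(struct toy_lock *l, int want_locked)
{
	assert(l != NULL);
	want_locked ? TOY_LOCK_EXPR(l) : TOY_UNLOCK_EXPR(l);
}
```

Substituting TOY_LOCK_STMT() into the conditional operator in toy_set()
would not compile, since do {} while (0) is a statement and not an
expression.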

Reviewed by:		kib, mjg
Differential revision:	https://reviews.freebsd.org/D32696
2021-10-27 18:58:36 -07:00
Jessica Clarke
63d24336fd Fix off-by-one error in msdosfs FAT32 volume label copying
I dropped the + 1 from the other two instances in each file but failed
to do so for this one, resulting in a more egregious buffer overread
than the one I was fixing (since the read character ended up in the
output if there was space).

Reported by:	Jenkins
Fixes:	34fb1c133c ("Fix intra-object buffer overread for labeled msdosfs volumes")
2021-10-28 01:01:00 +01:00
John Baldwin
4827bf76bc ktls: Fix assertion for TLS 1.0 CBC when using non-zero starting seqno.
The starting sequence number used to verify that TLS 1.0 CBC records
are encrypted in-order in the OCF layer was always set to 0 and not to
the initial sequence number from the struct tls_enable.

In practice, OpenSSL always starts TLS transmit offload with a
sequence number of zero, so this only matters for tests that use a
random starting sequence number.

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32676
2021-10-27 16:35:56 -07:00
Mateusz Guzik
628c3b307f cache: only let non-dir descriptors through when doing EMPTYPATH lookups
Otherwise things like realpath against a file and '.' end up with an
illegal state of having a regular vnode for the parent.

Reported by:	syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com
2021-10-27 18:27:47 +00:00
Jessica Clarke
34fb1c133c Fix intra-object buffer overread for labeled msdosfs volumes
Volume labels, like directory entries, are padded with spaces and so
have no NUL terminator. Whilst the MIN for the dsize argument to strlcpy
ensures that the copy does not overflow the destination, strlcpy is
defined to return the number of characters in the source string,
regardless of the provided dsize, and so keeps reading until it finds a
NUL, which likely exists somewhere within the following fields, but on
CHERI with the subobject bounds enabled in the compiler this buffer
overread will be detected and trap with a bounds violation.
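
A minimal sketch of a bounded copy for such space-padded fields, assuming
the caller knows the exact field size.  The helper name is hypothetical
and this is not necessarily how the commit fixes it; stripping the
trailing padding is an extra choice made here for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a space-padded, non-NUL-terminated field of exactly srclen bytes
 * into dst (capacity dsize), strip trailing spaces and NUL-terminate.
 * Never reads past src[srclen - 1].  Returns the resulting string length.
 */
static size_t
copy_padded_field(char *dst, size_t dsize, const char *src, size_t srclen)
{
	size_t n;

	assert(dst != NULL && src != NULL && dsize > 0);
	/* Bound the copy by both the field size and the destination. */
	n = srclen < dsize - 1 ? srclen : dsize - 1;
	memcpy(dst, src, n);
	while (n > 0 && dst[n - 1] == ' ')
		n--;
	dst[n] = '\0';
	return (n);
}
```

Unlike strlcpy, the helper takes the source length as an argument, so it
has no reason to scan the source for a terminator.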

Found by:	CHERI
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D32579
2021-10-27 18:38:37 +01:00
Jessica Clarke
f350bc1dd3 ada: Fix intra-object buffer overread of identify strings
In the ATA/ATAPI spec these are space-padded fixed-length strings with
no NUL-terminator (and byte swapped). When performing the identify we
call ata_param_fixup to swap the bytes back to be in order, strip any
leading/trailing spaces and coalesce consecutive spaces, padding with
NULs. However, if the input has no padding spaces, the fixed-up strings
are still not NUL-terminated. This causes two issues. The first is that
strlcpy will truncate the string by replacing the final byte with a NUL.
The second is that strlcpy will keep reading src until it finds a NUL in
order to calculate the return value, which is defined as the length of
src (so that callers can then compare it with the dsize input to see if
the input string was truncated), thereby reading past the end of the
buffer and into whatever adjacent fields are in the structure. In
practice there's a NUL byte somewhere in the structure, but on CHERI
with subobject bounds enabled in the compiler this overread will be
detected and trap as a bounds violation.

Note this matches ata_xpt's aprobedone, which does a bcopy to a
malloc'ed buffer and manually NUL-terminates it for the CAM path's
device's serial_num.

Found by:	CHERI
Reviewed by:	imp, scottl
Differential Revision:	https://reviews.freebsd.org/D32567
2021-10-27 18:38:37 +01:00
Jessica Clarke
29863d1eff xhci: Rework 64-byte context support to avoid pointer abuse
Currently, to support 64-byte contexts, xhci_ctx_[gs]et_le(32|64) take a
pointer to the field within a 32-byte context and, if 64-byte contexts
are in use, compute where the 64-byte context field is and use that
instead by deriving a pointer from the 32-byte field pointer. This is
done by exploiting a combination of 64-byte contexts being the same
layout as their 32-byte counterparts, just with 32 bytes of padding at
the end, and that all individual contexts are either in a device
context or an input context which itself is page-aligned. By masking out
the low 4 bits (which is the offset of the field within the 32-byte
context) of the offset within the page, the offset of the individual
context within the containing device/input context can be determined,
which is itself 32 times the number of preceding contexts. Thus, adding
this value to the pointer again gets 64 times the number of preceding
contexts plus the field offset, which gives the offset of the 64-byte
context plus the field offset, which is the address of the field in the
64-byte context.

However, this involves a fair amount of lying to the compiler when
constructing these intermediate pointers, and is rather difficult to
reason about. In particular, this is problematic for CHERI, where we
compile the kernel with subobject bounds enabled; that is, unless
annotated to opt out (e.g. for C struct inheritance reasons where you
need to be able to downcast, or containerof idioms), a pointer to a
member of a struct is a capability whose bounds only cover that field,
and any attempt to dereference outside those bounds will fault,
protecting against intra-object buffer overflows. Thus the pointer given
to xhci_ctx_[gs]et_le(32|64) is a capability whose bounds only cover the
field in the 32-byte context, and computing the pointer to the 64-byte
context field takes the address out of bounds, resulting in a fault when
later dereferenced.

This can be cleaned up by using a different abstraction. Instead of
doing the 32-byte to 64-byte conversion on access to the field, we can
do the conversion when getting a pointer to the context itself, and
define proper 64-byte versions of contexts in order to let the compiler
do all the necessary arithmetic rather than do it manually ourselves.
This provides a cleaner implementation, works for CHERI and may even be
slightly more performant as it avoids the need to mess with masking
pointers (which cannot in the general case be optimised by compilers to
be reused across accesses to different fields within the same context,
since it does not know that the contexts are over-aligned compared with
the C ABI requirements).
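
A rough sketch of that abstraction, with made-up structure names and
field layout rather than the real xhci definitions: the 32-byte versus
64-byte decision is made once, when the context pointer is obtained, and
field accesses then go through an ordinary struct pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte context: two fields plus reserved words. */
struct toy_ctx {
	uint32_t dw0;
	uint32_t dw1;
	uint32_t reserved[6];
};

/* 64-byte variant: same leading layout, then 32 bytes of padding. */
struct toy_ctx64 {
	struct toy_ctx ctx;
	uint8_t pad[32];
};

/* A device context is an array of contexts in one of the two layouts. */
union toy_dev_ctx {
	struct toy_ctx c32[4];
	struct toy_ctx64 c64[4];
};

/*
 * Resolve the layout when taking the pointer to context number idx, so
 * the compiler does all the address arithmetic and no pointer is ever
 * derived from a pointer to a single field.
 */
static struct toy_ctx *
toy_get_ctx(union toy_dev_ctx *dc, int use_64byte, size_t idx)
{
	assert(dc != NULL && idx < 4);
	return (use_64byte ? &dc->c64[idx].ctx : &dc->c32[idx]);
}
```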

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D32554
2021-10-27 18:38:37 +01:00
Warner Losh
aa15f7df64 arm: Remove obsolete comments
FreeBSD has never supported arm26, so remove comments about what
trapframes look like for that platform.

Noticed by:		kevans
Sponsored by:		Netflix
2021-10-27 09:44:58 -06:00
Gleb Smirnoff
5d3bf5b1d2 rack: Update the fast send block on setsockopt(2)
Rack caches TCP/IP header for fast send, so it doesn't call
tcpip_fillheaders().  After certain socket option changes,
namely IPV6_TCLASS, IP_TOS and IP_TTL it needs to update
its fast block to be in sync with the inpcb.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:22:00 -07:00
Gleb Smirnoff
f581a26e46 Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set()
down to per-stack ctloutput.

Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput().

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:22:00 -07:00
Gleb Smirnoff
de156263a5 Several IP level socket options may affect TCP.
After handling them in IP level ctloutput, pass them down to TCP
ctloutput.

We already have a hack to handle IPV6_USE_MIN_MTU. Leave it in place
for now, but comment out how it should be handled.

For IPv4 we are interested in IP_TOS and IP_TTL.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:21:59 -07:00
Gleb Smirnoff
fc4d53cc2e Split tcp_ctloutput() into set/get parts.
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:21:59 -07:00
Peter Lei
e28330832b tcp: socket option to get stack alias name
TCP stack sysctl nodes are currently inserted using the stack
name alias. Allow the user to get the current stack's alias to
allow for programmatic sysctl access.

Obtained from:	Netflix
2021-10-27 08:21:59 -07:00
Mark Johnston
71f31d784e rmslock: Update td_locks during lock and unlock operations
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32692
2021-10-27 11:18:13 -04:00
Gordon Bergling
70de1003da jail(8): Fix a few common typos in source code comments
- s/phyiscal/physical/

MFC after:	3 days
2021-10-27 06:16:06 +02:00
Gordon Bergling
80abcfbdfe bxe(4): Fix a few common typos in source code comments
- s/controled/controlled/
- s/allignment/alignment/

MFC after:	3 days
2021-10-27 06:15:06 +02:00
Adrian Chadd
d524e370c4 iwm: Update SCD register accesses
This brings it in line with what's in OpenBSD.  I tested it locally
with 2G and 5G association; it seems to work.

Tested: Intel 7260 AC, hw 0x140, STA mode, 2G/5G

Differential Revision: https://reviews.freebsd.org/D32627
Subscribers: imp
Obtained from: OpenBSD
2021-10-26 20:28:55 -07:00
Adrian Chadd
355c15130a iwm: update if_iwmreg.h to the latest (as of today) openbsd changes
Summary:
This updates the if_iwmreg.h definitions to:

OpenBSD: if_iwmreg.h,v 1.65 2021/10/11 09:03:22 stsp Exp

A few things haven't been fully converted, namely:

* I left a couple things as enums for now just to reduce the
  other diffs needed; but they're the same values

* The IWM_SCD_QUEUE_* macros have different offsets which I
  didn't update in case they broke things / changed based on later
  firmware.  But they also may be real bugfixes which are needed
  for later chips.  It'll need more testing before flipping this on.

The c file updates are:

* Use the newer names for things if the name changed but the semantics
  didn't
* Explicitly use the earlier firmware structs which maintain compat
  with the current firmware and code.  The newer ones are in here and
  they'll get converted when more openbsd code is merged into this tree.
* Use the older iwm rate table for now, which has entries for legacy
  rates, HT and VHT.  Our code works with that right now, updating it
  to openbsd's err, "different" version can be done at a later date
  when HT/VHT support is added.

Notably, a bunch of definitions were deleted that weren't used.
They're not used either in the openbsd/dfbsd drivers so I think it's
safe to delete them in the long run.

Test Plan: 7260 hw 0x140

Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32627
Reviewed by: md5
Obtained From: OpenBSD
2021-10-26 20:28:54 -07:00
John Baldwin
cdbc4a074b Further refine the ExpDataSN checks for SCSI Response PDUs.
According to 11.4.8 in RFC 7143, ExpDataSN MUST be 0 if the response
code is not Command Completed, but we were requiring it to always be
the count of DataIn PDUs regardless of the response code.

In addition, at least one target (OCI Oracle iSCSI block device)
returns an ExpDataSN of 0 when returning a valid completion with an
error status (Check Condition) in response to a SCSI Inquiry.  As a
workaround for this target, only warn without resetting the connection
for a 0 ExpDataSN for responses with a non-zero error status.

PR:		259152
Reported by:	dch
Reviewed by:	dch, mav, emaste
Fixes:		4f0f5bf995 iscsi: Validate DataSN values in Data-In PDUs in the initiator.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32650
2021-10-26 14:50:05 -07:00
Ed Maste
48cb3fee25 Retire obsolete iscsi_initiator(4)
The new iSCSI initiator iscsi(4) was introduced with FreeBSD 10.0, and
the old initiator was marked obsolete shortly thereafter (in commit
d32789d95c, MFC'd to stable/10 in ba54910169).  Remove it now.

Reviewed by:	jhb, mav
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32673
2021-10-26 16:17:35 -04:00
Randall Stewart
12752978d3 tcp: The rack stack can incorrectly have an overflow when calculating a burst delay.
If the congestion window is very large the fact that we multiply it by 1000 (for microseconds) can
cause the uint32_t to overflow and we incorrectly calculate a very small divisor. This will then
cause the burst timer to be very large when it should be 0. Instead, let's make the three variables
uint64_t and avoid the issue.
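
The wraparound can be reproduced in isolation.  The following is only a
sketch with hypothetical helper names, not the rack code itself:

```c
#include <assert.h>
#include <stdint.h>

/* cwnd * 1000 in 32 bits: wraps once cwnd exceeds UINT32_MAX / 1000. */
static uint32_t
scale_32(uint32_t cwnd)
{
	assert(cwnd > 0);
	return (cwnd * 1000U);
}

/* Widening before the multiply keeps the full product. */
static uint64_t
scale_64(uint32_t cwnd)
{
	assert(cwnd > 0);
	return ((uint64_t)cwnd * 1000U);
}
```

For a congestion window of 5000000 bytes, scale_32() yields 705032704
instead of 5000000000, which is the kind of unexpectedly small value
described above.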

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32668
2021-10-26 13:17:58 -04:00
Mark Johnston
426682b05a bpf: Fix the write filter for detached descriptors
A BPF descriptor only has an associated interface descriptor once it is
attached to an interface, e.g., with BIOCSETIF.  Avoid dereferencing a
NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached.

Reviewed by:	ae
Reported by:	syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com
Fixes:	ded77e0237 ("Allow the BPF to be select for write.")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32561
2021-10-26 10:00:39 -04:00
Wei Hu
1833cf1373 Mana: move mana polling from EQ to CQ
- Each CQ starts a task queue to poll when a completion happens.
  This means every rx and tx queue has its own cleanup task
  thread to poll for completions.
- Arm the EQ every time, no matter whether it is mana or hwc.  CQ arming
  depends on the budget.
- Fix a warning in mana_poll_tx_cq() when cqe_read is 0.
- Move cqe_poll from the EQ to the CQ struct.
- Support EQ sharing for up to 8 vPorts.
- Ease linkdown message from mana_info to mana_dbg.

Tested by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft
2021-10-26 12:25:22 +00:00
Rick Macklem
23024f004a nfscl: Add a missing delegation lock release
There was a case in nfscl_doiods() where the function would return
without releasing the delegation shared lock, if it was acquired by
the call to nfscl_getstateid().  This patch adds that release.

I have never observed a failure due to this missing release, so I
do not know if it ever happens in practice.  However, since the pNFS
client is not yet heavily used, it might be the case.

Found by code inspection during a recent NFSv4 IETF working group
testing event.

MFC after:	2 weeks
2021-10-25 19:11:45 -07:00
Michael Tuexen
b15b053596 tcp: allow new reno functions to be called from other CC modules
Some new reno functions use the internal data, but are also called
from functions of other CC modules. Ensure that in this case, the
internal data is not accessed.

Reported by:		syzbot+1d219ea351caa5109d4b@syzkaller.appspotmail.com
Reported by:    	syzbot+b08144f8cad9c67258c5@syzkaller.appspotmail.com
Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D32649
2021-10-25 22:53:49 +02:00
Bjoern A. Zeeb
c5eec7b57c LinuxKPI: module.h add MODULE_SUPPORTED_DEVICE()
Add a dummy MODULE_SUPPORTED_DEVICE define as we do for other
MODULE_* macros.  This is needed by a wireless driver.

MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D32641
2021-10-25 20:26:01 +00:00
Bjoern A. Zeeb
548ada00e5 LinuxKPI: add bcd.h
Add bcd2bin() as linuxkpi_bcd2bin().  Libkern does provide a bcd2bin()
which cannot be used leaving us with a conflict (see comment in file).
Fortunately this is only seen in one driver so far and it seems easier
to drop this in and change a single line in the driver than to add this
inline in the driver.
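
For reference, a BCD byte packs two decimal digits, one per nibble.  A
minimal sketch of the conversion, using a made-up name rather than the
LinuxKPI or libkern function:

```c
#include <assert.h>
#include <stdint.h>

/* Convert one packed-BCD byte (e.g. 0x42) to its binary value (42). */
static inline uint8_t
sketch_bcd2bin(uint8_t bcd)
{
	/* Both nibbles must be valid decimal digits. */
	assert((bcd >> 4) <= 9 && (bcd & 0x0f) <= 9);
	return ((uint8_t)((bcd >> 4) * 10 + (bcd & 0x0f)));
}
```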

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32647
2021-10-25 20:20:53 +00:00
Bjoern A. Zeeb
cf89934842 LinuxKPI: pci.h / linux_pci.c rename pci_driver field
Rename the struct pci_driver {} field for the list_head from links
to node as a driver is actually initialising this to {} which seems
questionable but it will at least make us match the Linux structure
field name.

MFC after:	3 days
Reviewed by:	manu, hselasky
Differential Revision: https://reviews.freebsd.org/D32645
2021-10-25 20:19:24 +00:00
Bjoern A. Zeeb
ed5600f532 LinuxKPI: pci.h make pci_dev argument const for pci_{read,write}_config*()
Make the struct pci_dev argument to the pci_{read,write}_config*()
functions "const" to match the Linux definition as some drivers
try to pass in a const argument which we currently fail to honor.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32644
2021-10-25 20:17:56 +00:00
Bjoern A. Zeeb
490f9d8f0e LinuxKPI: add netdev_features.h
Add netdev_features.h as a separate file from the future netdevice.h
implementation to avoid include problems with a future skbuff.h.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32643
2021-10-25 20:16:23 +00:00
Bjoern A. Zeeb
41dee251ee LinuxKPI: add simple_open() to fs.h
Add a dummy simple_open() to fs.h as we have for other
(unsupported) functions.
This is needed by a wireless driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32642
2021-10-25 20:14:42 +00:00
Bjoern A. Zeeb
9d593d5a76 mlx4: rename conflicting netdev_priv() to mlx4_netdev_priv()
netdev_priv() is a LinuxKPI function which was used with the old ifnet
linux/netdevice.h implementation which was not adaptable to modern
Linux drivers unless rewriting them for ifnet in the first place which
defeats the purpose.
Rename the netdev_priv() calls in mlx4 to mlx4_netdev_priv()
returning the ifnet softc to avoid conflicting symbol names
with different implementations in the future.

MFC after:	3 days
Reviewed by:	hselasky, kib
Differential Revision: https://reviews.freebsd.org/D32640
2021-10-25 20:12:32 +00:00
Mateusz Guzik
ea14af2d3c Inline critical enter/exit for "tied" kernel modules
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-25 20:07:06 +00:00
Mateusz Guzik
e2493f4912 arm: fix a typo in nvidia/drm2/tegra_bo.c
Unbreaks building TEGRA124

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-25 18:42:10 +00:00
Gleb Smirnoff
f2d266f3b0 Don't run ip_ctloutput() for divert socket.
It was here since divert(4) was introduced, probably just came with a
protocol definition boilerplate.  There is no useful socket option
that can be set or get for a divert socket.

Reviewed by:		donner
Differential Revision:	https://reviews.freebsd.org/D32608
2021-10-25 11:16:59 -07:00
Gleb Smirnoff
d89c820b0d Remove div_ctlinput().
This function does nothing since 97d8d152c2. It was introduced
in 252f24a2cf with a sidenote "may not be needed".

Reviewed by:		donner
Differential Revision:	https://reviews.freebsd.org/D32608
2021-10-25 11:16:49 -07:00
Konstantin Belousov
350fc36b4c sysctl vm.objects: yield if hog
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:02 +03:00
Konstantin Belousov
7738118e9a vm.objects_swap: disable reporting some information
For making the call faster, do not count active/inactive object queues,
and do not report vnode info if any (for tmpfs).

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Konstantin Belousov
42812ccc96 Add vm.swap_objects sysctl
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Konstantin Belousov
1b610624fd vm_object_list: split sysctl handler in separate function
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Mark Johnston
9ef7df022a hyperv: Register hyperv_timecounter later during boot
Previously the MSR-based timecounter was registered during
SI_SUB_HYPERVISOR, i.e., very early during boot, and before SI_SUB_LOCK.
After commit 621fd9dcb2 this triggers a panic since the timecounter
list lock is not yet initialized.

The hyperv timecounter does not need to be registered so early, so defer
that to SI_SUB_DRIVERS, at the same time the hyperv TSC timecounter is
registered.

Reported by:	whu
Approved by:	whu
Fixes:		621fd9dcb2 ("timecounter: Lock the timecounter list")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-10-25 13:25:01 -04:00
Bjoern A. Zeeb
a5e2a27dca LinuxKPI: add strreplace() to string.h
Add strreplace() needed by a driver.
MFC after:	3 days

Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32597
2021-10-25 16:12:10 +00:00
Bjoern A. Zeeb
b382b78503 LinuxKPI: add kstrtou8() and kstrtou8_from_user() to kernel.h
Analogous to the other sized versions of kstrto[u]<type>() and
kstrtobool_from_user() add the "u8" versions needed by a driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32598
2021-10-25 16:10:48 +00:00
Hans Petter Selasky
aad0c65d6b usb(4): Fix for use after free in combination with EVDEV_SUPPORT.
When EVDEV_SUPPORT was introduced, the USB transfers may be running
after the main FIFO is closed. In connection to this a race may appear
which can lead to use-after-free scenarios. Fix this for all FIFO
consumers by initializing and resetting the FIFO queues under the
lock used by the client. Then the client driver will see an empty
queue in all cases a race may appear.

Found by:	pho@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-10-24 19:37:17 +02:00
Jason A. Harmening
fd8ad2128d unionfs: implement vnode-based cache lookup
unionfs uses a per-directory hashtable to cache subdirectory nodes.
Currently this hashtable is looked up using the directory name, but
since unionfs nodes aren't removed from the cache until they're
reclaimed, this poses some problems.  For example, if a directory is
created on a unionfs mount shortly after deleting a previous directory
with the same path, the cache may end up reusing the node for the
previous directory, including its upper/lower FS vnodes.  Operations
against those vnodes will then likely fail because the vnodes
represent deleted files; for example UFS will reject VOP_MKDIR()
against such a vnode because its effective link count is 0.  This may
then manifest as e.g. mkdir(2) or open(2) returning ENOENT for an
attempt to create a file under the re-created directory.

While it would be possible to fix this by explicitly managing the
name-based cache during delete or rename operations, or by rejecting
cache hits if the underlying FS vnodes don't match those passed to
unionfs_nodeget(), it seems cleaner to instead hash the unionfs nodes
based on their underlying FS vnodes.  Since unionfs prefers to operate
against the upper vnode if one is present, the lower vnode will only
be used for hashing as long as the upper vnode is NULL.  This should
also make hashing faster by eliminating string traversal and using
the already-computed hash index stored in each vnode.

While here, fix a couple of other cache-related issues:

--Remove 8 bytes of unnecessary baggage from each unionfs node by
  getting rid of the stored hash mask field.  The mask is knowable
  at compile time.

--When a matching node is found in the cache, reference its vnode
  using vrefl() while still holding the vnode interlock.  Previously
  unionfs_nodeget() would vref() the vnode after the interlock was
  dropped, but the vnode may be reclaimed during that window.  This
  caused intermittent panics from vn_lock(9) during unionfs stress
  testing.

Reviewed by:	kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D32533
2021-10-24 10:05:50 -07:00
Kirk McKusick
dfd704b7fb Allow biodone() to be used as a completion routine.
An ordered series of BIO_READ and BIO_WRITE operations are
typically done as:

	while (work to do) {
		setup bp for I/O
		g_io_request(bp, consumer);
		biowait(bp);
	}

Here you need to have biodone() called at the completion of
the I/O to set the BIO_DONE flag and awaken the biowait(). The
obvious way to do this would be to set bio_done = biodone, but
biodone() will only take the desired action if bio_done == NULL.
The relevant code at the end of biodone() is:

	done = bp->bio_done;
	if (done == NULL) {
		mtxp = mtx_pool_find(mtxpool_sleep, bp);
		mtx_lock(mtxp);
		bp->bio_flags |= BIO_DONE;
		wakeup(bp);
		mtx_unlock(mtxp);
	} else
		done(bp);

This code would infinitely recurse if biodone() is specified as the
routine to use at completion. So before this change, a wrapper done
function had to be written:

static void
g_io_done(struct bio *bp)
{

	bp->bio_done = NULL;
	biodone(bp);
	bp->bio_done = g_io_done;
}

This commit changes

	if (done == NULL)

to

	if (done == NULL || done == biodone)

which eliminates the need for the wrapper function.
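
A small userland model of the resulting dispatch, with toy types and
names standing in for struct bio and biodone(), and with the locking and
wakeup() left out:

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_BIO_DONE	0x01

struct toy_bio {
	int flags;
	void (*done)(struct toy_bio *);
	int callbacks;
};

static void toy_biodone(struct toy_bio *bp);

/* Example custom completion routine: counts how often it ran. */
static void
toy_custom_done(struct toy_bio *bp)
{
	bp->callbacks++;
}

static void
toy_biodone(struct toy_bio *bp)
{
	void (*done)(struct toy_bio *);

	assert(bp != NULL);
	done = bp->done;
	/* Treat "done == toy_biodone" like "no callback": no recursion. */
	if (done == NULL || done == toy_biodone)
		bp->flags |= TOY_BIO_DONE;
	else
		done(bp);
}
```

With this check, setting done to the completion function itself marks
the request as done instead of recursing, while any other callback is
still invoked as before.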

Reviewed by:  kib
Sponsored by: Netflix
2021-10-23 14:11:57 -07:00
Robert Wing
311b95bbcd sys/mount.h: remove dead prototype
vfs_getrootfsid() was removed in 245efbba4d

Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D32606
2021-10-23 16:13:20 -08:00
Edward Tomasz Napierala
2ec26ae402 linux: Improve debug for PTRACE_GETEVENTMSG
No functional changes.

Sponsored By:	EPSRC
2021-10-23 19:53:12 +01:00
Edward Tomasz Napierala
6e66030c4c linux: implement PTRACE_EVENT_EXEC
This fixes strace(1) from Ubuntu Focal.

Reviewed By:	jhb
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32367
2021-10-23 19:46:26 +01:00
Edward Tomasz Napierala
2558bb8e91 linux: Make PTRACE_GET_SYSCALL_INFO handle EJUSTRETURN
This fixes a panic when trying to run strace(8) from Focal.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32355
2021-10-23 18:56:39 +01:00
Edward Tomasz Napierala
e3a83df119 linux: Improve debug for PTRACE_GETREGSET
No functional changes.

Sponsored By:	EPSRC
2021-10-23 09:30:06 +01:00
Edward Tomasz Napierala
2c7f798282 linux: Fix ENOTSOCK handling in sendfile(2)
The Linux way for sendfile(2) to tell the application
to fall back to another way of copying data is by EINVAL,
not ENOTSOCK.  This fixes package installation scripts
for Mono packages from Focal.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32604
2021-10-23 09:15:58 +01:00
Edward Tomasz Napierala
3417c29851 linux: Constify bsd_to_linux_regset()
No functional changes.

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32599
2021-10-23 08:33:58 +01:00
Konstantin Belousov
362c6d8dec nehemiah: manually assemble xstore(-rng)
It seems that clang IAS erroneously adds a repz prefix which should not be
there.  The CPU would try to store around %ecx bytes of random data, while we
only expect a word.

PR:	259218
Reported and tested by:	 Dennis Clarke <dclarke@blastwave.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-23 02:31:16 +03:00
Gleb Smirnoff
c8ee75f231 Use network epoch to protect local IPv4 addresses hash.
Modifications to the hash are already naturally locked by
in_control_sx.  Convert the hash lists to CK lists. Remove the
in_ifaddr_rmlock. Assert the network epoch where necessary.

Most cases when the hash lookup is done the epoch is already entered.
Cover a few cases, that need entering the epoch, which mostly is
initial configuration of tunnel interfaces and multicast addresses.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D32584
2021-10-22 14:40:53 -07:00
Mark Johnston
70f51f0e47 Revert "Handle partial reads in zfs_read"
This reverts commit 59eab1093a.

The change suppressed EFAULT originating from uiomove().  The deadlock
avoidance mechanism implemented by vn_io_fault1() in the VFS handles
such errors by wiring the user pages and retrying, but this change
caused read() to return early instead.  This can result in short I/O,
causing misbehaviour in some applications, and possibly other
consequences.

Until this is resolved somehow, revert the commit.

Approved by:	mm
2021-10-22 15:16:42 -04:00
Gleb Smirnoff
6aae3517ed Retire synchronous PPP kernel driver sppp(4).
The last two drivers that required sppp are cp(4) and ce(4).

These devices are still produced and can be purchased
at Cronyx <http://cronyx.ru/hardware/wan.html>.

Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no
longer support FreeBSD officially.  Later they have dropped
support for Linux drivers too.  As of mid-2020 they don't even
have a developer to maintain their Windows driver.  However,
their support verbally told me that they could provide aid to
a FreeBSD developer with documentation in case a new customer
appears for their devices.

These drivers have a feature to not use sppp(4) and create an
interface, but instead expose the device as netgraph(4) node.
Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on
top of the node and get your synchronous PPP.  Alternatively
you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC.
Actually, last time I used cp(4) back in 2004, using netgraph(4)
instead of sppp(4) was already the right way to do.

Thus, remove the sppp(4) related part of the drivers and enable
by default the netgraph(4) part.  Further maintenance of these
drivers in the tree shouldn't be a big deal.

While doing that, remove some cruft and enable cp(4) compilation
on amd64.  The ce(4) for some unknown reason marks its internal
DDK functions with __attribute__ fastcall, which most likely is
safe to remove, but without hardware I'm not going to do that, so
ce(4) remains i386-only.

Reviewed by:		emaste, imp, donner
Differential Revision:	https://reviews.freebsd.org/D32590
See also:		https://reviews.freebsd.org/D23928
2021-10-22 11:41:36 -07:00
Mark Johnston
d7acbe481d vm_page: Break reservations to handle noobj allocations
vm_reserv_reclaim_*() will release pages to the default freepool, not
the direct freepool from which noobj allocations are drawn.  But if both
pools are empty, the noobj allocator variants must break reservations to
make progress.

Reported by:	cy
Reviewed by:	kib (previous version)
Fixes:	b498f71bc5 ("vm_page: Add a new page allocator interface for unnamed pages")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32592
2021-10-22 09:25:59 -04:00
Randall Stewart
4e4c84f8d1 tcp: Add hystart-plus to cc_newreno and rack.
TCP Hystart draft version -03:
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus

This is a new version of hystart that allows one to carefully exit slow start if the RTT
spikes too much. The newer version has a slower slow start, so to speak, that then
kicks in for five round trips to see if you exited too early; if not, it moves into
congestion avoidance.
This commit will add that feature to our newreno CC and add the needed bits in rack to
be able to enable it.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D32373
2021-10-22 07:10:28 -04:00
Peter Grehan
5a3eb6207a igc: correctly update RCTL when changing multicast filters.
Fix clearing of bits in RCTL for the non-bpf/non-allmulti case.
Update RCTL after modifying the multicast filter registers as per
the Linux driver.

This fixes LACP on igc interfaces, where incoming LACP multicast
control packets were being dropped.

Reviewed by:	kbowling
Obtained from:	Rubicon Communications, LLC ("Netgate")
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D32574
2021-10-22 21:16:12 +10:00
Bjoern A. Zeeb
3dc7a1897e net80211: correct input_sta length checks and control frame handling
Correct input_sta "assertion" checks.  CTS/ACK CTRL frames are shorter
than sizeof(struct ieee80211_frame_min) and were thus running into the
is_rx_tooshort error case.
Use ieee80211_anyhdrsize() to handle this better but make sure we do
at least have the first 2 octets needed for that.
While here move the safety checks before any code which may not obey
them later, just for good style.

The non-scanning check further down assumes a frame format also not
matching control frames.  For now skip the checks for control frames
which allows us to deal with some of them at least now.

Sponsored by:	The FreeBSD Foundation
Obtained from:	20210906 wireless v0.91 code drop
MFC after:	3 days
Reviewed by:	adrian
Differential Revision: https://reviews.freebsd.org/D32238
2021-10-22 10:42:06 +00:00
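
The length check described in the net80211 commit above can be illustrated with a small sketch. It is not the net80211 code: the `sketch_*` functions and the `FC0_*`/`HDR_LEN_*` macros are hypothetical stand-ins, and the header-size rule is deliberately simplified (10 octets for CTS/ACK, 16 for other control frames, 24 for everything else). The point is only the ordering: make sure the first 2 octets are present, then derive the required length from the frame type instead of assuming a fixed minimum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Frame control, first octet (values follow the 802.11 layout). */
#define FC0_TYPE_MASK		0x0c
#define FC0_TYPE_CTL		0x04
#define FC0_SUBTYPE_MASK	0xf0
#define FC0_SUBTYPE_CTS		0xc0
#define FC0_SUBTYPE_ACK		0xd0

/* Simplified header lengths used by this sketch. */
#define HDR_LEN_ACK_CTS		10	/* fc(2) + dur(2) + ra(6) */
#define HDR_LEN_MIN		16	/* fc(2) + dur(2) + addr1(6) + addr2(6) */
#define HDR_LEN_FULL		24	/* three-address header */

/* Requires that at least the 2 frame control octets are readable. */
size_t
sketch_hdrsize(const uint8_t *frame)
{
	assert(frame != NULL);
	if ((frame[0] & FC0_TYPE_MASK) == FC0_TYPE_CTL) {
		switch (frame[0] & FC0_SUBTYPE_MASK) {
		case FC0_SUBTYPE_CTS:
		case FC0_SUBTYPE_ACK:
			return (HDR_LEN_ACK_CTS);
		default:
			return (HDR_LEN_MIN);
		}
	}
	return (HDR_LEN_FULL);
}

/* Returns true if the buffer holds a complete header for its frame type. */
bool
sketch_frame_long_enough(const uint8_t *frame, size_t len)
{
	/* Safety check first: we may not look at the type without 2 octets. */
	if (frame == NULL || len < 2)
		return (false);
	return (len >= sketch_hdrsize(frame));
}
```

With a fixed 16-octet minimum, a 10-octet ACK would be rejected as too short; the type-dependent check accepts it while still rejecting truncated data frames.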
Bjoern A. Zeeb
9a6695532b net80211/drivers: improve ieee80211_rx_stats for band
While IEEE80211_R_BAND was defined, there was no place to store the
band.  Add a field for that, adjust ieee80211_lookup_channel_rxstatus()
to require it, and update drivers passing "R_{FREQ|IEEE}" in already to
provide the band as well.  For the moment keep the fall-back code
requiring all three fields.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	adrian
Differential Revision: https://reviews.freebsd.org/D30662
2021-10-22 09:55:54 +00:00
Luiz Otavio O Souza
ab238f1454 pf: ensure we have the correct source/destination IP address in ICMP errors
When we route-to a packet that later turns out to not fit in the
outbound interface MTU we generate an ICMP error.
However, if we've already changed those (i.e. we've passed through a NAT
rule) we have to undo the transformation first.

Obtained from:	pfSense
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32571
2021-10-22 09:52:17 +02:00
Konstantin Belousov
3b5331dd8d uipc_shm: silence warnings about write-only variables in largepage code
In shm_largepage_phys_populate(), the result from vm_page_grab() is only
needed for assertion.

In shm_dotruncate_largepage(), there is commented-out prototype code
for managed largepages.  The oldobjsz is saved for its sake, so mark
the variable as __unused directly.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
3d2778515a sig_ast_checksusp(): mark the local p as __diagused
It is only used to assert that the (current) process is locked.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
6776747a0e subr_firmware.c::unloadentry(): remove write-only variable
The function ignores the result returned by linker_release_module().
The FW_UNLOAD flag on the file is cleared, so even on error it would
not be tried again.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
993446638c alq_open_flags(): mark local td variable as unused
It is passed to the NDINIT() macro, which has ignored the thread argument
for some time.

Sponsored by:	The FreeBSD Foundation
2021-10-21 21:40:46 +03:00
Konstantin Belousov
661bd70bd7 DMAR: clean up warnings about write-only variables
For some of them, used only when KTR or KMSAN are configured, apply
__unused attribute directly.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
bded8fa300 umtxq_requeue: remove write-only variable uh2
umtxq_queue_lookup() does not change state.  It is redone inside
umtxq_insert() later, anyway.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
2030ee0e1b ufs: remove write-only variables
Mark variables as __diagused for invariant-only vars

Reviewed by:	imp, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32577
2021-10-21 21:40:46 +03:00
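
The pattern behind the series of write-only variable commits above can be sketched in userland C. This is an illustration only: `SKETCH_DIAGUSED` and `sketch_counter_bump()` are hypothetical names, and the macro merely imitates the idea of the kernel's __diagused (an annotation that marks a variable as unused only in builds where assertions compile away), keyed here on `NDEBUG` as an assumption.

```c
#include <assert.h>

/*
 * When assertions are compiled out, a variable read only by assert()
 * becomes write-only and compilers warn about it.  Annotate it as
 * unused only in that configuration.
 */
#ifdef NDEBUG
#define SKETCH_DIAGUSED	__attribute__((__unused__))
#else
#define SKETCH_DIAGUSED
#endif

/* Increment a counter and return the new value. */
unsigned
sketch_counter_bump(unsigned *counter)
{
	unsigned before SKETCH_DIAGUSED;

	before = *counter;
	(*counter)++;
	/* The only reader of 'before'; it disappears under NDEBUG. */
	assert(*counter > before);
	return (*counter);
}
```

Building the same file with and without `-DNDEBUG` (and with `-Wall -Wextra`) shows the effect: the annotation keeps the diagnostic-only variable warning-free in both configurations without deleting the check.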
John Baldwin
96668a81ae ktls: Always create a software backend for receive sessions.
A future change to TOE TLS will require a software fallback for the
first few TLS records received.  Future support for NIC TLS on receive
will also require a software fallback for certain cases.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32566
2021-10-21 09:37:17 -07:00
John Baldwin
b33ff94123 ktls: Change struct ktls_session.cipher to an OCF-specific type.
As a followup to SW KTLS assuming an OCF backend, rename
struct ocf_session to struct ktls_ocf_session and forward
declare it in <sys/ktls.h> to use as the type of
struct ktls_session.cipher.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32565
2021-10-21 09:36:53 -07:00
John Baldwin
c57dbec69a ktls: Add a routine to query information in a receive socket buffer.
In particular, ktls_pending_rx_info() determines which TLS record is
at the end of the current receive socket buffer (including
not-yet-decrypted data) along with how much data in that TLS record is
not yet present in the socket buffer.

This is useful for future changes to support NIC TLS receive offload
and enhancements to TOE TLS receive offload.  Those use cases need a
way to synchronize a state machine on the NIC with the TLS record
boundaries in the TCP stream.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32564
2021-10-21 09:36:29 -07:00
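
The kind of query described in the ktls_pending_rx_info() commit above can be sketched over a flat byte buffer. This is not the kernel routine, which works on a socket buffer's mbuf chains: `sketch_pending_rx_info()` and `struct sketch_rx_info` are hypothetical, and the sketch assumes only the standard 5-octet TLS record header (type, 2-octet version, 2-octet big-endian length).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TLS_HDR_LEN	5	/* type(1) + version(2) + length(2) */

/* Hypothetical result type for this sketch. */
struct sketch_rx_info {
	size_t	index;		/* 0-based index of the record at the end */
	size_t	missing;	/* octets of that record not yet buffered */
};

/*
 * Walk the TLS records in buf and describe the one at the end.
 * Returns false if the buffer is empty or ends inside a record header,
 * since the record length is then unknown.
 */
bool
sketch_pending_rx_info(const uint8_t *buf, size_t len,
    struct sketch_rx_info *out)
{
	size_t index = 0, off = 0, rec_len;

	assert(out != NULL);
	if (buf == NULL)
		return (false);
	while (len - off >= SKETCH_TLS_HDR_LEN) {
		rec_len = SKETCH_TLS_HDR_LEN +
		    (((size_t)buf[off + 3] << 8) | (size_t)buf[off + 4]);
		if (rec_len >= len - off) {
			/* This record reaches (or runs past) the end. */
			out->index = index;
			out->missing = rec_len - (len - off);
			return (true);
		}
		off += rec_len;
		index++;
	}
	return (false);
}
```

A consumer that needs to line up with record boundaries, as the commit message describes for NIC and TOE offload, could use `missing` to learn how far the next boundary lies beyond the currently buffered data.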
Dawid Gorecki
8cb175ba0c Enable stack gap on arm64
Stack gap code used on amd64 can also be reused for arm64. Point
sv_stackgap to elf64_stackgap to enable this feature.

Reviewed by: mw, kib, emaste
Tested by: mw
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D32588
2021-10-21 17:20:08 +02:00
Martin Matuska
6ba2210ee0 zfs: merge openzfs/zfs@ec64fdb93 (master) into main
Notable upstream pull request merges:
  #12392 Avoid panic in case of pool errors and missing L2ARC
  #12448 skip snapshot in zfs_iter_mounted()
  #12516 Fix NFS and large reads on older kernels
  #12533 Fail invalid incremental recursive send gracefully
  #12569 FreeBSD: Really zero the zero page
  #12575 Reject zfs send -RI with nonexistent fromsnap
  #12602 Correct refcount_add in dmu_zfetch
  #12650 zpool should call zfs_nicestrtonum() with non-NULL handle

Obtained from:	OpenZFS
OpenZFS commit:	ec64fdb93d
2021-10-21 15:06:06 +02:00
Elliott Mitchell
5bb67f5f3f xen/devices: purge uses of intr_machdep.h
Devices in sys/dev should be architecture-independent and NOT #include
intr_machdep.h.

Reviewed by: mhorne royger
Differential Revision: https://reviews.freebsd.org/D29959
2021-10-21 09:39:16 +02:00