Commit Graph

139797 Commits

Author SHA1 Message Date
Konstantin Belousov
15bf81f354 struct image_params: use bool type for boolean members
Also re-align comments, and group booleans and char members together.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:21 +03:00
Konstantin Belousov
9d58243fbc do_execve(): switch boolean locals to use bool type
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:16 +03:00
Konstantin Belousov
143dba3a91 kern_exec.c: style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:10 +03:00
Kristof Provost
e5c4987e3f pf: fix dummynet + NAT
Dummynet differs from ALTQ in that ALTQ schedules packets after they
leave pf. Dummynet schedules them after they leave pf, but then
re-injects them.
We currently deal with this by ensuring we don't re-schedule a packet we
get from dummynet, but this produces unexpected results when combined
with NAT, as dummynet processing is done after the NAT transformation.
In other words, the second time the packet is handed to pf it may have a
different source and destination address.

Simplify this by moving dummynet processing to after all other pf
processing, and not re-processing (but always passing) packets from
dummynet.

This fixes NAT of dummynet delayed packets, and also reduces processing
overhead (because we only do state/rule lookup for each dummynet packet
once, rather than twice).

MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32665
2021-10-28 10:41:17 +02:00
Kristof Provost
7fe0c3f8d3 mbuf: PACKET_TAG_PF should not be persistent
We should clear firewall tags on loopback, icmp reflection, or if_epair
transmission. Left over tags can produce unexpected behaviour,
especially on if_epair where a and b interfaces can be in different
vnets, and have different firewall policies set.

MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32664
2021-10-28 10:41:17 +02:00
Kristof Provost
62d2dcafb7 if_epair: delete mbuf tags
Remove all (non-persistent) tags when we transmit a packet. Real network
interfaces do not carry any tags either, and leaving tags attached can
produce unexpected results.

Reviewed by:	bz, glebius
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32663
2021-10-28 10:41:16 +02:00
Wojciech Macek
8a727c3df8 mroute: add missing WUNLOCK
Add missing WNLOCK as in all other error cases.

Reported by:		Stormshield
Obtained from:		Semihalf
2021-10-28 07:12:23 +02:00
Wojciech Macek
fb3854845f mroute: fix memory leak
Add MFC to linked list to store incoming packets
before MCAST JOIN was captured.

Sponsored by:		Stormshield
Obtained from:		Semihalf
MFC after:		2 weeks
2021-10-28 07:12:16 +02:00
Gleb Smirnoff
840680e601 Wrap mutex(9), rwlock(9) and sx(9) macros into __extension__ ({})
instead of do {} while (0).

This makes them real void expressions, and they can be used anywhere
where a void function call can be used, for example in a conditional
operator.

Reviewed by:		kib, mjg
Differential revision:	https://reviews.freebsd.org/D32696
2021-10-27 18:58:36 -07:00
Jessica Clarke
63d24336fd Fix off-by-one error in msdosfs FAT32 volume label copying
I dropped the + 1 from the other two instances in each file but failed
to do so for this one, resulting in a more egregious buffer overread
than the one I was fixing (since the read character ended up in the
output if there was space).

Reported by:	Jenkins
Fixes:	34fb1c133c ("Fix intra-object buffer overread for labeled msdosfs volumes")
2021-10-28 01:01:00 +01:00
John Baldwin
4827bf76bc ktls: Fix assertion for TLS 1.0 CBC when using non-zero starting seqno.
The starting sequence number used to verify that TLS 1.0 CBC records
are encrypted in-order in the OCF layer was always set to 0 and not to
the initial sequence number from the struct tls_enable.

In practice, OpenSSL always starts TLS transmit offload with a
sequence number of zero, so this only matters for tests that use a
random starting sequence number.

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32676
2021-10-27 16:35:56 -07:00
Mateusz Guzik
628c3b307f cache: only let non-dir descriptors through when doing EMPTYPATH lookups
Otherwise things like realpath against a file and '.' end up with an
illegal state of having a regular vnode for the parent.

Reported by:	syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com
2021-10-27 18:27:47 +00:00
Jessica Clarke
34fb1c133c Fix intra-object buffer overread for labeled msdosfs volumes
Volume labels, like directory entries, are padded with spaces and so
have no NUL terminator. Whilst the MIN for the dsize argument to strlcpy
ensures that the copy does not overflow the destination, strlcpy is
defined to return the number of characters in the source string,
regardless of the provided dsize, and so keeps reading until it finds a
NUL, which likely exists somewhere within the following fields, but On
CHERI with the subobject bounds enabled in the compiler this buffer
overread will be detected and trap with a bounds violation.

Found by:	CHERI
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D32579
2021-10-27 18:38:37 +01:00
Jessica Clarke
f350bc1dd3 ada: Fix intra-object buffer overread of identify strings
In the ATA/ATAPI spec these are space-padded fixed-length strings with
no NUL-terminator (and byte swapped). When performing the identify we
call ata_param_fixup to swap the bytes back to be in order, strip any
leading/trailing spaces and coalesce consecutive spaces, padding with
NULs. However, if the input has no padding spaces, the fixed-up strings
are still not NUL-terminated. This causes two issues. The first is that
strlcpy will truncate the string by replacing the final byte with a NUL.
The second is that strlcpy will keep reading src until it finds a NUL in
order to calculate the return value, which is defined as the length of
src (so that callers can then compare it with the dsize input to see if
the input string was truncated), thereby reading past the end of the
buffer and into whatever adjacent fields are in the structure. In
practice there's a NUL byte somewhere in the structure, but on CHERI
with subobject bounds enabled in the compiler this overread will be
detected and trap as a bounds violation.

Note this matches ata_xpt's aprobedone, which does a bcopy to a
malloc'ed buffer and manually NUL-terminates it for the CAM path's
device's serial_num.

Found by:	CHERI
Reviewed by:	imp, scottl
Differential Revision:	https://reviews.freebsd.org/D32567
2021-10-27 18:38:37 +01:00
Jessica Clarke
29863d1eff xhci: Rework 64-byte context support to avoid pointer abuse
Currently, to support 64-byte contexts, xhci_ctx_[gs]et_le(32|64) take a
pointer to the field within a 32-byte context and, if 64-byte contexts
are in use, compute where the 64-byte context field is and use that
instead by deriving a pointer from the 32-byte field pointer. This is
done by exploiting a combination of 64-byte contexts being the same
layout as their 32-byte counterparts, just with 32 bytes of padding at
the end, and that all individual contexts are either in a device
context or an input context which itself is page-aligned. By masking out
the low 4 bits (which is the offset of the field within the 32-byte
contxt) of the offset within the page, the offset of the invididual
context within the containing device/input context can be determined,
which is itself 32 times the number of preceding contexts. Thus, adding
this value to the pointer again gets 64 times the number of preceding
contexts plus the field offset, which gives the offset of the 64-byte
context plus the field offset, which is the address of the field in the
64-byte context.

However, this involves a fair amount of lying to the compiler when
constructing these intermediate pointers, and is rather difficult to
reason about. In particular, this is problematic for CHERI, where we
compile the kernel with subobject bounds enabled; that is, unless
annotated to opt out (e.g. for C struct inheritance reasons where you
need to be able to downcast, or containerof idioms), a pointer to a
member of a struct is a capability whose bounds only cover that field,
and any attempt to dereference outside those bounds will fault,
protecting against intra-object buffer overflows. Thus the pointer given
to xhci_ctx_[gs]et_le(32|64) is a capability whose bounds only cover the
field in the 32-byte context, and computing the pointer to the 64-byte
context field takes the address out of bounds, resulting in a fault when
later dereferenced.

This can be cleaned up by using a different abstraction. Instead of
doing the 32-byte to 64-byte conversion on access to the field, we can
do the conversion when getting a pointer to the context itself, and
define proper 64-byte versions of contexts in order to let the compiler
do all the necessary arithmetic rather than do it manually ourselves.
This provides a cleaner implementation, works for CHERI and may even be
slightly more performant as it avoids the need to mess with masking
pointers (which cannot in the general case be optimised by compilers to
be reused across accesses to different fields within the same context,
since it does not know that the contexts are over-aligned compared with
the C ABI requirements).

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D32554
2021-10-27 18:38:37 +01:00
Warner Losh
aa15f7df64 arm: Remove obsolete comments
FreeBSD has never supported arm26, so remove comments about what
trapframes look like for that platform.

Noticed by:		kevans
Sponsored by:		Netflix
2021-10-27 09:44:58 -06:00
Gleb Smirnoff
5d3bf5b1d2 rack: Update the fast send block on setsockopt(2)
Rack caches TCP/IP header for fast send, so it doesn't call
tcpip_fillheaders().  After certain socket option changes,
namely IPV6_TCLASS, IP_TOS and IP_TTL it needs to update
its fast block to be in sync with the inpcb.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:22:00 -07:00
Gleb Smirnoff
f581a26e46 Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set()
down to per-stack ctloutput.

Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput().

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:22:00 -07:00
Gleb Smirnoff
de156263a5 Several IP level socket options may affect TCP.
After handling them in IP level ctloutput, pass them down to TCP
ctloutput.

We already have a hack to handle IPV6_USE_MIN_MTU. Leave it in place
for now, but comment out how it should be handled.

For IPv4 we are interested in IP_TOS and IP_TTL.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:21:59 -07:00
Gleb Smirnoff
fc4d53cc2e Split tcp_ctloutput() into set/get parts.
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D32655
2021-10-27 08:21:59 -07:00
Peter Lei
e28330832b tcp: socket option to get stack alias name
TCP stack sysctl nodes are currently inserted using the stack
name alias. Allow the user to get the current stack's alias to
allow for programatic sysctl access.

Obtained from:	Netflix
2021-10-27 08:21:59 -07:00
Mark Johnston
71f31d784e rmslock: Update td_locks during lock and unlock operations
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32692
2021-10-27 11:18:13 -04:00
Gordon Bergling
70de1003da jail(8): Fix a few common typos in source code comments
- s/phyiscal/physical/

MFC after:	3 days
2021-10-27 06:16:06 +02:00
Gordon Bergling
80abcfbdfe bxe(4): Fix a few common typos in source code comments
- s/controled/controlled/
- s/allignment/alignment/

MFC after:	3 days
2021-10-27 06:15:06 +02:00
Adrian Chadd
d524e370c4 iwm: Update SCD register accesses
This brings it inline with what's in openbsd.  I tested it locally
with 2G and 5G association; it seems to work.

Tested: Intel 7260 AC, hw 0x140, STA mode, 2G/5G

Differential Revision: https://reviews.freebsd.org/D32627
Subscribers: imp
Obtainde from: OpenBSD
2021-10-26 20:28:55 -07:00
Adrian Chadd
355c15130a iwm: update if_iwmreg.h to the latest (as of today) openbsd changes
Summary:
This updates the if_iwmreg.h definitions to;

OpenBSD: if_iwmreg.h,v 1.65 2021/10/11 09:03:22 stsp Exp

A few things haven't been fully converted, namely:

* I left a couple things as enums for now just to reduce the
  other diffs needed; but they're the same values

* The IWM_SCD_QUEUE_* macros have different offsets which I
  didn't update in case they broke things / changed based on later
  firmware.  But they also may be real bugfixes which are needed
  for later chips.  It'll need more testing before flipping this on.

The c file updates are:

* Use the newer names for things if the name changed but the semantics
  didn't
* Explicitly use the earlier firmware structs which maintain compat
  with the current firmware and code.  The newer ones are in here and
  they'll get converted when more openbsd code is merged into this tree.
* Use the older iwm rate table for now, which has entries for legacy
  rates, HT and VHT.  Our code works with that right now, updating it
  to openbsd's err, "different" version can be done at a later date
  when HT/VHT support is added.

Notably, a bunch of definitions were deleted that weren't used.
They're not used either in the openbsd/dfbsd drivers so I think it's
safe to delete them in the long run.

Test Plan: 7260 hw 0x140

Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32627
Reviewed by: md5
Obtained From: OpenBSD
2021-10-26 20:28:54 -07:00
John Baldwin
cdbc4a074b Further refine the ExpDataSN checks for SCSI Response PDUs.
According to 11.4.8 in RFC 7143, ExpDataSN MUST be 0 if the response
code is not Command Completed, but we were requiring it to always be
the count of DataIn PDUs regardless of the response code.

In addition, at least one target (OCI Oracle iSCSI block device)
returns an ExpDataSN of 0 when returning a valid completion with an
error status (Check Condition) in response to a SCSI Inquiry.  As a
workaround for this target, only warn without resetting the connection
for a 0 ExpDataSN for responses with a non-zero error status.

PR:		259152
Reported by:	dch
Reviewed by:	dch, mav, emaste
Fixes:		4f0f5bf995 iscsi: Validate DataSN values in Data-In PDUs in the initiator.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32650
2021-10-26 14:50:05 -07:00
Ed Maste
48cb3fee25 Retire obsolete iscsi_initiator(4)
The new iSCSI initiator iscsi(4) was introduced with FreeBSD 10.0, and
the old intiator was marked obsolete shortly thereafter (in commit
d32789d95c, MFC'd to stable/10 in ba54910169).  Remove it now.

Reviewed by:	jhb, mav
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32673
2021-10-26 16:17:35 -04:00
Randall Stewart
12752978d3 tcp: The rack stack can incorrectly have an overflow when calculating a burst delay.
If the congestion window is very large the fact that we multiply it by 1000 (for microseconds) can
cause the uint32_t to overflow and we incorrectly calculate a very small divisor. This will then
cause the burst timer to be very large when it should be 0. Instead lets make the three variables
uint64_t and avoid the issue.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32668
2021-10-26 13:17:58 -04:00
Mark Johnston
426682b05a bpf: Fix the write filter for detached descriptors
A BPF descriptor only has an associated interface descriptor once it is
attached to an interface, e.g., with BIOCSETIF.  Avoid dereferencing a
NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached.

Reviewed by:	ae
Reported by:	syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com
Fixes:	ded77e0237 ("Allow the BPF to be select for write.")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32561
2021-10-26 10:00:39 -04:00
Wei Hu
1833cf1373 Mana: move mana polling from EQ to CQ
-Each CQ start task queue to poll when completion happens.
    This means every rx and tx queue has its own cleanup task
    thread to poll the completion.
    - Arm EQ everytime no matter it is mana or hwc. CQ arming
    depends on the budget.
    - Fix a warning in mana_poll_tx_cq() when cqe_read is 0.
    - Move cqe_poll from EQ to CQ struct.
    - Support EQ sharing up to 8 vPorts.
    - Ease linkdown message from mana_info to mana_dbg.

Tested by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft
2021-10-26 12:25:22 +00:00
Rick Macklem
23024f004a nfscl: Add a missing delegation lock release
There was a case in nfscl_doiods() where the function would return
without releasing the delegation shared lock, if it was aquired by
the call to nfscl_getstateid().  This patch adds that release.

I have never observed a failure due to this missing release, so I
do not know if it ever happens in practice.  However, since the pNFS
client is not yet heavily used, it might be the case.

Found by code inspection during a recent NFSv4 IETF working group
testing event.

MFC after:	2 week
2021-10-25 19:11:45 -07:00
Michael Tuexen
b15b053596 tcp: allow new reno functions to be called from other CC modules
Some new reno functions use the internal data, but are also called
from functions of other CC modules. Ensure that in this case, the
internal data is not accessed.

Reported by:		syzbot+1d219ea351caa5109d4b@syzkaller.appspotmail.com
Reported by:    	syzbot+b08144f8cad9c67258c5@syzkaller.appspotmail.com
Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D32649
2021-10-25 22:53:49 +02:00
Bjoern A. Zeeb
c5eec7b57c LinuxKPI: module.h add MODULE_SUPPORTED_DEVICE()
Add a dummy MODULE_SUPPORTED_DEVICE define as we do for other
MODULE_* macros.  This is needed by a wireless driver.

MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D32641
2021-10-25 20:26:01 +00:00
Bjoern A. Zeeb
548ada00e5 LinuxKPI: add bcd.h
Add bcd2bin() as linuxkpi_bcd2bin().  Libkern does provide a bcd2bin()
which cannot be used leaving us with a conflict (see comment in file).
Fortunately this is only seen in one driver so far and it seems easier
to drop this in and change a single line in the driver than to add this
inline in the driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32647
2021-10-25 20:20:53 +00:00
Bjoern A. Zeeb
cf89934842 LinuxKPI: pci.h / linux_pci.c rename pci_driver field
Rename the struct pci_driver {} field got the list_head from links
to node as a driver is actually initialsing this to {} which seems
questionable but it will at least make us match the Linux structure
field name.

MFC after:	3 days
Reviewed by:	manu, hselasky
Differential Revision: https://reviews.freebsd.org/D32645
2021-10-25 20:19:24 +00:00
Bjoern A. Zeeb
ed5600f532 LinuxKPI: pci.h make pci_dev argument const for pci_{read,write}_config*()
Make the struct pci_dev argument to the pci_{read,write}_config*()
functions "const" to match the Linux definition as some drivers
try to pass in a const argument which we currently fail to honor.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32644
2021-10-25 20:17:56 +00:00
Bjoern A. Zeeb
490f9d8f0e LinuxKPI: add netdev_features.h
Add netdev_features.h as a spearate file from the future netdevice.h
implementation to avoid include problems with a future skbuff.h.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32643
2021-10-25 20:16:23 +00:00
Bjoern A. Zeeb
41dee251ee LinuxKPI: add simple_open() to fs.h
Add a dummy simple_open() to fs.h as we have for other
(unsupported) functions.
This is needed by a wireless driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32642
2021-10-25 20:14:42 +00:00
Bjoern A. Zeeb
9d593d5a76 mlx4: rename conflicting netdev_priv() to mlx4_netdev_priv()
netdev_priv() is a LinuxKPI function which was used with the old ifnet
linux/netdevice.h implementation which was not adaptable to modern
Linux drviers unless rewriting them for ifnet in first place which
defeats the purpose.
Rename the netdev_priv() calls in mlx4 to mlx4_netdev_priv()
returning the ifnet softc to avoid conflicting symbol names
with different implementations in the future.

MFC after:	3 days
Reviewed by:	hselasky, kib
Differential Revision: https://reviews.freebsd.org/D32640
2021-10-25 20:12:32 +00:00
Mateusz Guzik
ea14af2d3c Inline critical enter/exit for "tied" kernel modules
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-25 20:07:06 +00:00
Mateusz Guzik
e2493f4912 arm: fix a typo in nvidia/drm2/tegra_bo.c
Unbreaks building TEGRA124

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-25 18:42:10 +00:00
Gleb Smirnoff
f2d266f3b0 Don't run ip_ctloutput() for divert socket.
It was here since divert(4) was introduced, probably just came with a
protocol definition boilerplate.  There is no useful socket option
that can be set or get for a divert socket.

Reviewed by:		donner
Differential Revision:	https://reviews.freebsd.org/D32608
2021-10-25 11:16:59 -07:00
Gleb Smirnoff
d89c820b0d Remove div_ctlinput().
This function does nothing since 97d8d152c2. It was introduced
in 252f24a2cf with a sidenote "may not be needed".

Reviewed by:		donner
Differential Revision:	https://reviews.freebsd.org/D32608
2021-10-25 11:16:49 -07:00
Konstantin Belousov
350fc36b4c sysctl vm.objects: yield if hog
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:02 +03:00
Konstantin Belousov
7738118e9a vm.objects_swap: disable reporting some information
For making the call faster, do not count active/inactive object queues,
and do not report vnode info if any (for tmpfs).

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Konstantin Belousov
42812ccc96 Add vm.swap_objects sysctl
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Konstantin Belousov
1b610624fd vm_object_list: split sysctl handler in separate function
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31163
2021-10-25 20:34:01 +03:00
Mark Johnston
9ef7df022a hyperv: Register hyperv_timecounter later during boot
Previously the MSR-based timecounter was registered during
SI_SUB_HYPERVISOR, i.e., very early during boot, and before SI_SUB_LOCK.
After commit 621fd9dcb2 this triggers a panic since the timecounter
list lock is not yet initialized.

The hyperv timecounter does not need to be registered so early, so defer
that to SI_SUB_DRIVERS, at the same time the hyperv TSC timecounter is
registered.

Reported by:	whu
Approved by:	whu
Fixes:		621fd9dcb2 ("timecounter: Lock the timecounter list")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-10-25 13:25:01 -04:00
Bjoern A. Zeeb
a5e2a27dca LinuxKPI: add strreplace() to string.h
Add strreplace() needed by a driver.
MFC after:	3 days

Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32597
2021-10-25 16:12:10 +00:00
Bjoern A. Zeeb
b382b78503 LinuxKPI: add kstrtou8() and kstrtou8_from_user() to kernel.h
Analogous to the other sized version of kstrto[u]<type>() and
kstrtobool_from_user() add the "u8" versions needed by a driver.

MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D32598
2021-10-25 16:10:48 +00:00
Hans Petter Selasky
aad0c65d6b usb(4): Fix for use after free in combination with EVDEV_SUPPORT.
When EVDEV_SUPPORT was introduced, the USB transfers may be running
after the main FIFO is closed. In connection to this a race may appear
which can lead to use-after-free scenarios. Fix this for all FIFO
consumers by initializing and resetting the FIFO queues under the
lock used by the client. Then the client driver will see an empty
queue in all cases a race may appear.

Found by:	pho@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-10-24 19:37:17 +02:00
Jason A. Harmening
fd8ad2128d unionfs: implement vnode-based cache lookup
unionfs uses a per-directory hashtable to cache subdirectory nodes.
Currently this hashtable is looked up using the directory name, but
since unionfs nodes aren't removed from the cache until they're
reclaimed, this poses some problems.  For example, if a directory is
created on a unionfs mount shortly after deleting a previous directory
with the same path, the cache may end up reusing the node for the
previous directory, including its upper/lower FS vnodes.  Operations
against those vnodes with then likely fail because the vnodes
represent deleted files; for example UFS will reject VOP_MKDIR()
against such a vnode because its effective link count is 0.  This may
then manifest as e.g. mkdir(2) or open(2) returning ENOENT for an
attempt to create a file under the re-created directory.

While it would be possible to fix this by explicitly managing the
name-based cache during delete or rename operations, or by rejecting
cache hits if the underlying FS vnodes don't match those passed to
unionfs_nodeget(), it seems cleaner to instead hash the unionfs nodes
based on their underlying FS vnodes.  Since unionfs prefers to operate
against the upper vnode if one is present, the lower vnode will only
be used for hashing as long as the upper vnode is NULL.  This should
also make hashing faster by eliminating string traversal and using
the already-computed hash index stored in each vnode.

While here, fix a couple of other cache-related issues:

--Remove 8 bytes of unnecessary baggage from each unionfs node by
  getting rid of the stored hash mask field.  The mask is knowable
  at compile time.

--When a matching node is found in the cache, reference its vnode
  using vrefl() while still holding the vnode interlock.  Previously
  unionfs_nodeget() would vref() the vnode after the interlock was
  dropped, but the vnode may be reclaimed during that window.  This
  caused intermittent panics from vn_lock(9) during unionfs stress
  testing.

Reviewed by:	kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D32533
2021-10-24 10:05:50 -07:00
Kirk McKusick
dfd704b7fb Allow biodone() to be used as a completion routine.
An ordered series of BIO_READ and BIO_WRITE operations are
typically done as:

	while (work to do) {
		setup bp for I/O
		g_io_request(bp, consumer);
		biowait(bp);
	}

Here you need to have biodone() called at the completion of
the I/O to set the BIO_DONE flag and awaken the biowait(). The
obvious way to do this would be to set bio_done = biodone, but
biodone() will only take the desired action if bio_done == NULL.
The relevant code at the end of biodone() is:

	done = bp->bio_done;
	if (done == NULL) {
		mtxp = mtx_pool_find(mtxpool_sleep, bp);
		mtx_lock(mtxp);
		bp->bio_flags |= BIO_DONE;
		wakeup(bp);
		mtx_unlock(mtxp);
	} else
		done(bp);

This code would infinitely recurse if biodone() is specified as the
routine to use at completion. So before this change, a wrapper done
function had to be written:

static void
g_io_done(struct bio *bp)
{

	bp->bio_done = NULL;
	biodone(bp);
	bp->bio_done = g_io_done;
}

This commit changes

	if (done == NULL)

to

	if (done == NULL || done == biodone)

which eliminates the need for the wrapper function.

Reviewed by:  kib
Sponsored by: Netflix
2021-10-23 14:11:57 -07:00
Robert Wing
311b95bbcd sys/mount.h: remove dead prototype
vfs_getrootfsid() was removed in 245efbba4d

Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D32606
2021-10-23 16:13:20 -08:00
Edward Tomasz Napierala
2ec26ae402 linux: Improve debug for PTRACE_GETEVENTMSG
No functional changes.

Sponsored By:	EPSRC
2021-10-23 19:53:12 +01:00
Edward Tomasz Napierala
6e66030c4c linux: implement PTRACE_EVENT_EXEC
This fixes strace(1) from Ubuntu Focal.

Reviewed By:	jhb
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32367
2021-10-23 19:46:26 +01:00
Edward Tomasz Napierala
2558bb8e91 linux: Make PTRACE_GET_SYSCALL_INFO handle EJUSTRETURN
This fixes panic when trying to run strace(8) from Focal.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32355
2021-10-23 18:56:39 +01:00
Edward Tomasz Napierala
e3a83df119 linux: Improve debug for PTRACE_GETREGSET
No functional changes.

Sponsored By:	EPSRC
2021-10-23 09:30:06 +01:00
Edward Tomasz Napierala
2c7f798282 linux: Fix ENOTSOCK handling in sendfile(2)
The Linux way for sendfile(2) to tell the application
to fallback to another way of copying data is by EINVAL,
not ENOTSOCK.  This fixes package installation scripts
for Mono packages from Focal.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32604
2021-10-23 09:15:58 +01:00
Edward Tomasz Napierala
3417c29851 linux: Constify bsd_to_linux_regset()
No functional changes.

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32599
2021-10-23 08:33:58 +01:00
Konstantin Belousov
362c6d8dec nehemiah: manually assemble xstore(-rng)
It seems that clang IAS erronously adds repz prefix which should not be
there.  Cpu would try to store around %ecx bytes of random, while we
only expect a word.

PR:	259218
Reported and tested by:	 Dennis Clarke <dclarke@blastwave.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-23 02:31:16 +03:00
Gleb Smirnoff
c8ee75f231 Use network epoch to protect local IPv4 addresses hash.
The modification to the hash are already naturally locked by
in_control_sx.  Convert the hash lists to CK lists. Remove the
in_ifaddr_rmlock. Assert the network epoch where necessary.

Most cases when the hash lookup is done the epoch is already entered.
Cover a few cases, that need entering the epoch, which mostly is
initial configuration of tunnel interfaces and multicast addresses.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D32584
2021-10-22 14:40:53 -07:00
Mark Johnston
70f51f0e47 Revert "Handle partial reads in zfs_read"
This reverts commit 59eab1093a.

The change suppressed EFAULT originating from uiomove().  The deadlock
avoidance mechanism implemented by vn_io_fault1() in the VFS handles
such errors by wiring the user pages and retrying, but this change
caused read() to return early instead.  This can result in short I/O,
causing misbehaviour in some applications, and possibly other
consequences.

Until this is resolved somehow, revert the commit.

Approved by:	mm
2021-10-22 15:16:42 -04:00
Gleb Smirnoff
6aae3517ed Retire synchronous PPP kernel driver sppp(4).
The last two drivers that required sppp are cp(4) and ce(4).

These devices are still produced and can be purchased
at Cronyx <http://cronyx.ru/hardware/wan.html>.

Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no
longer support FreeBSD officially.  Later they have dropped
support for Linux drivers to.  As of mid-2020 they don't even
have a developer to maintain their Windows driver.  However,
their support verbally told me that they could provide aid to
a FreeBSD developer with documentaion in case if there appears
a new customer for their devices.

These drivers have a feature to not use sppp(4) and create an
interface, but instead expose the device as netgraph(4) node.
Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on
top of the node and get your synchronous PPP.  Alternatively
you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC.
Actually, last time I used cp(4) back in 2004, using netgraph(4)
instead of sppp(4) was already the right way to do.

Thus, remove the sppp(4) related part of the drivers and enable
by default the negraph(4) part.  Further maintenance of these
drivers in the tree shouldn't be a big deal.

While doing that, remove some cruft and enable cp(4) compilation
on amd64.  The ce(4) for some unknown reason marks its internal
DDK functions with __attribute__ fastcall, which most likely is
safe to remove, but without hardware I'm not going to do that, so
ce(4) remains i386-only.

Reviewed by:		emaste, imp, donner
Differential Revision:	https://reviews.freebsd.org/D32590
See also:		https://reviews.freebsd.org/D23928
2021-10-22 11:41:36 -07:00
Mark Johnston
d7acbe481d vm_page: Break reservations to handle noobj allocations
vm_reserv_reclaim_*() will release pages to the default freepool, not
the direct freepool from which noobj allocations are drawn.  But if both
pools are empty, the noobj allocator variants must break reservations to
make progress.

Reported by:	cy
Reviewed by:	kib (previous version)
Fixes:	b498f71bc5 ("vm_page: Add a new page allocator interface for unnamed pages")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32592
2021-10-22 09:25:59 -04:00
Randall Stewart
4e4c84f8d1 tcp: Add hystart-plus to cc_newreno and rack.
TCP Hystart draft version -03:
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus

Is a new version of hystart that allows one to carefully exit slow start if the RTT
spikes too much. The newer version has a slower-slow-start so to speak that then
kicks in for five round trips. To see if you exited too early, if not into congestion avoidance.
This commit will add that feature to our newreno CC and add the needed bits in rack to
be able to enable it.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D32373
2021-10-22 07:10:28 -04:00
Peter Grehan
5a3eb6207a igc: correctly update RCTL when changing multicast filters.
Fix clearing of bits in RCTL for the non-bpf/non-allmulti case.
Update RCTL after modifying the multicast filter registers as per
the Linux driver.

This fixes LACP on igc interfaces, where incoming LACP multicasti
control packets were being dropped.

Reviewed by:	kbowling
Obtained from:	Rubicon Communications, LLC ("Netgate")
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D32574
2021-10-22 21:16:12 +10:00
Bjoern A. Zeeb
3dc7a1897e net80211: correct input_sta length checks and control frame handling
Correct input_sta "assertion" checks.  CTS/ACK CTRL frames are shorter
then sizeof(struct ieee80211_frame_min) and were thus running into the
is_rx_tooshort error case.
Use ieee80211_anyhdrsize() to handle this better but make sure we do
at least have the first 2 octets needed for that.
While here move the safety checks before any code which may not obey
them later, just for good style.

The non-scanning check further down assumes a frame format also not
matching control frames.  For now skip the checks for control frames
which allows us to deal with some of them at least now.

Sponsored by:	The FreeBSD Foundation
Obtained from:	20210906 wireless v0.91 code drop
MFC after:	3 days
Reviewed by:	adrian
Differential Revision: https://reviews.freebsd.org/D32238
2021-10-22 10:42:06 +00:00
Bjoern A. Zeeb
9a6695532b net80211/drivers: improve ieee80211_rx_stats for band
While IEEE80211_R_BAND was defined, there was no place to store the
band.  Add a field for that, adjust ieee80211_lookup_channel_rxstatus()
to require it, and update drivers passing "R_{FREQ|IEEE}" in already to
provide the band as well.  For the moment keep the fall-back code
requiring all three fields.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	adrian
Differential Revision: https://reviews.freebsd.org/D30662
2021-10-22 09:55:54 +00:00
Luiz Otavio O Souza
ab238f1454 pf: ensure we have the correct source/destination IP address in ICMP errors
When we route-to a packet that later turns out to not fit in the
outbound interface MTU we generate an ICMP error.
However, if we've already changed those (i.e. we've passed through a NAT
rule) we have to undo the transformation first.

Obtained from:	pfSense
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32571
2021-10-22 09:52:17 +02:00
Konstantin Belousov
3b5331dd8d uipc_shm: silent warnings about write-only variables in largepage code
In shm_largepage_phys_populate(), the result from vm_page_grab() is only
needed for assertion.

In shm_dotruncate_largepage(), there is a commented-out prototype code
for managed largepages.   The oldobjsz is saved for its sake, so mark
the variable as __unused directly.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
3d2778515a sig_ast_checksusp(): mark the local p as __diagused
It is only used to assert that the (current) process is locked

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
6776747a0e subr_firmware.c::unloadentry(): remote write-only variable
The function ignores result returned by linker_release_module().
The FW_UNLOAD flag on the file is cleared, so even on error it would
not be tried again.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
993446638c alq_open_flags(): mark local td variable as unused
It is passed to the NDINIT() macro which ignores the thread argument
for some time.

Sponsored by:	The FreeBSD Foundation
2021-10-21 21:40:46 +03:00
Konstantin Belousov
661bd70bd7 DMAR: clean up warnings about write-only variables
For some of them, used only when KTR or KMSAN are configured, apply
__unused attribute directly.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
bded8fa300 umtxq_requeue: remove write-only variable uh2
umtxq_queue_lookup() does not change state.  It is redone inside
umtxq_insert() later, anyway.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
2030ee0e1b ufs: remove write-only variables
Mark variables as __diagused for invariant-only vars

Reviewed by:	imp, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32577
2021-10-21 21:40:46 +03:00
John Baldwin
96668a81ae ktls: Always create a software backend for receive sessions.
A future change to TOE TLS will require a software fallback for the
first few TLS records received.  Future support for NIC TLS on receive
will also require a software fallback for certain cases.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32566
2021-10-21 09:37:17 -07:00
John Baldwin
b33ff94123 ktls: Change struct ktls_session.cipher to an OCF-specific type.
As a followup to SW KTLS assuming an OCF backend, rename
struct ocf_session to struct ktls_ocf_session and forward
declare it in <sys/ktls.h> to use as the type of
struct ktls_session.cipher.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32565
2021-10-21 09:36:53 -07:00
John Baldwin
c57dbec69a ktls: Add a routine to query information in a receive socket buffer.
In particular, ktls_pending_rx_info() determines which TLS record is
at the end of the current receive socket buffer (including
not-yet-decrypted data) along with how much data in that TLS record is
not yet present in the socket buffer.

This is useful for future changes to support NIC TLS receive offload
and enhancements to TOE TLS receive offload.  Those use cases need a
way to synchronize a state machine on the NIC with the TLS record
boundaries in the TCP stream.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32564
2021-10-21 09:36:29 -07:00
Dawid Gorecki
8cb175ba0c Enable stack gap on arm64
Stack gap code used on amd64 can also be reused for arm64. Point
sv_stackgap to elf64_stackgap to enable this feature.

Reviewed by: mw, kib, emaste
Tested by: mw
MFC: after 1 month
Differential Revision: https://reviews.freebsd.org/D32588
2021-10-21 17:20:08 +02:00
Martin Matuska
6ba2210ee0 zfs: merge openzfs/zfs@ec64fdb93 (master) into main
Notable upstream pull request merges:
  #12392 Avoid panic in case of pool errors and missing L2ARC
  #12448 skip snapshot in zfs_iter_mounted()
  #12516 Fix NFS and large reads on older kernels
  #12533 Fail invalid incremental recursive send gracefully
  #12569 FreeBSD: Really zero the zero page
  #12575 Reject zfs send -RI with nonexistent fromsnap
  #12602 Correct refcount_add in dmu_zfetch
  #12650 zpool should call zfs_nicestrtonum() with non-NULL handle

Obtained from:	OpenZFS
OpenZFS commit:	ec64fdb93d
2021-10-21 15:06:06 +02:00
Elliott Mitchell
5bb67f5f3f xen/devices: purge uses of intr_machdep.h
Devices in sys/dev should be architecture-independent and NOT #include
intr_machdep.h.

Reviewed by: mhorne royger
Differential Revision: https://reviews.freebsd.org/D29959
2021-10-21 09:39:16 +02:00
Roger Pau Monné
535badd1b8 xen/pcifront: purge from tree
Xen pcifront has been unhooked from the build for a long time, as it's
only used by PV mode which FreeBSD doesn't support. Remove it from the
tree.
2021-10-21 09:39:16 +02:00
Roy Marples
5c5340108e net: Allow binding of unspecified address without address existance
Previously in_pcbbind_setup returned EADDRNOTAVAIL for empty
V_in_ifaddrhead (i.e., no IPv4 addresses configured) and in6_pcbbind
did the same for empty V_in6_ifaddrhead (no IPv6 addresses).

An equivalent test has existed since 4.4-Lite.  It was presumably done
to avoid extra work (assuming the address isn't going to be found
later).

In normal system operation *_ifaddrhead will not be empty: they will
at least have the loopback address(es).  In practice no work will be
avoided.

Further, this case caused net/dhcpd to fail when run early in boot
before assignment of any addresses.  It should be possible to bind the
unspecified address even if no addresses have been configured yet, so
just remove the tests.

The now-removed "XXX broken" comments were added in 59562606b9,
which converted the ifaddr lists to TAILQs.  As far as I (emaste) can
tell the brokenness is the issue described above, not some aspect of
the TAILQ conversion.

PR:		253166
Reviewed by:	ae, bz, donner, emaste, glebius
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D32563
2021-10-20 19:25:51 -04:00
Konstantin Belousov
2ff7c2cc4f sys/bus.h: silence warnings about write-only variables
in the generated functions for bus accessors.  These are the most
noising instances for drivers when non-debug kernel is compiled with
clang 13.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32578
2021-10-20 21:29:49 +03:00
Konstantin Belousov
2bd6d910b2 msdosfs_rename: remove write-only variables
Reviewed by:	imp, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32577
2021-10-20 21:29:49 +03:00
Konstantin Belousov
80bca63cf4 tmpfs: remove write-only variables
Reviewed by:	imp, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32577
2021-10-20 21:29:49 +03:00
Andrew Turner
4fb002805e Pass the ACPI ID when reading the ACPI domain
The ACPI ID may not be the same as the FreeBSD CPU id. Use the former
when finding the CPU domain as there is no requirement for it to be
identical to the latter.

Reported by:	dch, kevans
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32546
2021-10-20 11:02:06 +01:00
Mark Johnston
4c812fe61b vlapic: Schedule callouts on the local CPU
The virtual LAPIC driver uses callouts to implement the LAPIC timer.
Callouts are armed using callout_reset_sbt(), which currently puts
everything on CPU 0.  On systems running many bhyve VMs this results in
a large amount of contention for CPU 0's callout lock.

Modify vlapic to schedule callouts on the local CPU instead.  This
allows timer interrupts to be scheduled more evenly among CPUs where
bhyve is running.

Reviewed by:	grehan, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32559
2021-10-19 21:22:57 -04:00
Mark Johnston
bd49c454ca Bump __FreeBSD_version for the preceding page allocator changes
None of the usual suspects (e.g., drm-kmod, virtualbox-ose-kmod,
nvidia-driver) seem to be affected by the changes, but it is likely that
something else is affected.

Sponsored by:	The FreeBSD Foundation
2021-10-19 21:22:57 -04:00
Mark Johnston
34fac29e98 amd64: Add comments to pmap_pinit_type()
... explaining why we don't pass the pmap pointer to
pmap_alloc_pt_page().

Reported by:	alc
Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32528
2021-10-19 21:22:57 -04:00
Mark Johnston
ff93447d8e Use the vm_radix_init() helper when initializing pmaps
No functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32527
2021-10-19 21:22:56 -04:00
Mark Johnston
a9d6f1fe0a Remove some remaining references to VM_ALLOC_NOOBJ
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32037
2021-10-19 21:22:56 -04:00
Mark Johnston
b801c79dda vm_fault: Stop specifying VM_ALLOC_ZERO
Now vm_page_alloc() and friends will unconditionally preserve PG_ZERO,
so there is no point in setting this flag.

Eliminate a local variable and add a comment explaining why we
prioritize the allocation when the process is doomed.

No functional change intended.

Reviewed by:	kib, alc
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32036
2021-10-19 21:22:56 -04:00
Mark Johnston
02fb0585e7 vm_page: Drop handling of VM_ALLOC_NOOBJ in vm_page_alloc_contig_domain()
As in vm_page_alloc_domain_after(), unconditionally preserve PG_ZERO.

Implement vm_page_alloc_noobj_contig_domain().

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32034
2021-10-19 21:22:56 -04:00
Mark Johnston
c40cf9bc62 vm_page: Stop handling VM_ALLOC_NOOBJ in vm_page_alloc_domain_after()
This makes the allocator simpler since it can assume object != NULL.
Also modify the function to unconditionally preserve PG_ZERO, so
VM_ALLOC_ZERO is effectively ignored (and still must be implemented by
the caller for now).

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32033
2021-10-19 21:22:56 -04:00
Mark Johnston
84c3922243 Convert consumers to vm_page_alloc_noobj_contig()
Remove now-unneeded page zeroing.  No functional change intended.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32006
2021-10-19 21:22:56 -04:00
Mark Johnston
92db9f3bb7 Introduce vm_page_alloc_noobj_contig()
This is the same as vm_page_alloc_noobj(), but allocates physically
contiguous runs of memory.  For now it is implemented in terms of
vm_page_alloc_contig(), with the difference that
vm_page_alloc_noobj_contig() implements VM_ALLOC_ZERO by zeroing the
page.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32005
2021-10-19 21:22:56 -04:00
Mark Johnston
a4667e09e6 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31986
2021-10-19 21:22:56 -04:00
Mark Johnston
b498f71bc5 vm_page: Add a new page allocator interface for unnamed pages
The diff adds vm_page_alloc_noobj() and vm_page_alloc_noobj_domain().
These mostly correspond to vm_page_alloc() and vm_page_alloc_domain()
when no VM object is specified, with the exception that they handle
VM_ALLOC_ZERO by zeroing the page, rather than by preserving PG_ZERO.

This simplifies callers and will permit simplification of the
vm_page_alloc_domain() definition.

Since the new allocator variant is similar to vm_page_alloc_freelist(),
implement both of them using a common backend allocator function.  No
functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31985
2021-10-19 21:22:55 -04:00
Mark Johnston
a23e6a1078 vm_page: Move vm_page_alloc_check() to after page allocator definitions
This way all of the vm_page_alloc_*() allocator functions are grouped
together.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-10-19 21:22:50 -04:00
Konstantin Belousov
c7f38a2df1 procctl: stop using SA_*LOCKED, define local enum
Using SA_*LOCKED constants breaks !INVARIANT builds

Reported by:	cy
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-20 00:25:19 +03:00
Konstantin Belousov
49db81aa05 kern_procctl: skip zombies for process group operations
When iterating over the process group members, skip zombies same as it
is done by pfind() for single-process operation.

Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
3692877a6c kern_procctl.c: use td->td_proc instead of curproc
Suggested by:	markj
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
f5bb6e5a6d procctl: actually require debug privileges over target
for state control over TRACE, TRAPCAP, ASLR, PROTMAX, STACKGAP,
NO_NEWPRIVS, and WXMAP.

Reported by:	emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
1c4dbee5dd procctl: make it possible to specify that some operations require debug privilege over the target
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
32026f5983 sys_procctl(): zero the data buffer once, on syscall entry
and remove zeroing of it from specific functions.  This way it is
guaranteed that we do not leak kernel data.

Suggested by:	markj
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
56d5323b4d sys_procctl(): use table data to do copyin/copyout
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
68dc5b381a kern_procctl_single(): convert to use table data
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
34f39a8c0e procctl: convert PDEATHSIG_CTL/STATUS to regular kern_procctl_single() cases
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
f833ab9dd1 procctl(2): add consistent shortcut P_ID:0 as curproc
Reported by:	bdrewery, emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
7ae879b14a kern_procctl(): convert the function to be table-driven
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
31faa565ed sys_procctl(2): remove sysproto and argused
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:33 +03:00
Mateusz Guzik
bcd4c17cca pf: fix some cc --analyze warnings
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-19 11:59:11 +00:00
Emmanuel Vadot
225639e7db vt: Disable bell by default
Bell is either useless if you're working on remote servers or really annoying
when you're working with a local machine that have a loud buzzer.
Switch the default to have it disable.

Reviewed by:	   imp, pstef, tsoome
Sponsored by:	   Beckhoff Automation GmbH & Co. KG
Differential Revision:	    https://reviews.freebsd.org/D32543
2021-10-19 09:37:28 +02:00
Rick Macklem
52dee2bc03 nfscl: Handle NFSv4.1/4.2 Close RPC NFSERR_DELAY replies better
Without this patch, if a NFSv4.1/4.2 server replies NFSERR_DELAY to
a Close operation, the client loops retrying the Close while holding
a shared lock on the clientID.  This shared lock blocks returns of
delegations, even though the server has issued a CB_RECALL to request
the delegation return.

This patch delays doing a retry of a Close that received a reply of
NFSERR_DELAY until after the shared lock on the clientID is released,
for NFSv4.1/4.2.  To fix this for NFSv4.0 would be very difficult and
since the only known NFSv4 server to reply NFSERR_DELAY to Close only
does NFSv4.1/4.2, this fix is hoped to be sufficient.

This problem was detected during a recent IETF working group NFSv4
testing event.

MFC after:	2 week
2021-10-18 15:05:34 -07:00
Adrian Chadd
015ff812d6 ipq4018: add initial IPQ4018/IPQ4019 support
Summary:
This adds required IPQ4018/IPQ4019 SoC support to boot.
It also includes support for disabling the ARMv7 hardware
breakpoint / debug stuff at compile time as this is
required for the IPQ SoCs, and printing out the undefined
instruction itself.

Test Plan: * compiled/booted on an IPQ4019 SoC AP

Reviewers: #core_team!

Subscribers: imp, andrew

Differential Revision: https://reviews.freebsd.org/D32538
2021-10-18 19:19:06 +00:00
Adrian Chadd
fb7a007728 arm: add a std.qca for 32 bit armv7 platforms
This is the minimal config required to boot on the IPQ4018 SoC
and likely future ones as well in this family.
2021-10-18 19:19:03 +00:00
Adrian Chadd
02438ce5fd ipq4018: add initial IPQ4018/IPQ4019 support
This is for the Qualcomm Atheros quad-core ARMv7 SoC with built-in
2x2 2GHz and 5GHz ath10k devices.

It's enough (with an upcoming set of config files) to netboot
on an ASUS router I have here and get to a single core mountroot
prompt.
2021-10-18 19:19:00 +00:00
Adrian Chadd
c29c0e6876 arm: allow the debug stuff in CP14 to be disabled at compile time
The upcoming QCA ipq401x support detects the CP14 debug features,
but any attempt to use it causes an undefined instruction error.
It apparently needs a specific TZ image loaded by the early bootloader
(SBL) in order to enable these kinds of features.

So add a new kernel option that explicitly disables this in the
arm code - the debugger works fine without it.
2021-10-18 19:18:56 +00:00
Adrian Chadd
8398d52d65 arm: print out the undefined instruction upon an undefined instruction panic
It's SUPER useful to be able to see the actual undefined instruction
when we hit said undefined instruction.
2021-10-18 19:18:52 +00:00
Adrian Chadd
9264bd386c ipq4018: add a device tree file for the ASUS rt-ac58u router
This is the initial device tree file describing the ASUS
RT-AC58U 2GHz/5GHz 11ac router.

Obtained from: OpenWRT
2021-10-18 19:18:46 +00:00
Adrian Chadd
8e53cd7099 ipq4018: add TCSR definitions from Linux.
These are hardware configuration options which are required in
the linux/openwrt device trees for the IPQ4018/IPQ4019 devices.

Since this isn't obtained from linux upstream but instead from
openwrt, this can't go in contrib; instead it is going in
sys/dts/include/ .

Obtained from: OpenWRT

Tested:

* IPQ4019 ASUS RT-AC58U AP, initial bootstrapping
2021-10-18 19:18:01 +00:00
Gleb Smirnoff
9b7501e797 in_mcast: garbage collect inp_gcmoptions()
It is is used only once, merge it into inp_freemoptions().
2021-10-18 11:36:07 -07:00
Gleb Smirnoff
0f617ae48a Add in_pcb_var.h for KPIs that are private to in_pcb.c and in6_pcb.c. 2021-10-18 10:19:57 -07:00
Gleb Smirnoff
147f018a72 Move in6_pcbsetport() to in6_pcb.c
This function was originally carved out of in6_pcbbind(), which
is in in6_pcb.c. This function also uses KPI private to the PCB
database - in_pcb_lport().
2021-10-18 10:19:03 -07:00
Gleb Smirnoff
744a64bd92 in_pcb: garbage collect in_pcbrele() 2021-10-18 10:07:16 -07:00
Gleb Smirnoff
5a78df20ce in_pcb: garbage collect unused structure in_pcblist 2021-10-18 10:06:39 -07:00
Warner Losh
7881db8346 Remove POWER_PM_TYPE_APM. It's now unused.
Sponsored by:		Netflix
Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D32549
2021-10-18 08:41:17 -06:00
Mark Johnston
621fd9dcb2 timecounter: Lock the timecounter list
Timecounter registration is dynamic, i.e., there is no requirement that
timecounters must be registered during single-threaded boot.  Loadable
drivers may in principle register timecounters (which can be switched to
automatically).  Timecounters cannot be unregistered, though this could
be implemented.

Registered timecounters belong to a global linked list.  Add a mutex to
synchronize insertions and the traversals done by (mpsafe) sysctl
handlers.  No functional change intended.

Reviewed by:	imp, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32511
2021-10-18 09:56:59 -04:00
Mark Johnston
06ebadc5f5 x86: Remove some leftover APM support
This is obsolete since commit 8c576a279e ("Remove APM BIOS support").

Reviewed by:	imp, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32510
2021-10-18 09:56:59 -04:00
Mark Johnston
81f2e9063d signal: Add SIG_FOREACH and refactor issignal()
Add a SIG_FOREACH macro that can be used to iterate over a signal set.
This is a bit cleaner and more efficient than calling sig_ffs() in a
loop.  The implementation is based on BIT_FOREACH_ISSET(), except
that the bitset limbs are always 32 bits wide, and signal sets are
1-indexed rather than 0-indexed like bitset(9) sets.

issignal() cannot really be modified to use SIG_FOREACH() directly.
Take this opportunity to split the function into two explicit loops.
I've always found this function hard to read and think that this change
is an improvement.

Remove sig_ffs(), nothing uses it now.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32473
2021-10-18 09:56:58 -04:00
Mark Johnston
de8554295b cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well.  No functional change
intended.

This is a re-application of commit
9068f6ea69.

Reviewed by:	cem, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32029
2021-10-18 09:56:58 -04:00
Mark Johnston
51425cb210 bitset: Reimplement BIT_FOREACH_IS(SET|CLR)
Eliminate the nested loops and re-implement following a suggestion from
rlibby.

Add some simple regression tests.

Reviewed by:	rlibby, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32472
2021-10-18 09:56:58 -04:00
Mark Johnston
36e4dcf47d safexcel: Set the context record unconditionally
The condition added in commit 5bdb8b273a excludes plain SHA
transforms, so for such sessions crypto operations would return
incorrect results.

Fixes:	5bdb8b273a ("safexcel: Maintain per-session context records")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-10-18 09:50:42 -04:00
Mark Johnston
b0423d0f5e amd64: Zero the PML5 PTI page when initializing a pmap
The root page is not zeroed at allocation time since with 4-level tables
each entry is copied from a template.  However, with 5-level tables only
a single entry is filled, so the rest need to be cleared.

Reported by:	alc
Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32525
2021-10-18 09:50:42 -04:00
Rick Macklem
d95c0a12a2 nfscl: Modify Close RPC so that it does not use "owner" for NFSv4.1/4.2
This patch modifies the function that does the Close RPC (nfsrpc_closerpc)
so that it does not use the open_owner (nfso_own) for NFSv4.1/4.2.
Use of the seqid in the open_owner structure is only needed for NFSv4.0.
Same applies to a NFSERR_STALESTATEID reply, which should only happen
for NFSv4.0.  This allows nfsrpc_closerpc() to be called when nfso_own
is no longer valid.  This, in turn, allows nfsrpc_closerpc() to be called
after the shared lock on the clientID is released, for NFSv4.1/4.2.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after:	2 week
2021-10-17 17:50:56 -07:00
Colin Percival
52e125c2bd TSLOG: Report final execname, not first
In cases such as daemons launched via limits(1), a process may call
exec multiple times; the last name of the last binary executed is
usually (always?) more informative.

Fixes:	46dd801acb Add userland boot profiling to TSLOG
Sponsored by:	https://www.patreon.com/cperciva
2021-10-17 13:36:38 -07:00
Jessica Clarke
0d6516b453 Bump __FreeBSD_version for LinuxKPI changes 2021-10-17 15:35:48 +01:00
Jessica Clarke
82098c8bb5 LinuxKPI: Support lazy BAR allocation
Linux KPIs like pci_resource_start/len assume that BARs have been
allocated, but FreeBSD lazily allocates BARs if it cannot allocate the
firmware-allocated BARs. Thus using the Linux KPIs must force allocation
of the BARs rather than returning 0 for the start and length, which can
crash drm-kmod drivers that assume the BARs are valid. This is needed
for the AMDGPU driver to be able to attach on SiFive's HiFive Unmatched.

Reviewed by:	hselasky, jhb, mav
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32447
2021-10-17 15:32:35 +01:00
Jessica Clarke
60d962e041 LinuxKPI: Implement _ioremap_attr for riscv
Now that riscv implements pmap_mapdev_attr we can enable the non-stub
implementation for riscv, which is needed for drm-kmod to not fail at
run time for drivers that need to map I/O regions.

Reviewed by:	hselasky, bz
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32446
2021-10-17 15:32:20 +01:00
Jessica Clarke
682c00a6ce riscv: Implement pmap_mapdev_attr
This is needed for LinuxKPI's _ioremap_attr. This reuses the generic
implementation introduced for aarch64, and itself requires implementing
pmap_kenter, which is trivial to do given riscv currently treats all
mapping attributes the same due to the Svpbmt extension not yet being
ratified and in hardware.

Reviewed by:	markj, mhorne
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32445
2021-10-17 15:31:35 +01:00
Edward Tomasz Napierala
0f559a9f09 Make vmdaemon timeout configurable
Make vmdaemon timeout configurable, so that one can adjust
how often it runs.

Here's a trick: set this to 1, then run 'limits -m 0 sh',
then run whatever you want with 'ktrace -it XXX', and observe
how the working set changes over time.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D22038
2021-10-17 13:49:29 +01:00
Edward Tomasz Napierala
99f563ed76 linux: recognize TCP_INFO and ratelimit the warning
This ratelimits the "unsupported getsockopt level 6 optname 11"
warnings that happen all the time when watching Netflix.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32454
2021-10-17 13:19:10 +01:00
Edward Tomasz Napierala
a03d4d73e4 linux: Improve debugging for PTRACE_GETREGSET
It's triggered by gdb(1).

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32456
2021-10-17 12:53:16 +01:00
Edward Tomasz Napierala
f9246e1484 linux: Implement some bits of PTRACE_PEEKUSER
This makes Linux gdb from Bionic a little less broken.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32455
2021-10-17 12:20:21 +01:00
Edward Tomasz Napierala
75a9d95b4d linux: Adjust PTRACE_GET_SYSCALL_INFO buffer size semantics
The tests/ptrace_syscall_info test from strace(1) complained
about this.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32368
2021-10-17 11:49:46 +01:00
Edward Tomasz Napierala
7e7859e7c2 linux: Partially implement TCSBRK
This fixes tcflush(3), unbreaking cheribuild.py under arm64 Focal.

Reviewed By:	imp
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32291
2021-10-17 11:19:56 +01:00
Mateusz Guzik
1045352f15 cache: only assert on flags when dealing with EMPTYPATH
Reported by:	syzbot+bd48ee0843206a09e6b8@syzkaller.appspotmail.com
Fixes:		7dd419cabc ("cache: add empty path support")
2021-10-17 08:42:47 +00:00
Fangrui Song
1cf0633316 sys: Add definitions for RELR relative relocation format
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32526
2021-10-17 02:37:13 +03:00
Kristof Provost
076b3a50fd pf: don't drop packets when redirection information comes from a state
For some traffic there might be no matching rule in the current ruleset,
for example when a state was imported via pfsync from a sytem with a
different ruleset checksum. In this case pf_route uses s->rt_addr for
routing target instead of r->rpool.cur but r->rpool is checked anyway,
resulting in dropped packets.

PR:		259183
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2021-10-16 23:02:26 +02:00
Rick Macklem
e2aab5e2d7 nfscl: Move release of the clientID lock into nfscl_doclose()
This patch moves release of the shared clientID lock from nfsrpc_close()
just after the nfscl_doclose() call to the end of nfscl_doclose() call.
This does make the code cleaner, since the shared lock is acquired at
the beginning of nfscl_doclose().  The only semantics change is that
the code no longer drops and reaquires the NFSCLSTATELOCK() mutex,
which I do not believe will have a negative effect on the NFSv4 client.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after:	2 week
2021-10-16 15:49:38 -07:00
Mateusz Guzik
7dd419cabc cache: add empty path support
This avoids spurious drop offs as EMPTY is passed regardless of the
actual path name.

Pushign the work inside the lookup instead of just ignorign the flag
allows avoid checking for empty pathname for all other lookups.
2021-10-16 20:08:37 +00:00
Colin Percival
46dd801acb Add userland boot profiling to TSLOG
On kernels compiled with 'options TSLOG', record for each process ID:
* The timestamp of the fork() which creates it and the parent
process ID,
* The first path passed to execve(), if any,
* The first path resolved by namei, if any, and
* The timestamp of the exit() which terminates the process.

Expose this information via a new sysctl, debug.tslog_user.

On kernels lacking 'options TSLOG' (the default), no information is
recorded and the sysctl does not exist.

Note that recording namei is needed in order to obtain the names of
rc.d scripts being launched, as the rc system sources them in a
subshell rather than execing the scripts.

With this commit it is now possible to generate flamecharts of the
entire boot process from the start of the loader to the end of
/etc/rc.  The code needed to perform this processing is currently
found in github: https://github.com/cperciva/freebsd-boot-profiling

Reviewed by:	mhorne
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32493
2021-10-16 11:47:34 -07:00
Kristof Provost
498cca1483 pf: selecting pf_map_addr is not an error
When a redirection/nat IP address is selected by pf_map_addr it is
logged with PF_DEBUG_MISC level. This one according to the manual means
"Generate debug messages for various errors". Selecting an IP address is
not an error, it's a normal function of pf for route-to, nat and some
other operations. Therefore PF_DEBUG_NOISY level should be choosen which
is means "Generate debug messages for common conditions".

PR:		259184
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2021-10-16 09:32:15 +02:00
Gordon Bergling
899a3b38f5 Fix two typos in source code comments
- s/alocated/allocated/
- s/realocated/reallocated/

MFC after:	3 days
2021-10-16 08:09:31 +02:00
Maxim Sobolev
461e6f23db Fix fragmented UDP packets handling since rev.360967.
Consider IP_MF flag when checking length of the UDP packet to
match the declared value.

Sponsored by:	Sippy Software, Inc.
Differential Revision:	https://reviews.freebsd.org/D32363
MFC after:	2 weeks
2021-10-15 16:48:12 -07:00
Rick Macklem
77c595ce33 nfscl: Add an argument to nfscl_tryclose()
This patch adds a new argument to nfscl_tryclose() to indicate
whether or not it should loop when a NFSERR_DELAY reply is received
from the NFSv4 server.  Since this new argument is always passed in
as "true" at this time, no semantics change should occur.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after:	2 week
2021-10-15 14:25:38 -07:00
Dawid Gorecki
a97d697122 kern_exec: Add kern.stacktop sysctl.
With stack gap enabled top of the stack is moved down by a random
amount of bytes. Because of that some multithreaded applications
which use kern.usrstack sysctl to calculate address of stacks for
their threads can fail. Add kern.stacktop sysctl, which can be used
to retrieve address of the stack after stack gap is applied to it.
Returns value identical to kern.usrstack for processes which have
no stack gap.

Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31897
2021-10-15 10:21:55 +02:00
Dawid Gorecki
889b56c8cd setrlimit: Take stack gap into account.
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.

Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.

PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
2021-10-15 10:21:47 +02:00
Rick Macklem
6495766acf nfscl: Restructure nfscl_freeopen() slightly
This patch factors the unlinking of the nfsclopen structure out of
nfscl_freeopen() into a separate function called nfscl_unlinkopen().
It also adds a new argument to nfscl_freeopen() to conditionally do
the unlink.  Since this new argument is always passed in as "true"
at this time, no semantics change should occur.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after:	2 week
2021-10-14 17:28:01 -07:00
John Baldwin
a72ee35564 ktls: Defer creation of threads and zones until first use.
Run ktls_init() when the first KTLS session is created rather than
unconditionally during boot.  This avoids creating unused threads and
allocating unused resources on systems which do not use KTLS.

Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32487
2021-10-14 15:48:34 -07:00
Konstantin Belousov
86929782cf Fix typo in comment
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-10-14 23:07:32 +03:00
Konstantin Belousov
1adebca1fc Style
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-10-14 23:07:32 +03:00
John Baldwin
ef3f98ae47 cxgbe: Only run ktls_tick when NIC TLS is enabled.
Previously the body of ktls_tick was a nop when NIC TLS was disabled,
but the callout was still scheduled consuming power on otherwise-idle
systems with Chelsio T6 adapters.  Now the callout only runs while NIC
TLS is enabled on at least one interface of an adapter.

Reported by:	mav
Reviewed by:	np, mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32491
2021-10-14 10:59:16 -07:00
Leandro Lupori
8ecf9a8bab powerpc64: make radix with superpages default
As Radix MMU with superpages enabled is now stable, make it the
default choice on supported hardware (POWER9 and above), since its
performance is greater than that of HPT MMU.

Reviewed by:		alfredo, jhibbits
Sponsored by:		Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D30797
2021-10-14 13:13:27 -03:00
Warner Losh
2ec165e3f0 nvme: Reduce traffic to the doorbell register
Reduce traffic to doorbell register when processing multiple completion
events at once. Only write it at the end of the loop after we've
processed everything (assuming we found at least one completion,
even if that completion wasn't valid).

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32470
2021-10-14 08:44:37 -06:00
Leandro Lupori
76384bd10f powerpc64: fix OFWFB with Radix MMU
Current implementation of Radix MMU doesn't support mapping
arbitrary virtual addresses, such as the ones generated by
"direct mapping" I/O addresses. This caused the system to hang, when
early I/O addresses, such as those used by OpenFirmware Frame Buffer,
were remapped after the MMU was up.

To avoid having to modify mmu_radix_kenter_attr just to support this
use case, this change makes early I/O map use virtual addresses from
KVA area instead (similar to what mmu_radix_mapdev_attr does), as
these can be safely remapped later.

Reviewed by:		alfredo (earlier version), jhibbits (in irc)
MFC after:		2 weeks
Sponsored by:		Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D31232
2021-10-14 10:39:52 -03:00
Gordon Bergling
0a8159d8ca ng_ppp(4): Fix a typo in a comment
- s/delcared/declared/

MFC after:	3 days
2021-10-14 15:30:32 +02:00
Jason A. Harmening
152c35ee4f unionfs: Ensure SAVENAME is set for unionfs vnode operations
"rm-style" system calls such as kern_frmdirat() and kern_funlinkat()
don't supply SAVENAME to preserve the pathname buffer for subsequent
vnode ops.  For unionfs this poses an issue because the pathname may
be needed for a relookup operation in unionfs_remove()/unionfs_rmdir().
Currently unionfs doesn't check for this case, leading to a panic on
DIAGNOSTIC kernels and use-after-free of cn_nameptr otherwise.

The unionfs node's stored buffer would suffice as a replacement for
cnp->cn_nameptr in some (but not all) cases, but it's cleaner to just
ensure that unionfs vnode ops always have a valid cn_nameptr by setting
SAVENAME in unionfs_lookup().

While here, do some light cleanup in unionfs_lookup() and assert that
HASBUF is always present in the relevant relookup calls.

Reported by:	pho
Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D32148
2021-10-13 19:25:31 -07:00
Brooks Davis
04c91ac48a selsocket: handle sopoll() errors correctly
Without this change, unmounting smbfs filesystems with an INVARIANTS
kernel would panic after 10e64782ed.

Found by:	markj
Reviewed by:	markj, jhb
Obtained from:	CheriBSD
MFC after:	3 days
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D32492
2021-10-14 00:43:48 +01:00
Rick Macklem
24af0fcdfc nfscl: Make nfscl_getlayout() acquire the correct pNFS layout
Without this patch, if a pNFS read layout has already been acquired
for a file, writes would be redirected to the Metadata Server (MDS),
because nfscl_getlayout() would not acquire a read/write layout for
the file.  This happened because there was no "mode" argument to
nfscl_getlayout() to indicate whether reading or writing was being done.
Since doing I/O through the Metadata Server is not encouraged for some
pNFS servers, it is preferable to get a read/write layout for writes
instead of redirecting the write to the MDS.

This patch adds a access mode argument to nfscl_getlayout() and
nfsrpc_getlayout(), so that nfscl_getlayout() knows to acquire a read/write
layout for writing, even if a read layout has already been acquired.
This patch only affects NFSv4.1/4.2 client behaviour when pNFS ("pnfs" mount
option against a server that supports pNFS) is in use.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after:	2 week
2021-10-13 15:48:54 -07:00
John Baldwin
9f03d2c001 ktls: Ensure FIFO encryption order for TLS 1.0.
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record.  As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.

If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order.  For TLS 1.1
and later this is fine, but this can break for TLS 1.0.

To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.

- In ktls_enqueue(), check if a TLS record being queued is the next
  record expected for a TLS 1.0 session.  If not, it is placed in
  sorted order in the pending_records queue in the TLS session.

  If it is the next expected record, queue it for SW encryption like
  normal.  In addition, check if this new record (really a potential
  batch of records) was holding up any previously queued records in
  the pending_records queue.  Any of those records that are now in
  order are also placed on the queue for SW encryption.

- In ktls_destroy(), free any TLS records on the pending_records
  queue.  These mbufs are marked M_NOTREADY so were not freed when the
  socket buffer was purged in sbdestroy().  Instead, they must be
  freed explicitly.

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32381
2021-10-13 12:30:15 -07:00
John Baldwin
a63752cce6 ktls: Reject attempts to enable AES-CBC with TLS 1.3.
AES-CBC cipher suites are not supported in TLS 1.3.

Reported by:	syzbot+ab501c50033ec01d53c6@syzkaller.appspotmail.com
Reviewed by:	tuexen, markj
Differential Revision:	https://reviews.freebsd.org/D32404
2021-10-13 12:12:58 -07:00
Gleb Smirnoff
2144431c11 Remove in_ifaddr_lock acquisiton to access in_ifaddrhead.
An IPv4 address is embedded into an ifaddr which is freed
via epoch. And the in_ifaddrhead is already a CK list. Use
the network epoch to protect against use after free.

Next step would be to CK-ify the in_addr hash and get rid of the...

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D32434
2021-10-13 10:04:46 -07:00
Mark Johnston
03d5820f73 mount: Check for !VDIR mount points before handling -o emptydir
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty.  This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.

Reported by:	syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by:	kib, imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32475
2021-10-13 09:33:35 -04:00
Kristof Provost
776df104fa pf: Introduce pf_nvbool()
Similar to the existing functions for strings and ints, this lets us
simplify some of the nvlist conversion code.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-13 12:01:09 +02:00
Hartmut Brandt
ded77e0237 Allow the BPF to be select for write. This is needed for boost:asio
which otherwise fails to handle BPFs.
Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D31967
2021-10-10 17:03:51 +02:00
Rick Macklem
b82168e657 nfscl: Fix another deadlock related to the NFSv4 clientID lock
Without this patch, it is possible to hang the NFSv4 client,
when a rename/remove is being done on a file where the client
holds a delegation, if pNFS is being used.  For a delegation
to be returned, dirty data blocks must be flushed to the NFSv4
server.  When pNFS is in use, a shared lock on the clientID
must be acquired while doing a write to the DS(s).
However, if rename/remove is doing the delegation return
an exclusive lock will be acquired on the clientID, preventing
the write to the DS(s) from acquiring a shared lock on the clientID.

This patch stops rename/remove from doing a delegation return
if pNFS is enabled.  Since doing delegation return in the same
compound as rename/remove is only an optimization, not doing
so should not cause problems.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after:	1 week
2021-10-12 17:21:01 -07:00
John Baldwin
d1b6fef075 Stop creating socket aio kprocs during boot.
Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead.  The pool of kprocs used for other AIO
requests is similarly created on first use.

Reviewed by:	asomers
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32468
2021-10-12 14:03:07 -07:00
Warner Losh
18dc12bfd2 nvme: Restore hotplug warning
Restore hotplug warning in recovery state machine. No functional change
other than what message gets printed.

Sponsored by:		Netflix
2021-10-12 14:26:54 -06:00
Konstantin Belousov
4cc167a352 Restore PPS_SYNC in NOTES
This partially reverts e81e77c5a0, leaving the option both in
GENERICs on amd64/arm64/arm, and in global NOTES file.  Apparently
this better matches existing practice, where we do not try to hard
to make LINT and GENERIC complimentary.

Requested and reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-12 23:10:35 +03:00
Ruslan Bukin
aeb76076c6 Prevent repeated deallocation of a resource.
Also deactivate resource if needed.

Discussed with: jrtc27
Differential Revision: https://reviews.freebsd.org/D32458
2021-10-12 20:13:44 +01:00
Andrew Turner
0906563718 Stop reading the arm64 domain when it's known
There is no need to read the domain on arm64 when there is only one
in the ACPI tables. This can also happen when the table is missing
as it is unneeded.

Reported by:	dch
Sponsored by:	The FreeBSD Foundation
2021-10-12 13:16:00 +01:00
Kyle Evans
7259ca3104 fifos: delegate unhandled kqueue filters to underlying filesystem
This gives the vfs layer a chance to provide handling for EVFILT_VNODE,
for instance.  Change pipe_specops to use the default vop_kqfilter to
accommodate fifoops that don't specify the method (i.e. all in-tree).

Based on a patch by Jan Kokemüller.

PR:		225934
Reviewed by:	kib, markj (both pre-KASSERT)
Differential Revision:	https://reviews.freebsd.org/D32271
2021-10-12 02:43:07 -05:00
Rick Macklem
120b20bdf4 nfscl: Fix a deadlock related to the NFSv4 clientID lock
Without this patch, it is possible for a process doing an NFSv4
Open/create of a file to block to allow another process
to acquire the exclusive lock on the clientID when holding
a shared lock on the clientID.  As such, both processes
deadlock, with one wanting the exclusive lock, while the
other holds the shared lock.  This deadlock is unlikely to occur
unless delegations are in use on the NFSv4 mount.

This patch fixes the problem by not deferring to the process
waiting for the exclusive lock when a shared lock (reference cnt)
is already held by the process.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after:	1 week
2021-10-11 21:58:24 -07:00
Warner Losh
cdccd11b36 forward declare struct thread
sys/sysctl.h moved struct thread forward declaration under #ifdef
_KERNEL and so this header fails when included from userland. Add a
forward declaration here.

Fixes:	     		99eefc727e
Sponsored by:		Netflix
2021-10-11 12:59:39 -06:00
Warner Losh
99eefc727e sysctl.h: Less namespace pollution
Remove unused struct ctlname. It is unused. Move struct thread to
inside #if _KERNEL.

Sponsored by:		Netflix
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D32457
2021-10-11 11:20:07 -06:00
Warner Losh
56ee5c551f sysctl: make sys/sysctl.h self contained
sys/sysctl.h only needs u_int and size_t from sys/types.h. When the
sysctl interface was designed, having one more more prerequisites
(especially sys/types.h) was the norm. Times have changed, and to make
things more portable, make sys/types.h optional. We do this by including
sys/_types.h, defining size_t if needed, and changing u_int to 'unsigned
int' in a prototype for userland builds. For kernel builds, sys/types.h
is still required.

Sponsored by:		Netflix
Reviewed by:		kib, jhb
Differential Revision:	https://reviews.freebsd.org/D31827
2021-10-11 11:20:07 -06:00
Greg V
98dae405de O_PATH: allow vfs_extattr syscalls
These calls do operate on vnodes only, not file contents.
This is useful for e.g. the xdg-document-portal fuse filesystem.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32438
2021-10-11 20:09:49 +03:00
Mateusz Guzik
2b68eb8e1d vfs: remove thread argument from VOP_STAT
and fo_stat.
2021-10-11 13:22:32 +00:00
Mateusz Guzik
b4a58fbf64 vfs: remove cn_thread
It is always curthread.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32453
2021-10-11 13:21:47 +00:00
Alex Richardson
9017870541 Add missing const after 6c4f95161d
I accidentally didn't include hunk in the committed patch.

Fixes:		6c4f95161d ("virtio: make the write_config buffer argument const")
2021-10-11 13:20:56 +01:00
Alex Richardson
6c4f95161d virtio: make the write_config buffer argument const
No functional change intended, but noticed that we could add const here
while adding linuxkpi support for virtio.

Reviewed By:	bryanv, imp
Differential Revision: https://reviews.freebsd.org/D32370
2021-10-11 11:52:18 +01:00
Alex Richardson
d98f2712c7 linuxkpi: implement ida_alloc()
Needed for the virtio-gpu driver.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:44 +01:00
Alex Richardson
6d15ccde4d linuxkpi: Allow BUILD_BUG_ON in if statements without braces
I got a compilation failure in virtio-gpu without this change.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:44 +01:00
Alex Richardson
ff479cc6c9 linuxkpi: add PAGE_ALIGNED macro
Needed for the virtio-gpu driver.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:43 +01:00
Alex Richardson
2686b10db4 linuxkpi: Add sg_init_one
Needed for the virtio-gpu driver.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:43 +01:00