Commit Graph

147432 Commits

Author SHA1 Message Date
Alexander V. Chernikov
896e22fbc6 netlink: fix neighbour deleting for IPv6.
MFC after:	2 weeks
2023-04-25 12:27:02 +00:00
Alexander V. Chernikov
e83f23eb5e netlink: enable extended error reporting in snl(3).
MFC after:	2 weeks
2023-04-25 11:21:03 +00:00
Alexander V. Chernikov
5af9ad5359 netlink: add snl(3) support for dumping nexthops and neighbors
MFC after:	2 weeks
2023-04-25 11:14:12 +00:00
Alexander V. Chernikov
b32cf15d86 netlink: add support for dumping kernel nexthops.
MFC after:	2 weeks
2023-04-25 11:12:18 +00:00
Alexander V. Chernikov
a2728a9a5b netlink: allow creation of temporary lle entries.
MFC after:	2 weeks
2023-04-25 11:08:47 +00:00
Alexander V. Chernikov
ca1850478f lltable: properly set expire time to 0 for static IPv4 entries.
MFC after:	2 weeks
2023-04-25 10:59:50 +00:00
Alexander V. Chernikov
fab828b455 netlink: fix parameters in snl_attr_get_flag()
MFC after:	2 weeks
2023-04-25 10:57:59 +00:00
Alexander V. Chernikov
70810dc817 netlink: add nlattr_get_uint8() function to pack u8 attributes.
MFC after:	2 weeks
2023-04-25 10:56:42 +00:00
Alexander V. Chernikov
34066d0008 routing: add iterator-based nhop traversal KPI.
MFC after:	2 weeks
2023-04-25 10:55:16 +00:00
Alexander V. Chernikov
fd1aa866eb routing: add rt_tables_get_rnh_safe() that doesn't panic when af/fib is
incorrect.

MFC after:	2 weeks
2023-04-25 10:53:51 +00:00
Andrew Turner
6a9c2e63be Add padding for future use on arm64
Allow new features to be supported without changing the size of
existing structures.

Reviewed by:	kib
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39777
2023-04-25 10:23:15 +01:00
Vladimir Kondratyev
19c804b74f bcm5974(4): Make Magic Trackpad 2 support endian-safe.
While here, make the touch orientation event match Linux.

MFC after:	1 month
2023-04-25 12:20:53 +03:00
Val Packett
ef8397c28e bcm5974(4): add Magic Trackpad 2 (USB only) support
The MT2 uses a compact report format, but otherwise is similar in many
ways to the internal trackpads, it even uses the same mode switching
commands.

Reviewed by:	wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D34437
2023-04-25 12:20:53 +03:00
Stefan Eßer
88a795e80c sys/fs: do not report blocks allocated for synthetic file systems
The pseudo file systems (devfs, fdescfs, procfs, etc.) report total
and available blocks and inodes despite being synthetic with no
underlying storage device to which those values could be applied.

The current code of these file systems tends to report a fixed number
of total blocks but no free blocks, and in the case of procfs,
libprocfs, linsysfs also no free inodes.

This can be irritating in e.g. the "df" output, since 100% of the
resources seem to be in use, but it can also create warnings in
monitoring tools used for capacity management.

This patch makes these file systems return the same value for the
total and free parameters, leading to 0% in use being displayed by
"df". Since there is no resource that can be exhausted, this appears
to be a sensible result.

Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D39442
2023-04-25 09:59:15 +02:00
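The effect described in 88a795e80c can be illustrated with a simplified usage calculation. This is a rough sketch, not the code df(1) actually uses: the function name `percent_used` is hypothetical, and the rounding and reserved-block handling of the real tool are ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: percentage of blocks in use, given the total
 * and free block counts a file system reports. Truncates toward zero.
 */
static unsigned
percent_used(uint64_t total, uint64_t free_blocks)
{
	assert(free_blocks <= total);
	if (total == 0)
		return (0);
	return ((unsigned)(((total - free_blocks) * 100) / total));
}
```

With a fixed total and no free blocks (the old behavior), `percent_used(2, 0)` yields 100. With total equal to free (the new behavior), `percent_used(2, 2)` yields 0.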
Stefan Eßer
0728695c63 fs/msdosfs: Fix potential panic and size calculations
Some combinations of FAT12 file system parameters could cause a kernel
panic due to an unmapped access if the size of the FAT was larger than
the CPU page size. The reason is that FAT12 uses 3 bytes to store
2 FAT pointers, leading to partial FAT pointers at the end of buffers
of a size that is not a multiple of 3.

With a typical page size of 4 KB, this caused the FAT entry at byte
offsets 4095 and 4096 to cross the page boundary, with only the first
page mapped. This was fixed by adjusting the mapping to always cover
both bytes of each FAT entry.

Testing revealed 2 other inconsistencies that are fixed by this commit:

1) The calculation of the size of the data area did not take into
   account the fact that the first two data block numbers are reserved
   and that the data area starts with block 2. This could cause a
   FAT12 file system created with the maximum supported number of
   blocks to be incorrectly identified as FAT16.

2) The root directory does not take up space in the data area of a
   FAT12 or FAT16 file system, since it is placed into a reserved
   area outside of that data area. This commit makes stat() report
   the logical size of the root directory, but with 0 blocks allocated
   from the data area.

PR:		270587
Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D39386
2023-04-25 09:58:29 +02:00
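The FAT12 packing that 0728695c63 describes can be sketched in a few lines of C. This is an illustrative sketch, not the msdosfs code: the helper names and the `PAGE_SIZE_BYTES` constant are hypothetical, and a 4 KB page is assumed, as in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this sketch. */
#define PAGE_SIZE_BYTES 4096u

/*
 * FAT12 packs two 12-bit entries into three bytes, so entry n starts
 * at byte n * 3 / 2 and always occupies two consecutive bytes.
 */
static uint32_t
fat12_entry_offset(uint32_t n)
{
	return (n + n / 2);
}

/* True if the two bytes of entry n lie in different pages. */
static bool
fat12_entry_crosses_page(uint32_t n)
{
	uint32_t first = fat12_entry_offset(n);

	return (first / PAGE_SIZE_BYTES != (first + 1) / PAGE_SIZE_BYTES);
}

/* Decode entry n from a raw FAT12 table of fat_len bytes. */
static uint16_t
fat12_read_entry(const uint8_t *fat, uint32_t fat_len, uint32_t n)
{
	uint32_t off = fat12_entry_offset(n);
	uint16_t pair;

	/* Both bytes of the entry must be inside the buffer. */
	assert(off + 1 < fat_len);
	pair = (uint16_t)(fat[off] | (fat[off + 1] << 8));
	/* Odd entries use the high 12 bits, even entries the low 12. */
	return ((n & 1) ? (pair >> 4) : (pair & 0x0fff));
}
```

Entry 2730 starts at byte 4095 and ends at byte 4096, which is the straddling case the commit message mentions. Any code that maps only the page containing the first byte would fault on the second.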
Konstantin Belousov
04d815f115 netipsec/key.c: use designated initializers for arrays
Also de-expand nitems() use in related asserts, and fix maxsize array
name in the assert message.

Sponsored by:	NVidia networking
2023-04-25 09:41:24 +03:00
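For readers unfamiliar with the idiom named in 04d815f115, here is a minimal sketch of designated array initializers combined with an element-count check. The enum, the table contents, and the `NITEMS` macro are hypothetical stand-ins; they are not taken from netipsec/key.c (FreeBSD itself provides `nitems()` in `<sys/param.h>`).

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for FreeBSD's nitems() macro. */
#define NITEMS(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical message types; not the real SADB_* constants. */
enum msg_type {
	MSG_RESERVED,
	MSG_GETSPI,
	MSG_UPDATE,
	MSG_ADD,
	MSG_TYPE_COUNT
};

/*
 * Each slot is tied to its index by name, so reordering the enum
 * cannot silently misalign the table.
 */
static const size_t msg_minsize[] = {
	[MSG_RESERVED] = 0,
	[MSG_GETSPI] = 16,
	[MSG_UPDATE] = 24,
	[MSG_ADD] = 24,
};

/* Compile-time check that the table covers every message type. */
_Static_assert(NITEMS(msg_minsize) == MSG_TYPE_COUNT,
    "msg_minsize has the wrong number of entries");

static size_t
msg_type_minsize(enum msg_type t)
{
	assert((size_t)t < NITEMS(msg_minsize));
	return (msg_minsize[t]);
}
```

The design point is that the initializer documents which index each value belongs to, and the assert uses the macro instead of a hand-expanded `sizeof` expression.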
Konstantin Belousov
fcc7aabdca netipsec: some style
Sponsored by:	NVidia networking
2023-04-25 09:39:51 +03:00
Cheng Cui
1f782fcc0c Remove unused fields in siftr_stats. Thus, update the man page as well.
Summary: Remove unused fields in siftr_stats. Thus, update the man page as well.

Test Plan: Tested in Emulab testbed.

Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39776
2023-04-24 15:31:15 -04:00
Cheng Cui
8aa2be695e Correct the value of macro TF2_TCP_ACCOUNTING.
Summary: Make sure the values are in order.

Reviewers: rscheff, tuexen, #transport!
Approved by: rscheff, tuexen, glebius
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39716
2023-04-24 15:28:41 -04:00
Olivier Certner
6a5e614015 vn_open_vnode(): fix locking around VOP_CLOSE() on advisory lock error
In the case of a FIFO or if trying to open a file for writing, an
exclusive lock is necessary.

Reviewed by:	kib
MFC after:	1 week
2023-04-25 01:37:58 +03:00
Olivier Certner
faec42a3ef sys/dirent.h: comment update, 'd_off' is offset of next entry
This is the historical (and still current) behavior, as well as that of
NetBSD, OpenBSD, illumos and Linux (getdents()/getdents64()).

Reviewed by:	kib
MFC after:	3 days
2023-04-25 00:32:10 +03:00
Konstantin Belousov
a718431c30 lookup(): ensure that openat("/", "..", O_RESOLVE_BENEATH) fails
PR:	269780
Reported by:	Dan Gohman <dev@sunfishcode.online>
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39773
2023-04-25 00:32:10 +03:00
John Baldwin
9b02f2daf4 powerpc: Use valid prototypes for function declarations with no arguments.
Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D39733
2023-04-24 08:53:50 -07:00
Andrew Turner
c94e4d91da Clean up PCI DEN0115 driver probing
Rather than checking for the SMCCC version, check if the PCI_VERSION
call returns a valid version.

Sponsored by:	Arm Ltd
2023-04-24 16:34:21 +01:00
Warner Losh
d4c78130f4 powerpc: syscalls.c is standard
No need to add it here, much less make it optional on ktr.

Sponsored by:		Netflix
2023-04-24 09:25:42 -06:00
Justin Hibbits
02f3b17fa5 Mechanically convert Xen netfront/netback(4) to IfAPI
Reviewed by:	zlei
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37800
2023-04-24 09:54:23 -04:00
Justin Hibbits
7814374b7c IfAPI: Hide the macros that touch ifnet members
Nothing should be directly touching the ifnet members, which are hidden
in <net/if_private.h>, so hide them in the same header to avoid errors
from users.

Sponsored by:	Juniper Networks, Inc.
2023-04-24 09:54:23 -04:00
Justin Hibbits
97583aa256 linuxkpi: Migrate to IfAPI
Summary:
Trivial changes for LinuxKPI to use IfAPI.  The 'bsdifp' looks unused,
so removed it instead of converting it to a pointer.

Bump __FreeBSD_version for change to struct net_device.

Reviewed by:	bz, hselasky
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39491
2023-04-24 09:54:22 -04:00
Andrew Turner
c3785c3eb0 Remove unneeded SMMU macros
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39186
2023-04-24 12:48:01 +01:00
Andrew Turner
cdd34e0038 Remove virtual addresses from smmu_pmap_remove_pages
This function needs to unmap all memory in a given SMMU context. Have
it iterate over all page table entries to find what has been mapped
rather than looking at virtual addresses.

While here use SMMU specific macros.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39185
2023-04-24 12:47:55 +01:00
Andrew Turner
b97e94d91e Move to a SMMU specific struct for the smmu pmap
This is not managed through the VM subsystem so only needs to hold the
data the SMMU driver needs.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39184
2023-04-24 12:47:50 +01:00
Andrew Turner
49ee1a7ef0 Create a common function to get the SMMU sid
Now that the PCI drivers have a common interface to read the IOMMU xref
and SID, create a common function to read it. This fixes an issue where
we will call into an ACPI specific function when booting with FDT when
both are enabled.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39183
2023-04-24 12:47:44 +01:00
Andrew Turner
913d04deed Add PCI_ID_OFW_IOMMU to the pci ecam ACPI driver
Teach the pci host generic ACPI attachment about PCI_ID_OFW_IOMMU. This
will be used by the arm64 smmu IOMMU driver to read the xref and ID
this interface provides in a bus-agnostic way.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39182
2023-04-24 12:47:38 +01:00
Andrew Turner
117beba8a8 arm64: Clean up smmu fdt xref handling
Use the xref from OF_xref_from_node for the smmu xref. We already have
a valid xref ID, there is no need to convert this to a memory address.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39181
2023-04-24 12:47:31 +01:00
Andrew Turner
8bc94f256e Remove redundant data from pci host generic
The bus tag and handle fields are already stored in the resource. Use
this with the bus_read/bus_write helper macros.

Sponsored by:	Arm Ltd
2023-04-24 12:33:50 +01:00
Andrew Turner
078a69abcb Use a uint64_t to store the arm64 mpidr
Use a single uint64_t to hold the mpidr register as we can break the
KBI on 14. Keep the macro so code can still be MFCd to 13.

Sponsored by:	Arm Ltd
2023-04-24 12:33:50 +01:00
Andrew Turner
c9a05c0722 Add a PCI driver that follows the Arm DEN0115 spec
Add an attachment to the pci_host_generic driver for the Arm DEN0115
PCI Configuration Space Access Firmware Interface [1]. This can be used
when PCI controllers need to implement quirks in the PCI root bus.
To handle this the firmware implements a SMCCC interface the driver can
use to read and write the configuration register.

This has been tested on a Raspberry Pi 4 booting with EDK2.

[1] https://developer.arm.com/documentation/den0115/latest

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39228
2023-04-24 12:33:50 +01:00
Andrew Turner
7029f2c887 Allow pci_host_generic attachments to manage registers
To allow for attachments that don't use memory mapped registers add
a flag they can set when the base driver shouldn't map them.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39227
2023-04-24 12:33:50 +01:00
Andrew Turner
fb421e96c0 Make arm64 pcb padding explicit
There is padding between some fields. Mark those I have found so they
can be reused later if needed.

Sponsored by:	Arm Ltd
2023-04-24 12:33:50 +01:00
Andrew Turner
1bf4991bbc Remove unneeded masks from the arm64 KASAN shadow
When mapping the arm64 KASAN shadow map we use Ln_TABLE_MASK to align
physical addresses, however these should already be aligned either
by rounding to a greater alignment, or the VM subsystem is giving us
a correctly aligned page.

Remove these extra alignment masks.

Reviewed by:	kevans
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39752
2023-04-24 12:33:50 +01:00
Hu Shunchao
6ed3b9ca25 hid: fix typo in hid_is_collection
hid_input is equal to 0. It is leftover from NetBSD code.

Reviewed by:	hselasky, wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28149
2023-04-24 14:17:51 +03:00
Val Packett
176939bd36 bcm5974: fix wellspring9 pressure settings to handle force sensitivity
Reviewed by:	wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D34435
2023-04-24 12:41:52 +03:00
Val Packett
1f40866feb intelspi: add PCI attachment (Lynx/Wildcat/Sunrise Point)
Also adds fixups and cleanups:

- apply the child's mode/speed
- implement suspend/resume support
- use RF_SHAREABLE interrupts
- use bus_delayed_attach_children since the transfer can use interrupts
- add support for newly added spibus features (cs_delay and flags)

Operation tested on Broadwell (Wildcat Point) MacBookPro12,1.
Attachment also tested on Kaby Lake (Sunrise Point) Pixelbook.

Reviewed by:	wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D29249
2023-04-24 12:41:52 +03:00
Val Packett
3c08673438 spibus: extend API: add cs_delay ivar, KEEP_CS and NO_SLEEP flags
These features are required for an upcoming Apple MacBook topcase
(HID over SPI) driver:

A delay after toggling CS is required to avoid anomalies like an extra
junk byte in front of the message. Keeping CS asserted is required to
be able to read a status report after writing a command. (The device
won't return the status if CS was deasserted.)

Sleep is not allowed in the interrupt context where the Apple input
driver runs its transactions. Use a flag to tell the SPI driver to
avoid mtx_sleep.

Reviewed by:	manu (ok to SPI part of larger patch)
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D29534
2023-04-24 12:41:52 +03:00
Val Packett
b344bd3a7d ext2fs: extract crc16 into sys/crc16.h
deduplicate this as it might be needed for other drivers (e.g. Apple SPI-HID)

Sponsored by:	https://www.patreon.com/valpackett
Reviewed by:	chuck, imp
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32879
2023-04-24 12:41:52 +03:00
Bjoern A. Zeeb
da8fa4e37a ath10k: import ath10k driver
Import ISC-licensed ath10k driver assumed to be
based on Linux kvalo/ath.git master at
6bae9de622d3ef4805aba40e763eb4b0975c4f6d.

Import support to redirect fwlogs to kernel messages
from https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/389075

Complement the driver to make it compile on FreeBSD
using LinuxKPI with changes covered by #ifdef (__FreeBSD__).
Further select updates were applied since the initial import
in order to keep compiling along with other LinuxKPI based
drivers.

Any other native driver using BUS_PROBE_DEFAULT will attach
ignoring this one by default given bsd_probe_return is set
to a lower priority.

Add the module build framework.

We only support PCI parts.

The firmware is provided by port net/wifi-firmware-ath10k-kmod.

Given the lack of full license texts on most files this is
imported under the draft policy for handling SPDX files (D29226). [1]

Approved by:	core (emaste, 2022-04-08) [1]
MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
ebacd8013f athk: import common code for ath1?k drivers
Import common ISC-licensed athk parts assumed to be
based on Linux kvalo/ath.git master at
6bae9de622d3ef4805aba40e763eb4b0975c4f6d.

The only modification should be for FreeBSD module
handling in main.c.

Add the module build framework unconnected to the
build for now.

These files will be shared by ath1?k drivers.

MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
06a1103fe3 ath10k: ath11k: add specific LinuxKPI support
Add files needed by ath1?k drivers to linuxkpi/linuxkpi_wlan.
These contain (skeleton) implementations of what is needed to
compile but specifically mhi/qmi/qrtr will need more work for
ath11k.

MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
3c4ba5f554 mt76: add module build framework and man pages
Add framework to build if_mt7915 and if_mt7921 with LinuxKPI
as well as initial man pages for the two mt76 chipset drivers.

MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
6c92544d7c mt76: import mediatek/mt76 driver
Import ISC-licensed driver parts of mediatek/mt76
assumed to be based on Linux wireless-testing at
a02411a5b98612c12be99349836d99f07db12a77 (tag: wt-2022-11-23).

Complement the driver and LinuxKPI with our own (dummy)
implementations of missing parts (util.h and soc/mediatek/)
as well as changes to make it compile on FreeBSD with changes
covered by #ifdef (__FreeBSD__) conditions.
Further select updates were applied since the initial import
in order to keep compiling along with other LinuxKPI based
drivers.

For the moment we only target the mt7915 and mt7921 PCI parts.
More may follow in the future.

Firmware is provided by port net/wifi-firmware-mt76-kmod.

Given the lack of full license texts on non-local files this is
imported under the draft policy for handling SPDX files (D29226). [1]

Approved by:	core (emaste, 2022-04-08) [1]
MFC after:	2 months
2023-04-23 21:29:49 +00:00
Dimitry Andric
42162fb2fe kcsan: add __tsan_mem(cpy|move|set) aliases for clang >= 16
Summary:
After https://github.com/llvm/llvm-project/commit/b4257d3bf58c ("[tsan]
Replace mem intrinsics with calls to interceptors") intrinsic calls to
memcpy, memmove or memset will directly call sanitizer interceptors,
e.g. __tsan_memcpy, __tsan_memmove or __tsan_memset.

Building GENERIC-KCSAN with clang >= 16 would thus result in link errors
similar to:

  ld: error: undefined symbol: __tsan_memcpy
  >>> referenced by cam_compat.c:150 (/usr/src/sys/cam/cam_compat.c:150)
  >>>               cam_compat.o:(cam_compat_handle_0x17)
  >>> referenced by cam_compat.c:151 (/usr/src/sys/cam/cam_compat.c:151)
  >>>               cam_compat.o:(cam_compat_handle_0x17)
  >>> referenced by cam_compat.c:152 (/usr/src/sys/cam/cam_compat.c:152)
  >>>               cam_compat.o:(cam_compat_handle_0x17)
  >>> referenced 1692 more times

Similar to subr_msan.c, add aliases from the existing kcsan_* versions
of these functions to __tsan_* names.

Reviewed by:	markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39772
2023-04-23 20:59:06 +02:00
Mark Johnston
4b39a12830 arm64: Disable PAC when booting on a Windows Dev Kit 2023
It appears that PAC registers are configured to trap upon access, but
since the kernel starts in EL1 on this platform it has no ability to
inspect or modify this configuration.  Simply disable PAC on this
platform for now, since the kernel otherwise hangs during boot.

PR:		270472
Reviewed by:	andrew, emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39748
2023-04-23 13:55:57 -04:00
Mark Johnston
ff13b92475 riscv: Implement bus_describe_intr() for nexus
Reviewed by:	mhorne
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39750
2023-04-23 13:55:57 -04:00
Mark Johnston
7623cc8f65 arm64: Implement bus_describe_intr() for nexus
Prompted by a compiler warning introduced by
e582d4a2b0 ("arm64: nexus code tidy-up").

Reviewed by:	mhorne, andrew
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39749
2023-04-23 13:55:57 -04:00
Mark Johnston
93998d166f inpcb: Fix some bugs in _in_pcbinshash_wild()
- In _in_pcbinshash_wild(), we should avoid returning v6 sockets unless
  no other matches are available.  This preserves pre-existing
  semantics.
- Fix an inverted test: when inserting a non-jailed PCB, we want to
  search for the first non-jailed PCB in the hash chain.
- Test the right PCB when searching for a non-jailed PCB.

While here, add a required locking assertion.

Fixes:	7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
2023-04-23 13:55:57 -04:00
Michael Tuexen
66d6fd5322 sctp: use constants from RFC 8260 to improve compliance
Keep the old constants for backwards compatibility.

MFC after:	1 week
2023-04-23 17:48:05 +02:00
Bjoern A. Zeeb
0e8953b94b LinuxKPI: pci.h: always initialize return value
In pcie_capability_read_*() always initialize the return value to
avoid warnings of uninitialized values in callers.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39721
2023-04-23 13:29:30 +00:00
Dimitry Andric
d142601887 powerpc: fix a few pmap related functions to return correct types
While experimenting with changing boolean_t to another type, I noticed
that several powerpc pmap related functions returned the wrong type:
boolean_t instead of int.

Fix several declarations and definitions to match the actual pmap
function types: pmap_dev_direct_mapped_t and pmap_ts_referenced_t.

MFC after:	3 days
2023-04-23 15:23:04 +02:00
Zhenlei Huang
b658c0fce1 ip_mroute: Delete unreachable code
As the flag M_WAITOK is passed to ip_encap_attach(), the function
will never return NULL, and the following code within the NULL check
branch will be unreachable.

No functional change intended.

Reviewed by:	kp
Fixes:		6d8fdfa9d5 Rework IP encapsulation handling code
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39746
2023-04-23 12:47:57 +08:00
Zhenlei Huang
c373e1d6ad if_stf: Delete unreachable code
As the flag M_WAITOK is passed to ip_encap_attach(), the function
will never return NULL, and the following code within the NULL check
branch will be unreachable.

No functional change intended.

Reviewed by:	kp
Fixes:		6d8fdfa9d5 Rework IP encapsulation handling code
MFC after:	1 week
Differential Revision:  https://reviews.freebsd.org/D39746
2023-04-23 12:47:57 +08:00
Dmitry Chagin
66e8f1f7d3 linux(4): Fix arm64 build after b7a6bcdd, missed chunk added
MFC after:		1 month
2023-04-23 01:41:12 +03:00
Dmitry Chagin
0ef30b5d6b linux(4): Bump osrelease to 5.15.0
Linux kernel version 5.15, named "Trick or Treat", is the 22nd LTS release.

Reviewed by:		trasz, emaste
Differential Revision:	https://reviews.freebsd.org/D39649
MFC after:		1 month
2023-04-22 22:18:08 +03:00
Dmitry Chagin
b7a6bcdd13 linux(4): Export the AT_MINSIGSTKSZ depending on the process osreldata
AT_MINSIGSTKSZ first appeared in the 5.13.0 Linux kernel.

Differential Revision:	https://reviews.freebsd.org/D39648
MFC after:		1 month
2023-04-22 22:17:52 +03:00
Dmitry Chagin
70eab81d6f linux(4): Export the AT_EXECFN depending on the process osreldata
AT_EXECFN first appeared in the 2.6.26 Linux kernel.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D39647
MFC after:		1 month
2023-04-22 22:17:36 +03:00
Dmitry Chagin
40c36c4674 linux(4): Export the AT_RANDOM depending on the process osreldata
AT_RANDOM first appeared in the 2.6.30 Linux kernel.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D39646
MFC after:		1 month
2023-04-22 22:17:17 +03:00
Dmitry Chagin
56c5230afd linux(4): Fix LINUX_AT_COUNT comments
Differential Revision:	https://reviews.freebsd.org/D39645
MFC after:		1 month
2023-04-22 22:16:43 +03:00
Dmitry Chagin
7d8c983983 linux(4): Deduplicate linux_copyout_auxargs()
Export the default MINSIGSTKSZ value for x86 for as long as we do not
preserve AVX registers in the signal context.

Differential Revision:	https://reviews.freebsd.org/D39644
MFC after:		1 month
2023-04-22 22:16:02 +03:00
Justin Hibbits
0468e89cb3 zfs/powerpc64: Fix big-endian powerpc64 asm
The powerpc asm from openzfs assumes that big-endian is always ELFv1 and
ELFv2 is always little-endian, while FreeBSD uses ELFv2 everywhere.  Add
the necessary bits to the checksum asm to work on big-endian ELFv2.

This was also submitted upstream as PR#14779.

Tested by:	dbaio
2023-04-22 11:27:49 -04:00
Vladimir Kondratyev
06c844e175 LinuxKPI: Fix building on 32bit archs
Reported by:	Jenkins
Fixes:	e5cf9deb61 ("LinuxKPI: Add bitmap_to_arr32() to <linux/bitmap.h>")
2023-04-22 13:25:49 +03:00
Vladimir Kondratyev
af22da75a0 Bump __FreeBSD_version after LinuxKPI updates
2023-04-22 11:29:29 +03:00
Vladimir Kondratyev
53d821d651 LinuxKPI: Define noinline_for_stack compiler attribute
It is identical to noinline and used for documentation reasons.

Required by:	drm-kmod 5.15-lts
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D39553
2023-04-22 11:29:29 +03:00
Vladimir Kondratyev
e5cf9deb61 LinuxKPI: Add bitmap_to_arr32() to <linux/bitmap.h>
bitmap_to_arr32() copies contents of bitmap to a uint32_t array of bits

Required by:	drm-kmod 5.15-lts
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D39552
2023-04-22 11:29:29 +03:00
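The one-line description in e5cf9deb61 leaves the details open, so the following is only a userland sketch of what such a conversion can look like. The function name `sketch_bitmap_to_arr32`, the least-significant-word-first ordering, and the masking of trailing bits are assumptions made for illustration; this is not the LinuxKPI implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical sketch: copy the low nbits of an unsigned long bitmap
 * into 32-bit words, least significant word first. Works whether
 * unsigned long is 32 or 64 bits wide.
 */
static void
sketch_bitmap_to_arr32(uint32_t *dst, const unsigned long *bitmap,
    size_t nbits)
{
	size_t nwords = (nbits + 31) / 32;

	assert(dst != NULL && bitmap != NULL);
	for (size_t i = 0; i < nwords; i++) {
		size_t bit = i * 32;
		unsigned long word = bitmap[bit / BITS_PER_ULONG];

		dst[i] = (uint32_t)(word >> (bit % BITS_PER_ULONG));
	}
	/* Clear any bits beyond nbits in the last output word. */
	if (nbits % 32 != 0)
		dst[nwords - 1] &= (UINT32_C(1) << (nbits % 32)) - 1;
}
```

Using shifts instead of a raw memory copy keeps the output independent of host byte order, at the cost of a loop.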
Vladimir Kondratyev
b14c03f808 LinuxKPI: define acpi_put_table() in <acpi/acpi.h>
FreeBSD ACPICA calls it AcpiPutTable()

Required by:	drm-kmod 5.15-lts
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D39551
2023-04-22 11:29:29 +03:00
Warner Losh
7b42f338d7 freebsd32: Regen
Need to regen freebsd32 as well when sys/kern/syscalls.master is
updated.

Sponsored by:		Netflix
2023-04-21 10:25:10 -06:00
Warner Losh
2c19beeed2 newvers: Use correct regexp
There's no need to quote the # here. Inside of regexp, it's not treated
like a comment from an awk perspective. And inside of '' it's not
treated as special by the shell. gawk also warns.

Sponsored by:		Netflix
2023-04-21 10:24:25 -06:00
Mark Johnston
92fa22c6a5 riscv: Compile instr_size.c into the kernel when DTrace is configured
Reported by:	Jenkins
Fixes:	080e56a6c9 ("dtrace: expose dtrace_instr_size() to userland and implement it for riscv")
2023-04-21 09:26:17 -04:00
Randall Stewart
01216268f8 tcp: hpts needs to still call output even after input.
The other stacks, it turns out, actually expect the output to be called and can become stuck if it is
not. This is because they run their timer code from there and the input routine does not always
ensure a timer is running. The real long-term fix here might be to go into the other stacks (rack and bbr)
and make sure that a timer is running after input if you don't do output, as well as call the timer functions.
This would cut down on calls from hpts. But I think it's too dramatic of a change for the immediate time.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39738
2023-04-21 07:12:25 -04:00
Mariusz Zaborski
444c661545 mpr: don't use hardcoded value in debug branch
Pointed out by:	imp
Sponsored by:   Klara Inc.
2023-04-21 10:01:38 +02:00
Mariusz Zaborski
ea6597c38c mpr: fix copying of event_mask
Before the commit 6cc44223cb the
field event_mask was fully copied to the EventMasks field.
After this commit the event_mask (uint8_t) is cast 4 times to
EventMask (uint32_t). Because of that 24 bits of each event_mask array
is lost.

This commit brings back simple copying of the field, and afterwards
converts the 32-bit fields to the requested endianness.

I don't think we need more sophisticated method,
as the array is of size 4 (for 32 bits version).

Reviewed by:	imp
MFC after:	1 week
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D39562
2023-04-21 10:01:38 +02:00
Austin Shafer
3f686532c9 linuxkpi: Fix __sg_alloc_table_from_pages loop
Commit 3e0856b63f updated
__sg_alloc_table_from_pages to use the same API as linux, but modified
the loop condition when going over the pages in a sg list. Part of the
change included moving the sg_next call out of the for loop and into the
body, which causes an off by one error when traversing the list. Since
sg_next is called before the loop body it will skip the first element
and read one past the last element.

This caused panics when running PRIME with nvidia-drm as the off-by-one
issue causes a NULL dereference.

Reviewed by:	bz, hselasky
Differential Revision:	https://reviews.freebsd.org/D39628
Fixes:	3e0856b63f ("linuxkpi: Fix `sg_alloc_table_from_pages()` to have the same API as Linux")
2023-04-21 09:56:50 +02:00
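The off-by-one pattern described in 3f686532c9 can be reproduced with an ordinary linked list. This is a generic sketch and not the LinuxKPI code: `struct node` and `node_next()` are hypothetical stand-ins for a scatterlist entry and `sg_next()`, and the buggy variant stops at the end of the list where the commit message reports a NULL dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct node {
	int value;
	struct node *next;
};

/* Stand-in for sg_next(): the following entry, or NULL at the end. */
static struct node *
node_next(struct node *n)
{
	return (n != NULL ? n->next : NULL);
}

/* Buggy: advances before using the entry, so the head is skipped. */
static int
sum_advance_first(struct node *head, size_t count)
{
	struct node *n = head;
	int sum = 0;

	assert(head != NULL);
	for (size_t i = 0; i < count; i++) {
		n = node_next(n);
		if (n == NULL)	/* walked one past the last entry */
			break;
		sum += n->value;
	}
	return (sum);
}

/* Fixed: uses the current entry first, then advances. */
static int
sum_advance_last(struct node *head, size_t count)
{
	struct node *n = head;
	int sum = 0;

	assert(head != NULL);
	for (size_t i = 0; i < count && n != NULL; i++) {
		sum += n->value;
		n = node_next(n);
	}
	return (sum);
}
```

For a three-entry list holding 1, 2 and 3, the buggy walk sums only 2 and 3 and then runs off the end, while the fixed walk visits all three entries.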
Warner Losh
9abba78acc syscalls: regenerate
The 4.2 sigreturn was a bit of an enigma so the 4.2 was removed. Regenerate
to cope with the very minor changes in comments and one string.

Sponsored by:		Netflix
2023-04-20 23:39:23 -06:00
Warner Losh
602b575a88 syscall.master: Remove stray 4.2
Back in 4.3BSD, the system call table wasn't generated, and there was an
entry:
        "4.2 sigreturn",        /* 139 = old 4.2 sigreturn */
This got converted to
139     OBSOL   0 4.2 sigreturn
in 4.3 RENO. Since it was obsolete, nothing bad happened. In fact,
there was code in makesyscalls.sh to cope:
        {       comment = $4
                for (i = 5; i <= NF; i++)
                        comment = comment " " $i
                if (NF < 5)
                        $5 = $4
        }
so the generated comment in syscalls.c was almost correct:
        "obs_4.2",                      /* 139 = obsolete 4.2 sigreturn */
a bug that we have to this very day, despite makesyscalls.sh being
rewritten in lua.

However, this historical wart is the only place in our current
syscalls.master file where we have an extra field for the 'not
generated' class of system calls. Remove the historical wart so that the
re-write of makesyscalls.lua can be simpler (so, I hope, qemu's bsd-user
can have large swathes of code automatically generated too). This should help
make things more understandable (changes to simplify makesyscalls.lua
aren't quite debugged, so have to wait for another day).

There's 3 different obsolete sigreturns (but only 1 that was ever in
FreeBSD 2.x and newer).

Sponsored by:		Netflix
2023-04-20 23:39:23 -06:00
Warner Losh
8fc68c1b34 Remove stray line
Forgot to remove this in 559b94a122.

Fixes:		559b94a122
Sponsored by:	Netflix
2023-04-20 23:39:23 -06:00
Navdeep Parhar
ca5391bd85 cxgbe(4): Update firmwares to version 1.27.3.0
These are the changes since the last update (copy-pasted from the
release notes for Chelsio Unified Wire v3.18.0.0):

====================
Version : 1.27.3.0
Date    : 04/07/2023

Fixes
-----
BASE:
- Fixed a hang if module eeprom reads give invalid data.
- KR backplane no-fec link problem fixed.
OFLD:
- iscsi ddp errors fixed.
- iwarp connection abort in rare cases causing NIC traffic hang fixed.

ENHANCEMENTS
------------
BASE:
- Cisco GLC-TE 1G modules support added.

====================
Version : 1.27.1.0
Date    : 12/02/2022

Fixes
-----
BASE:
- memwrite dsgl cannot be used for T5.
OFLD:
- Enabled FCoE in SO adapters.
- TOE-TLS crash fixed.
- iscsi hang fixed.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2023-04-20 20:57:38 -07:00
Warner Losh
559b94a122 syscall.master: Fix comments
Have more accurate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not.  And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).

Sponsored by:		Netflix
2023-04-20 16:18:02 -06:00
Warner Losh
c1e987e062 makesyscalls.lua: Minor fluff removal
luacheck pointed out two minor issues: line isn't declared as a global,
so declare it local. Also remove an unused parameter.

Suggested by:		kevans
Sponsored by:		Netflix
2023-04-20 16:17:58 -06:00
Warner Losh
8341a74afe makesyscalls.lua: Use "sysxxx" consistently
Find the few places where we use 'sysxxx' and use "sysxxx" instead to be
more consistent.

Sponsored by:		Netflix
2023-04-20 16:17:25 -06:00
Warner Losh
1dd350fce0 makesyscalls.lua: Make more luaish
x["y"] can be written as x.y, which looks better and is a more typical
lua idiom.

Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D39709
2023-04-20 16:17:25 -06:00
Navdeep Parhar
2791335104 cxgbe(4): Dump the firmware log before falling back to a minimal config.
It might have errors that explain why the attempted configuration
failed.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-20 12:56:24 -07:00
Gleb Smirnoff
8e813d07c6 netstat: fix printing of TCP pcbs with -A
This change touches both kernel and netstat(1), but either of the changes
will fix printing pcb addresses with -A.

The thing is that historically netstat(1) treated TCP differently, and
printed tcpcb address instead of inpcb address.  This is not documented
anywhere!  With e68b379244 these two addresses became the same.  It is
highly likely they will be the same for a long time, but it might be they
will start to differ again in a far future.  My proposal is to stop
treating TCP differently with netstat(1) and right now is a good opportunity
to do that, since there will be no behavior change at all.  The kernel
change to tcp_inptoxtp() will go into stable/14 to make it compatible with
netstat(1) binary from stable/13.  We can drop it later, probably together
with in_ppcb pointer from inpcb.  The in_ppcb in xinpcb will stay for size
compatibility.

Reviewed by:		tuexen, rrs
Differential Revision:	https://reviews.freebsd.org/D39736
2023-04-20 12:42:42 -07:00
Dimitry Andric
4214005276 kern.mk: clang >= 16 already infers ELFv2 for powerpc64
There is no need to pass -mabi=elfv2 explicitly anymore, and with clang
16 doing so in fact results in an "unused argument" warning.

MFC after:	3 days
2023-04-20 21:27:27 +02:00
Bjoern A. Zeeb
72ef722b2a dpaa2: add console support for FDT based systems
Add DPAA2 console support for MC and AIOP (latter untested) for FDT
systems.  ACPI systems are prepared but need some proper bus function
in order to get the address from MC (and likely a file splitup then).
This will come at a later stage once other ACPI/FDT bus parts are
cleared up.
The work was originally done in July 2022 and finally switched to
bus_space[1] lately to be ready for main.

Suggested by:	andrew [1]
Reviewed by:	dsl
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D38592
2023-04-20 18:59:03 +00:00
John Baldwin
bf04385521 arm: Use C89 function declaration for db_read_bytes.
2023-04-20 11:00:46 -07:00
John Baldwin
048606bec1 perfmon(4): Use a C89 function definition for a SYSINIT.
2023-04-20 11:00:46 -07:00
Christos Margiolis
1fef7abdc7 dtrace: add register bindings for RISC-V
Reviewed by:	mhorne, markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39611
2023-04-20 13:35:57 -04:00
Christos Margiolis
75081b9ed8 dtrace: use dtrace_instr_size() in the riscv dtrace_subr.c
No functional change intended.

Reviewed by:	mhorne, markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39652
2023-04-20 13:35:57 -04:00
Christos Margiolis
080e56a6c9 dtrace: expose dtrace_instr_size() to userland and implement it for riscv
dtrace_instr_size() is needed by the forthcoming RISC-V port of kinst,
as well as by libdtrace in D38825 for both amd64 and RISC-V.

Reviewed by:	markj, mhorne
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39489
2023-04-20 13:35:57 -04:00
Christos Margiolis
1a149d65ba dtrace: get rid of uchar_t types
Callers are specifying uint8_t anyway and this slightly reduces
dependencies on compatibility typedefs.  No functional change intended.

Reviewed by:	markj, mhorne
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39490
2023-04-20 13:35:56 -04:00
Dmitry Chagin
de4da6cd04 x86: Move i386 timerreg.h to x86
Reviewed by:		emaste, jhb
Differential Revision:	https://reviews.freebsd.org/D39656
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Dmitry Chagin
d1f4c44aa8 x86: Move i386 ppireg.h to x86
Differential Revision:	https://reviews.freebsd.org/D39655
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Mark Johnston
5fd1a67e88 inpcb: Release the inpcb cred reference before freeing the structure
Now that the inp_cred pointer is accessed only while the inpcb lock is
held, we can avoid deferring a crfree() call when freeing an inpcb.

This fixes a problem introduced when inpcb hash tables started being
synchronized with SMR: the credential reference previously could not be
released until all lockless readers have drained, and there is no
mechanism to explicitly purge cached, freed UMA items.  Thus, ucred
references could linger indefinitely, and since ucreds hold a jail
reference, the jail would linger indefinitely as well.  This manifests
as jails getting stuck in the DYING state.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38573
2023-04-20 12:13:06 -04:00
Mark Johnston
7b92493ab1 inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets.  To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial.  Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred.  This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d892129 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there.  This patch goes further to handle
lookups of unconnected sockets.  Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics.  This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.
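
The ordering rule above can be illustrated with a small userland sketch. It is not the kernel's in_pcbinhash() code: the names `sock_entry`, `rank`, `chain_insert` and `chain_first_match` are hypothetical, matching is reduced to the local port only, and treating jail membership as the primary sort key is an assumption. The point is only that a chain kept in a well-defined order lets a wild lookup return the first match.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an unconnected inpcb on a hash chain. */
struct sock_entry {
	bool jailed;		/* socket is owned by a jail */
	bool addr_specified;	/* bound to a specific local address */
	unsigned short port;	/* local port */
	struct sock_entry *next;
};

/*
 * Lower rank sorts earlier: jailed before non-jailed, then specified
 * local address before unspecified (assumed priority of the two rules).
 */
static int
rank(const struct sock_entry *e)
{
	return ((e->jailed ? 0 : 2) + (e->addr_specified ? 0 : 1));
}

/* Insert e so that the chain stays sorted by rank(). */
static void
chain_insert(struct sock_entry **head, struct sock_entry *e)
{
	struct sock_entry **pp = head;

	while (*pp != NULL && rank(*pp) <= rank(e))
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
}

/* With an ordered chain, the first entry on the port is the preferred one. */
static struct sock_entry *
chain_first_match(struct sock_entry *head, unsigned short port)
{
	for (; head != NULL; head = head->next)
		if (head->port == port)
			return (head);
	return (NULL);
}
```

Insertion does the extra work of finding the right position, which mirrors the tradeoff described above: bind-time cost goes up slightly so that lookup can stop at the first hit.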

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups.  When a match is found, we try to lock the inpcb and
re-validate its connection info.  In the common case, this works well
and we can simply return the inpcb.  If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR.  In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38572
2023-04-20 12:13:06 -04:00
Mark Johnston
3e98dcb3d5 inpcb: Move inpcb matching logic into separate functions
These functions will get some additional callers in future revisions.

No functional change intended.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38571
2023-04-20 12:13:06 -04:00
Mark Johnston
fdb987bebd inpcb: Split PCB hash tables
Currently we use a single hash table per PCB database for connected and
bound PCBs.  Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worst case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb.  Now the database has one table each
for connected and unconnected sockets.

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use.  This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away.  There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.
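
As a rough illustration of the table selection described here, the sketch below picks a table from the foreign address alone. The types and names (`pcb_table`, `conn_info`, `pick_table`) are hypothetical and IPv4-only; the real in_pcbinhash() operates on struct inpcb and is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical labels for the two per-database hash tables. */
enum pcb_table { PCB_TABLE_UNCONNECTED, PCB_TABLE_CONNECTED };

/* Hypothetical, IPv4-only view of a PCB's connection info. */
struct conn_info {
	uint32_t faddr;		/* foreign address, 0 when unspecified */
	uint16_t fport;		/* foreign port */
};

/* A PCB with a foreign address is connected; otherwise it is only bound. */
static enum pcb_table
pick_table(const struct conn_info *ci)
{
	return (ci->faddr != 0 ? PCB_TABLE_CONNECTED : PCB_TABLE_UNCONNECTED);
}
```

Because the choice depends only on state that changes at connect or disconnect time, an entry never has to move between chains while a lookup may be traversing them.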

I also made the "rehash" parameter of in(6)_pcbconnect() unused.  This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.

UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle.  To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number.  When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.

raw_ip (ab)uses the hash table without using the corresponding
accessors.  Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs.  This will be addressed in some
way in the future.

inp iterators which specify a hash bucket will only visit connected
PCBs.  This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38569
2023-04-20 12:13:06 -04:00
Bjoern A. Zeeb
35f7fa4ac1 LinuxKPI: 802.11: improve assertion and tkip code
Move a KASSERT out of a function and make it a CTASSERT with
appropriate comments.

Add skeleton implementations of two tkip functions, still left TODO,
initializing variables with dummy values to quieten compiler warnings.  It is
unclear to me if we should still ever properly implement TKIP
compat code at this point.  If so the current code gives a good
idea what needs to be done in addition to allocating references
to real state along with keyconf.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-20 16:07:50 +00:00
Bjoern A. Zeeb
7db7bfe1a7 iwlwifi: quieten more compiler warnings
Quieten some more (valid) gcc warnings and disable dead code.
There are more warnings, some probably a compiler problem, the
other related to firmware structs which I do not want to adjust
just locally.  Leave a comment to revisit after a next driver
update.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-20 16:07:05 +00:00
Bjoern A. Zeeb
74e908b3c6 LinuxKPI: fix READ_ONCE() -Wcast-qual warnings
Rather than using ACCESS_ONCE() in READ_ONCE() add a missing cast
to const in order to satisfy -Wcast-qual by gcc.
Sadly we cannot do the same to WRITE_ONCE() which still is very
noisy.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D39706
2023-04-20 11:51:22 +00:00
Alexander V. Chernikov
56d4550c4d ifnet: factor out interface renaming into a separate function.
This change is required to support interface renaming via Netlink.
No functional changes intended.

Reviewed by:	zlei
Differential Revision: https://reviews.freebsd.org/D39692
MFC after:	2 weeks
2023-04-20 10:23:37 +00:00
Mateusz Guzik
9c4e270822 zfs: fix up EINVAL from getdirentries on .zfs
PR:	270909
2023-04-20 08:38:28 +00:00
Mateusz Guzik
7ff3143809 zfs: add missing vn state transition for .zfs
Reported by:	des
2023-04-20 08:09:59 +00:00
Bjoern A. Zeeb
f369f10dd8 LinuxKPI: 802.11: fix a -Wenum-compare warning
We are asserting that two values from different enums are the same.
gcc warns about these.  Cast the values to (int) to avoid the warning.
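
A minimal sketch of the pattern, using made-up enums (`wire_cipher` and `drv_cipher` are hypothetical and unrelated to the actual LinuxKPI definitions): casting both sides to int keeps the equality check while making it an ordinary integer comparison instead of a comparison between two different enumeration types.

```c
/* Two unrelated enums whose values are expected to line up. */
enum wire_cipher { WIRE_CIPHER_NONE = 0, WIRE_CIPHER_CCMP = 4 };
enum drv_cipher { DRV_CIPHER_NONE = 0, DRV_CIPHER_CCMP = 4 };

/* Compile-time check; the (int) casts avoid comparing distinct enum types. */
_Static_assert((int)WIRE_CIPHER_CCMP == (int)DRV_CIPHER_CCMP,
    "cipher values must match");

/* The same cast works for a runtime comparison. */
static int
ciphers_agree(enum wire_cipher w, enum drv_cipher d)
{
	return ((int)w == (int)d);
}
```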

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-19 21:49:17 +00:00
Bjoern A. Zeeb
b2dcb84868 LinuxKPI: skbuff.h: fix -Warray-bounds warnings
Harmonize sk_buff_head and sk_buff further and fix -Warray-bounds
warnings reported by gcc.  At the same time simplify some code by
re-using other functions or factoring some code out.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-19 21:49:00 +00:00
Dimitry Andric
87f55ab0b4 ichiic: use bool for one-bit wide bit-fields
A one-bit wide bit-field can take only the values 0 and -1. Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1". Fix by using c99 bool.
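
The effect can be reproduced in a few lines of standalone C. The struct and function names below are hypothetical, and the signed case relies on implementation-defined conversion, which yields -1 with gcc and clang on the usual two's-complement targets; compilers may warn on the marked assignment, which is the point of the example.

```c
#include <stdbool.h>

struct flags_signed {
	signed int busy : 1;	/* one bit, signed: holds only 0 and -1 */
};

struct flags_bool {
	bool busy : 1;		/* one bit, bool: holds 0 and 1 */
};

/* Store 1 in the signed one-bit field and read it back. */
static int
set_signed(void)
{
	struct flags_signed f = { 0 };

	f.busy = 1;		/* truncated: reads back as -1 */
	return (f.busy);
}

/* Store true in the bool one-bit field and read it back. */
static int
set_bool(void)
{
	struct flags_bool f = { 0 };

	f.busy = true;
	return (f.busy);
}
```

Code that tests such a flag only for truthiness behaves the same either way, but a comparison like `f.busy == 1` silently fails with the signed field.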

Reported by:	Clang
Reviewed by:	emaste, wulf
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39665
2023-04-19 22:25:50 +02:00
Igor Ostapenko
0e0c47ecd6 vfs cache: fix vfs.cache.stats.* name typos
Two vfs.cache.stats names are fixed:
- s/.dotdothis/.dotdothits/
- s/.posszaps/.poszaps/

Signed-off-by: Igor Ostapenko <pm@igoro.pro>
[mjg: massaged the header a little bit]
2023-04-19 18:47:38 +00:00
Randall Stewart
4e8a20a764 tcp: rack the request level logging is a bit too noisy when doing point logging.
When doing request level BB logging the hybrid_bw_log() does not have proper screening to minimize logging
when point level logging is in use. Let's fix it properly so you have to have the proper knobs set to get the
more noisy logging.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39699
2023-04-19 14:02:12 -04:00
Randall Stewart
7a842346c3 tcp: Rack can crash with the new non-TSO fix.
Turns out the location of the check to see if we can do output is in the wrong place. We need
to jump off to the compressed acks before handling that case since th is NULL in the
compressed ack case which is handled differently anyway.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39690
2023-04-19 13:17:04 -04:00
Randall Stewart
303246dcdf We have a TCP_LOG_CONNEND log that should come out at the very last log of every connection. This
holds some nice stats about why/how the connection ended. Though with the current code it does not
come out without accounting due to the placement of the ifdefs. Also we need to make sure the stack's
fini has run before calling in from tcp_subr so we get all logs the stack may make at its ending.

Reviewed by: rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39693
2023-04-19 12:54:25 -04:00
Navdeep Parhar
7adf138ba9 cxgbe/iw_cxgbe: debug routines to dump STAG (steering tag) entries.
t4_dump_stag to dump hw state for a known STAG.

t4_dump_all_stag to dump hw state for all valid STAGs.  This routine
walks the entire STAG region looking for valid entries and this can take
a while for some configurations.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-19 09:38:14 -07:00
Justin Hibbits
12e99b63d2 ofed: Fix a logic inversion from IfAPI conversion
Reported by:	bartosz.sobczak_intel.com
Fixes:		3e142e0767 ("ofed: Mechanically convert to IfAPI")
Sponsored by:	Juniper Networks, Inc.
2023-04-19 11:56:25 -04:00
Dmitry Salychev
4cd9661428 dpaa2: Avoid dpaa2_cmd race conditions
struct dpaa2_cmd is no longer malloc'ed, but can be allocated on stack
and initialized with DPAA2_CMD_INIT() on demand. Drivers stopped caching
their DPAA2 command objects (and associated tokens) in the software
contexts in order to avoid using them concurrently.

Reviewed by:		bz
Approved by:		bz (mentor)
MFC after:		3 weeks
Differential Revision:	https://reviews.freebsd.org/D39509
2023-04-19 17:39:05 +02:00
Bjoern A. Zeeb
f621b087c0 iwlwifi: rtw88: rtw89: fix gcc warnings
Fix -Wno-format and unused variables warnings with gcc by adopting
(to|the) FreeBSD-specific code.

Reported by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D39673
2023-04-19 12:21:40 +00:00
Randall Stewart
960985a209 tcp: bbr.c is not capable of doing ECN and sets an INP flag to fend off ECN; however, our syncache is not aware of that flag.
We need to make the syncache aware of the flag and not do ECN if it's set. Note that this
is not 100% foolproof but the best we can do (i.e. it's still possible that you can get in a
situation where the peer tries to do ECN).

Reviewed by: tuexen, glebius, rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39672
2023-04-18 12:21:56 -04:00
Kristof Provost
2e6cdfe293 pf: change pf_rules_lock and pf_ioctl_lock to per-vnet locks
Both pf_rules_lock and pf_ioctl_lock only ever affect one vnet, so
there's no point in having these locks affect other vnets.
(In fact, the only lock in pf that can affect multiple vnets is
pf_end_lock.)

That's especially important for the rules lock, because taking the write
lock suspends all network traffic until it's released. This will reduce
the impact a vnet running pf can have on other vnets, and improve
concurrency on machines running multiple pf-enabled vnets.

Reviewed by:	zlei
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39658
2023-04-19 09:50:52 +02:00
Konstantin Belousov
617a11eab6 x86: initialize use_xsave once
The explanation from https://reviews.freebsd.org/D39637 by stevek:
The "use_xsave" variable is a global that is only supposed to be
initialized early before scheduling gets started. However, with the way
the ifuncs for "fpusave" and "fpurestore" are implemented, the value
could be changed at runtime when scheduling is active if "use_xsave"
was set to 0 by the tunable. This leaves a window of opportunity where
"use_xsave" gets re-initialized to 1 and a context switch could occur
with a thread that was not set up to be able to use xsave functionality.
This can lead to a "privileged instruction fault".

The fix is to protect "use_xsave" from being initialized more than once.
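
One way to picture such a guard is the userland sketch below. It is an assumption-laden illustration, not the committed fix: `use_xsave_demo`, `use_xsave_demo_inited` and `init_use_xsave_demo` are hypothetical names, and the sketch is not thread-safe, on the premise that the first initialization happens early, before scheduling starts.

```c
#include <stdbool.h>

static int use_xsave_demo = 1;		/* hypothetical stand-in for the global */
static bool use_xsave_demo_inited;	/* true once the value has been set */

/*
 * Apply the tunable on the first call only.  Later calls, for example
 * from a resolver that runs again at runtime, leave the value untouched.
 */
static void
init_use_xsave_demo(int tunable)
{
	if (use_xsave_demo_inited)
		return;
	use_xsave_demo = tunable;
	use_xsave_demo_inited = true;
}
```

With the guard in place, a value of 0 chosen by the tunable cannot flip back to 1 when the initialization path is reached a second time.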

Reported and reviewed by:	stevek
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39660
2023-04-19 02:22:28 +03:00
Konstantin Belousov
93ca6ff295 umtx: allow to configure minimal timeout (in nanoseconds)
PR:	270785
Reviewed by:	markj, mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39584
2023-04-19 02:22:28 +03:00
Konstantin Belousov
7aeea73e30 syncer vnode: add VOP_GETWRITEMOUNT() definition explicitly
Since syncer vnode vector does not provide a fallback to the default
one, its VOP_GETWRITEMOUNT() implementation implicitly returned
EOPNOTSUPP, which means that syncer ignored suspension.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-04-19 02:21:40 +03:00
Konstantin Belousov
d8a096621b sync_vnode(): add assert to check vn_start_write() correctness
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-04-19 02:21:40 +03:00
Steve Kiernan
8deb442cf7 mac: Honor order when registering MAC modules.
Ensure MAC modules are inserted in the order that they are registered.

Reviewed by:	markj
Obtained from:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39589
2023-04-18 15:36:27 -04:00
Mateusz Guzik
5e954b9216 tmpfs: add missing vop_fplookup ops to tmpfs_fifoop_entries
Reported by:	gbe
PR:	270917
2023-04-18 18:06:30 +00:00
Marius Strobl
8defc88c13 gem(4): Remove onboard-only Sun ERI and remnants of SBus support
These bits are obsolete since 58aa35d429.
This change reverts part of 9ba2b298df as
well as effectively bd3d9826d7, i. e. the
SBus-related modifications. This also gets rid of a nasty hack required
as bus_{read,write}_N(9) doesn't really fit bus_space_subregion(9).
2023-04-18 19:17:24 +02:00
Marius Strobl
bd15d31cef mmc(4): Don't call bridge driver for timings not requiring tuning
The original idea behind calling into the bridge driver was to have the
logic deciding whether tuning is actually required for a particular bus
timing in a given slot as well as doing the sanity checking only on the
controller layer which also generally is better suited for these due to
say SDHCI_SDR50_NEEDS_TUNING. On another thought, not every such driver
should need to check whether tuning is required at all, though, and not
everything is SDHCI in the first place.
Adjust sdhci{,_fsl_fdt}(4) accordingly, but keep sdhci_generic_tune() a
bit cautious still.
2023-04-18 19:17:24 +02:00
Randall Stewart
2ad584c555 tcp: Inconsistent use of hpts_calling flag
Gleb has noticed there were some inconsistencies in the way the inp_hpts_calls flag was being used. One
such inconsistency results in a bug when we can't allocate enough sendmap entries to entertain a call to
rack_output(): basically a timer won't get started like it should. Also in cleaning this up I find that the
"no_output" side of input needs to be adjusted to make sure we don't try to re-pace too quickly outside
the hpts assurance of 250 useconds.

Another thing here is we end up with duplicate calls to tcp_output() which we should not. If packets go
from hpts for processing the input side of tcp will call the output side of tcp on the last packet if it is needed.
This means that when that occurs a second call to tcp_output would be made that is not needed and if pacing
is going on may be harmful.

Let's fix all this and explicitly state the contract that hpts is making with transports that care about the
flag.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39653
2023-04-17 17:10:26 -04:00
Steve Kiernan
fb5ff7384c arm64: Use FULLKERNEL instead of .ALLSRC in .bin target
Using .ALLSRC may get additional arguments that we may not want
and could cause the objcopy to fail.

Reviewed by:	emaste
Obtained from:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39639
2023-04-18 11:41:57 -04:00
Kristof Provost
af94d8cc17 pf: fix incorrect lock define
PF_TABLE_STATS_ASSERT() should be checking pf_table_stats_lock not
pf_rules_lock.

Fortunately the define is not yet used anywhere so this was harmless.
Fix it anyway, in case it does get used.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-18 15:51:05 +02:00
Hans Petter Selasky
1943c40cd6 mlx5en(4): Don't wait for receive queue to fill up with mbufs during open channels.
Failure to get mbufs may be transient.
Don't permanently fail to open the channels due to lack of mbufs.
This also makes modifying channel parameters faster.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
6bd4bb9bdb mlx5en(4): Explain why CQE zipping is off.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
80b4ef6d10 mlx5: Remove unused debugfs node pointers.
No functional change intended.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
aa7bbdabde mlx5: Implement diagnostic counters as sysctl(8) nodes.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
95bf70a4bf mlx5: Don't give zero number of pages to the firmware.
Can happen when using virtual mlx5_core<N> functions, VFs.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
273bfac08f mlx5: Implement mlx5_core_modify_cq_by_mask().
Implement one CQ modify function supporting all firmware versions,
instead of having more variants of CQ modify.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
2f7e9a8a21 mlx5: Fix duplicate free of default flow rule in error case.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
b0b87d9151 mlx5: Make mlx5_del_flow_rule() NULL safe.
This change factors out repeated NULL checks.

No functional change intended.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
3bb3e4768f mlx5: Make MLX5_COMP_EQ_SIZE tunable.
When using hardware pacing, this value can be increased, because more SQs
mean more EQ events as well. Make it tunable via hw.mlx5.comp_eq_size.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Randall Stewart
37229fed38 tcp: Blackbox logging and tcp accounting together can cause a crash.
If you currently turn BB logging on and in combination have TCP Accounting on we can get a
crash where we have no NULL check and we run out of memory. Also let's make sure we
don't do a divide by 0 in calculating any BB ratios.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39622
2023-04-17 13:52:00 -04:00
Alexander V. Chernikov
28abf63277 netlink: sync interface IFLA attributes
MFC after:	2 weeks
2023-04-18 12:34:05 +00:00
Gordon Bergling
105e397eb6 kern_sysctl: Remove double words in source code comments
- s/on on/on/

MFC after:	5 days
2023-04-18 07:14:57 +02:00
Gordon Bergling
93e4914816 net80211: Remove double words in source code comments
- s/we we/we/

MFC after:	5 days
2023-04-18 07:14:50 +02:00
Stephen J. Kiernan
76735c7439 flash: Add "n25q64" to mx25l driver
This is for 64Mb Micron N25Q serial NOR flash memory

Obtained from:	Juniper Networks, Inc.
2023-04-18 00:21:17 -04:00
Jason A. Harmening
0c01203e47 vfs_lookup(): re-check v_mountedhere on lock upgrade
The VV_CROSSLOCK handling logic may need to upgrade the covered
vnode lock depending upon the requirements of the filesystem into
which vfs_lookup() is walking.  This may involve transiently
dropping the lock, which can allow the target mount to be unmounted.

Tested by:	pho
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
93fe61afde unionfs_mkdir(): handle dvp reclamation
The underlying VOP_MKDIR() implementation may temporarily drop the
parent directory vnode's lock.  If the vnode is reclaimed during that
window, the unionfs vnode will effectively become unlocked because
its v_vnlock field will be reset.  To uphold the locking
requirements of VOP_MKDIR() and to avoid triggering various VFS
assertions, explicitly re-lock the unionfs vnode before returning
in this case.

Note that there are almost certainly other cases in which we'll
similarly need to handle vnode relocking by the underlying FS; this
is the only one that's caused problems in stress testing so far.
A more general solution, such as that employed for nullfs in
null_bypass(), will likely need to be implemented.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision: https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
d711884e60 Remove unionfs_islocked()
The implementation is racy; if the unionfs vnode is not in fact
locked, vnode private data may be concurrently altered or freed.
Instead, simply rely upon the standard implementation to query the
v_vnlock field, which is type-stable and will reflect the correct
lower/upper vnode configuration for the unionfs node.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
a5d82b55fe Remove an impossible condition from unionfs_lock()
We hold the vnode interlock, so vnode private data cannot suddenly
become NULL.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
a18c403fbd unionfs: remove LK_UPGRADE if falling back to the standard lock
The LK_UPGRADE operation may have temporarily dropped the upper or
lower vnode's lock.  If the unionfs vnode was reclaimed during that
window, its lock field will be reset to no longer point at the
upper/lower vnode lock, so the lock operation will use the standard
lock stored in v_lock.  Remove LK_UPGRADE from the flags in this case
to avoid a lockmgr assertion, as this lock has not been previously
owned by the calling thread.

Reported by:	pho
Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Ed Maste
00172f3416 geom: use bool for one-bit wide bit-field
A one-bit wide bit-field can take only the values 0 and -1.  Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1".  Fix by using c99 bool.

Reported by:	Clang, via dim
Reviewed by:	dim
Sponsored by:	The FreeBSD Foundation
2023-04-17 15:43:00 -04:00
Gleb Smirnoff
3232b1f4a9 tcp: fix build
The recent 25685b7537 came in conflict with a540cdca31.  Remove the
code that cleans up the old style input queue.  Note that two lines
below we assert that the new style input queue is empty.  The TCP
stacks that use the queue are supposed to flush it in their
tfb_tcp_fb_fini method.
2023-04-17 10:24:20 -07:00
Gleb Smirnoff
a6b55ee6be net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took a conservative approach
and preferred to opt-in rather than opt-out for the epoch.

This change not only reverts e87c494015 but adds a safety belt to
avoid panicking with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch.  A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.

Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH.  But the tcp_lro_flush_all() asserts the presence
of network epoch.  Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.

Reviewed by:		zlei, gallatin, erj
Differential Revision:	https://reviews.freebsd.org/D39510
2023-04-17 09:08:35 -07:00
Gleb Smirnoff
a540cdca31 tcp_hpts: use queue(9) STAILQ for the input queue
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39574
2023-04-17 09:07:23 -07:00
Steve Kiernan
48ffacbc84 veriexec: Add function to get label associated with a file
Add mac_veriexec_metadata_get_file_label to avoid the need to
expose internals to other MAC modules.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:33 -04:00
Steve Kiernan
bd4742c970 veriexec: Rename old VERIEXEC_SIGNED_LOAD as VERIEXEC_SIGNED_LOAD32
We need to handle old ioctl from old binary.

Add some missing ioctls.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Steve Kiernan
d195f39d1d veriexec: Add option MAC_VERIEXEC_DEBUG
Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Simon J. Gerraty
8c3e263dc1 veriexec: mac_veriexec_syscall compat32 support
Some 32bit apps may need to be able to use
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL
MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL

Therefore compat32 support is required.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Steve Kiernan
8512d82ea0 veriexec: Additional functionality for MAC/veriexec
Ensure veriexec opens the file before doing any read operations.

When the MAC_VERIEXEC_CHECK_PATH_SYSCALL syscall is requested, veriexec
needs to open the file before calling mac_veriexec_check_vp. This is to
ensure any set up is done by the file system. Most file systems do not
explicitly need an open, but some (e.g. virtfs) require initialization
of access tokens (file identifiers, etc.) before doing any read or write
operations.

The evaluate_fingerprint() function needs to ensure it has an open file
for reading in order to evaluate the fingerprint. The ideal solution is
to have a hook after the VOP_OPEN call in vn_open. For now, we open the
file for reading, evaluate the fingerprint, and close the file. While
this leaves a potential hole that could possibly be taken advantage of
by a dedicated adversary, this code path is not typically visited often
in our use cases, as we primarily encounter verified mounts and not
individual files. This should be considered a temporary workaround until
discussions about the post-open hook have concluded and the hook becomes
available.

Add MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL and
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL to mac_veriexec_syscall so we can
fetch and check label contents in an unconstrained manner.

Add a check for PRIV_VERIEXEC_CONTROL to do ioctl on /dev/veriexec

Make it clear that trusted process cannot be debugged. Attempts to debug
a trusted process already fail, but the failure path is very obscure.
Add an explicit check for VERIEXEC_TRUSTED in
mac_veriexec_proc_check_debug.

We need mac_veriexec_priv_check to not block PRIV_KMEM_WRITE if
mac_priv_grant() says it is ok.

Reviewed by:	sjg
Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Mark Johnston
d95fbf4e1a riscv: save the thread pointer in both modes
The contents of frame->tf_tp are uninitialized if accessed by DTrace (in
probe context), resulting in a panic when trying to access the memory
pointed to by tp. This saves the thread pointer to the trap frame when
handling both userland and kernel exceptions.

Reviewed by:	markj, mhorne
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39582
2023-04-17 09:49:52 -04:00
Alexander V. Chernikov
f656a96020 tests: make ktest build on ppc.
MFC after:	2 weeks
2023-04-17 13:47:07 +00:00
Alexander V. Chernikov
9742519b22 netlink: fix operations with link-local routes/gateways.
MFC after:	3 days
2023-04-17 12:04:43 +00:00
Alexander V. Chernikov
b8da3b62a5 tests: add ktest modules to build
MFC after:	2 weeks
2023-04-17 10:46:05 +00:00
Pawel Jakub Dawidek
068913e4ba zfs: Add vfs.zfs.bclone_enabled sysctl.
Keep block cloning disabled by default for now, but allow to enable and
use it after setting vfs.zfs.bclone_enabled to 1, so people can easily
try it.

Approved by:	oshogbo
Reviewed by:	mm, oshogbo
Differential Revision:	https://reviews.freebsd.org/D39613
2023-04-17 03:38:30 -07:00
Zhenlei Huang
401f03445e lagg(4): Correctly define some sysctl variables
939a050ad9 virtualized lagg(4), but the corresponding sysctls of some
virtualized global variables are not marked with CTLFLAG_VNET. An attempt to
operate on those variables via sysctl will effectively go to the 'master'
copies and the virtualized ones are not read or set accordingly. As a
side effect, on updating the 'master' copy, the virtualized global
variables of newly created vnets will have correct values.

PR:		270705
Reviewed by:	kp
Fixes:		939a050ad9 Virtualize lagg(4) cloner
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39467
2023-04-17 18:24:35 +08:00
Zhenlei Huang
a7acce3491 vnet: Fix a typo in a source code comment
- s/form/from/

MFC after:	3 days
2023-04-17 18:24:35 +08:00
Pawel Jakub Dawidek
1959e122d9 zfs: Merge https://github.com/openzfs/zfs/pull/14739
The zfs_log_clone_range() function is never called from the
zfs_clone_range_replay() function, so I assumed it is safe to assert
that zil_replaying() is never TRUE here. It turns out zil_replaying()
also returns TRUE when the sync property is set to disabled.

Fix the problem by just returning if zil_replaying() returns TRUE.

Reported by: Florian Smeets
Signed-off-by: Pawel Jakub Dawidek pawel@dawidek.net

Approved by: oshogbo, mm
2023-04-17 02:22:56 -07:00
Pawel Jakub Dawidek
e0bb199925 zfs: cherry-pick openzfs/zfs@c71fe7164
Fix data corruption when cloning embedded blocks

Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392
Closes #14739

Approved by: oshogbo, mm
2023-04-17 02:19:49 -07:00
Stephen J. Kiernan
88a3358ea4 veriexec: Add SPDX-License-Identifier
2023-04-16 21:23:00 -04:00
Stephen J. Kiernan
894bcc876d sys/modules/Makefile: conditionally add MAC/veriexec modules
Only build MAC/veriexec modules when MK_VERIEXEC is yes or we
are building all modules.

Add VERIEXEC knob to kernel __DEFAULT_NO_OPTIONS

Reviewed by:	sjg
Obtained from:	Juniper Networks, Inc.
2023-04-16 20:24:54 -04:00
Stephen J. Kiernan
8050e0a429 sys/modules/Makefile: add MAC/veriexec modules into the build
Build the MAC/veriexec module and the SHA2, SHA256, SHA384, and
SHA512 fingerprint modules.

Obtained from:	Juniper Networks, Inc.
2023-04-16 19:18:55 -04:00
Simon J. Gerraty
6ae8d57652 mac_veriexec: add mac_priv_grant check for NODEV
Allow other MAC modules to override some veriexec checks.

We need two new privileges:
PRIV_VERIEXEC_DIRECT	process wants to override 'indirect' flag
			on interpreter
PRIV_VERIEXEC_NOVERIFY	typically associated with PRIV_VERIEXEC_DIRECT
			allow override of O_VERIFY

We also need to check for PRIV_VERIEXEC_NOVERIFY override
for FINGERPRINT_NODEV and FINGERPRINT_NOENTRY.
This will only happen if parent had PRIV_VERIEXEC_DIRECT override.

This allows for MAC modules to selectively allow some applications to
run without verification.

Needless to say, this is extremely dangerous and should only be used
sparingly and carefully.

Obtained from:	Juniper Networks, Inc.

Reviewers: sjg
Subscribers: imp, dab

Differential Revision: https://reviews.freebsd.org/D39537
2023-04-16 19:14:40 -04:00
Stephen J. Kiernan
4819e5aeda Add new privilege PRIV_KDB_SET_BACKEND
Summary:
Check for PRIV_KDB_SET_BACKEND before allowing a thread to change
the KDB backend.

Obtained from:	Juniper Networks, Inc.
Reviewers: sjg, emaste
Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D39538
2023-04-16 14:37:58 -04:00
Val Packett
77f0e198d9 procctl: add state flags to PROC_REAP_GETPIDS reports
For a process supervisor using the reaper API to track process subtrees,
it is very useful to know the state of the processes on the list.

Sponsored by:   https://www.patreon.com/valpackett
Reviewed by:    kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D39585
2023-04-16 13:48:20 +03:00
Stephen J. Kiernan
b1a00c2b13 Quiet compiler warnings for fget_noref and fdget_noref
Summary:
Typecasting both parts of the comparison to u_int quiets compiler
warnings about signed/unsigned comparison and takes care of positive
and negative numbers for the file descriptor in a single comparison.
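
The single-comparison trick described above can be sketched as follows. This is a minimal illustration, not the kernel's code: fd_in_range() and its parameters are hypothetical names, and the table size is assumed to be non-negative.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not the kernel's fget_noref(): returns true when
 * fd is a valid index into a table holding nfiles entries.
 * Assumes nfiles is never negative.
 */
static bool
fd_in_range(int fd, int nfiles)
{
	/*
	 * A negative fd converts to a very large unsigned value, so one
	 * unsigned comparison rejects both fd < 0 and fd >= nfiles.
	 */
	return ((unsigned int)fd < (unsigned int)nfiles);
}
```

Casting both sides also keeps the compiler from warning about a mixed signed/unsigned comparison.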

Obtained from:	Juniper Networks, Inc.

Reviewers: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D39593
2023-04-15 23:50:54 -04:00
Warner Losh
214909d669 Revert "cam: fix up world compilation after previous"
This reverts commit 1d35493e46. It was the wrong fix. 757fc6666b has
the proper fix to include stdbool for userland.

Sponsored by:		Netflix
2023-04-15 18:25:55 -06:00
Warner Losh
757fc6666b cam: Include stdbool.h for userland
Sponsored by:		Netflix
2023-04-15 18:25:22 -06:00
Mateusz Guzik
1d35493e46 cam: fix up world compilation after previous
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-15 23:11:27 +00:00
Warner Losh
fd02926a68 cam: Properly mask out the status bits to get completion code
ccb_h.status has two parts: the actual status and some additional bits to
indicate additional information. It must be masked before comparing
against completion codes. Add new inline function cam_ccb_success to
simplify this to test whether or not the request succeeded. Most of the
code already does this, but a few places don't (the rest likely should
be converted to use cam_ccb_status and/or cam_ccb_success, but that's
for another day). This caused at least one bug in recognizing devices
behind a SATA port multiplexer, though some of these checks were
fine with the special knowledge of the code paths involved.
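
The masking rule above can be sketched like this. The constants and the ex_ helper names are illustrative assumptions, not the definitions from the CAM headers; only the idea (mask first, then compare) is taken from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in the CAM headers. */
#define EX_STATUS_MASK	0x3fu		/* low bits: completion code */
#define EX_REQ_CMP	0x01u		/* request completed successfully */
#define EX_REQ_ABORTED	0x02u		/* some other completion code */
#define EX_EXTRA_FLAG	0x10000u	/* additional information bit */

/* Strip the additional bits, leaving only the completion code. */
static uint32_t
ex_ccb_status(uint32_t status)
{
	return (status & EX_STATUS_MASK);
}

/* True when the masked completion code says the request succeeded. */
static bool
ex_ccb_success(uint32_t status)
{
	return (ex_ccb_status(status) == EX_REQ_CMP);
}
```

A raw comparison such as status == EX_REQ_CMP fails as soon as any additional bit is set, which is the class of bug the message describes.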

PR:			270459
Sponsored by:		Netflix
MFC After:		1 week (and maybe an EN request)
Reviewed by:		ken, mav
Differential Revision:	https://reviews.freebsd.org/D39572
2023-04-15 16:32:41 -06:00
Mateusz Guzik
63ee747feb zfs: Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()"
This reverts commit 519851122b.

It results in data corruption, see:
https://github.com/openzfs/zfs/issues/14753

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-15 21:34:54 +00:00
Mateusz Guzik
46ac8f2e7d zfs: don't use zfs_freebsd_copy_file_range
There is one data corruption problem reported and fixed upstream, not
cherry-picked here yet.

On top of it the following fires under load:
        VERIFY(zil_replaying(zfsvfs->z_log, tx));

The patch which introduced the entire machinery is a revert candidate,
but as the machinery came with a dedicated feature flag, doing so would
render affected pools read-only at best. To be figured out.

As a temporary bandaid at least stop the active usage.
Note this patch does not make the feature disappear from zpool upgrade.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-15 21:34:54 +00:00
Bjoern A. Zeeb
42742fe725 KASAN: add bus_space*read*_8 for aarch64
Add the remaining bus_space*read*_8 functions conditionally for
only arm64 in order to not break KASAN builds with new code using
one of them.

Suggested by:	markj
Reviewed by:	markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39581
2023-04-15 16:13:56 +00:00
Eugene Grosbein
5ee1c90e50 tmpfs: unbreak module build outside of kernel build environment
MFC after:	3 days
2023-04-15 11:00:03 +07:00
Konstantin Belousov
1e0e335b0f amd64: fix PKRU and swapout interaction
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact.  Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.

For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().

Reported by:	andrew
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39556
2023-04-15 02:53:59 +03:00
Randall Stewart
3cc7b66732 tcp: stack unloading crash in rack and bbr
It's possible to induce a crash in either rack or bbr. This would be done
if the rack stack were, say, the default and bbr was being used by a connection.
If the bbr stack is then unloaded and it was active, we will trigger a MPASS assert
in tcp_hpts since the new stack (default rack) would start a timer, and the old stack
(bbr) would have the inp already in hpts.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39576
2023-04-14 15:42:23 -04:00
Alexander V. Chernikov
9f324d8ac2 netlink: make netlink work correctly on CHERI.
Current Netlink message writer code relies on executing callbacks
 with arbitrary data (pointer or integer) to flush the completed
 messages.
This arbitrary data is stored as a union of { void *, uint64_t }.
At some stage, the message flushing code copied this data, using
 direct uint64_t assignment instead of copying the union. It led
 to failure on CHERI, as sizeof(pointer) == 16 there.

Fix the code by making union non-anonymous and copying it entirely.
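
A minimal sketch of the difference between the two copies, assuming hypothetical type and function names (ex_arg, ex_writer and the ex_copy_* helpers are not the Netlink writer's real identifiers):

```c
#include <stdint.h>

/* Hypothetical stand-in for the writer's callback argument. */
union ex_arg {
	void		*ptr;
	uint64_t	 val;
};

struct ex_writer {
	union ex_arg	arg;
};

/*
 * Copies only the integer view. Where a pointer is wider than 64 bits
 * (16 bytes on CHERI), part of the pointer is silently dropped.
 */
static void
ex_copy_integer_only(struct ex_writer *dst, const struct ex_writer *src)
{
	dst->arg.val = src->arg.val;
}

/* Copies the whole union, whatever its size is on the target. */
static void
ex_copy_whole(struct ex_writer *dst, const struct ex_writer *src)
{
	dst->arg = src->arg;
}
```

Giving the union a name is what makes the whole-object assignment in ex_copy_whole() possible.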

Reviewed by:	br, jhb, jrtc27
Differential Revision: https://reviews.freebsd.org/D39557
MFC after:	2 weeks
2023-04-14 16:33:43 +00:00
Alexander V. Chernikov
3e5d0784b9 Testing: add framework for the kernel unit tests.
This change intends to lower the bar for kernel unit testing by
 introducing a new kernel-testing framework ("ktest") based on Netlink,
 loadable test modules and python test suite integration.

This framework provides the following features:
* Integration to the FreeBSD test suite
* Automatic test discovery
* Automatic test module loading
* Minimal boiler-plate code in both kernel and userland
* Passing any metadata to the test
* Convenient environment pre-setup using python testing framework
* Streaming messages from the kernel to the userland
* Running tests in the dedicated taskqueues
* Skipping or parametrizing tests

Differential Revision: https://reviews.freebsd.org/D39385
MFC after:	2 weeks
2023-04-14 15:47:55 +00:00
Mikhail Pchelin
2f53b5991c net80211: fix a typo in Rx MCS set for unequal modulation case
RX MCS set defines which MCSs are supported for RX, bits 0-31 are for equal
modulation of the streams, bits 33-76 are for unequal case. Current code checks
txstreams variable instead of rxstreams to set bits from 53 to 76 for 4 spatial
streams case.
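
As a rough sketch of the corrected check, assuming a byte-array bitmap and hypothetical helper names (none of these identifiers come from net80211, and the exact stream condition is an assumption):

```c
#include <stdint.h>

#define EX_MCS_SET_BYTES	10	/* bits 0-79, enough for MCS 0-76 */

/* Set bits first..last (inclusive) in a little-endian bit array. */
static void
ex_set_bits(uint8_t *set, unsigned int first, unsigned int last)
{
	for (unsigned int i = first; i <= last; i++)
		set[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Return 1 when the given bit is set, 0 otherwise. */
static int
ex_is_set(const uint8_t *set, unsigned int bit)
{
	return ((set[bit / 8] >> (bit % 8)) & 1);
}

/*
 * Advertise the 4-stream unequal-modulation MCSs in the Rx set.
 * The decision depends on the number of receive streams, not on
 * the number of transmit streams.
 */
static void
ex_fill_rx_unequal_4ss(uint8_t *set, unsigned int rxstreams)
{
	if (rxstreams >= 4)
		ex_set_bits(set, 53, 76);
}
```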

The modulations are defined in tables 19-38 and 19-41 of the IEEE Std
802.11-2020.

Spotted by bz in https://reviews.freebsd.org/D39476

Reviewed by:		bz
Approved by:		bz
Sponsored by:		Serenity Cybersecurity, LLC
Differential Revision:	https://reviews.freebsd.org/D39568
2023-04-14 18:20:09 +03:00
Mikhail Pchelin
ea26545cc5 net80211: wrong transmit MCS set in HT cap IE
Current code checks whether or not txstreams are equal to rxstreams and if it
isn't - sets needed bits in "Transmit MCS Set". But if they are equal it sets
whole set to zero, which contradicts the standard, if tx and rx streams are
equal 'Tx MCS Set Defined' (table 9-186, IEEE Std 802.11-2020) must be set to
one.

Reviewed by:		bz
Approved by:		bz
Sponsored by:		Serenity Cybersecurity, LLC
Differential Revision:	https://reviews.freebsd.org/D39476
2023-04-14 18:16:29 +03:00
Kyle Evans
d1b6271118 uart(4): add Sunrise Point UART controllers
Sponsored by:	Zenith Electronics LLC
Sponsored by:	Klara, Inc.
2023-04-14 09:58:00 -05:00
Elliott Mitchell
6d765bff6f xen: move common variables off of sys/x86/xen/hvm.c
The xen_domain_type and HYPERVISOR_shared_info variables are shared by
all Xen architectures, so they should be in common rather than
reimplemented by each architecture.

hvm_start_flags is used by xen_initial_domain() and so needs to be in
common.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28982
2023-04-14 15:59:11 +02:00
Julien Grall
5e2183dab8 xen/intr: move sys/x86/xen/xen_intr.c to sys/dev/xen/bus/
The event channel source code or equivalent is needed on all
architectures.  Since much of this is viable to share, get this moved out
of x86-land.  Each interrupt interface then needs a distinct back-end
implementation.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-01-13 17:41:04
Differential Revision: https://reviews.freebsd.org/D30236
2023-04-14 15:58:57 +02:00
Elliott Mitchell
6699c22c1c xen/intr: move interrupt allocation/release to architecture
Simply moving the interrupt allocation and release functions into files
which belong to the architecture.  Since x86 interrupt handling is quite
distinct from other architectures, this is a crucial necessary step.

Identifying the border between x86 and architecture-independent is
actually quite tricky.  Similarly, getting the prototypes for the
border right is also quite tricky.

Inspired by the work of Julien Grall <julien@xen.org>,
2015-10-20 09:14:56, but heavily adjusted.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30936
2023-04-14 15:58:56 +02:00
Julien Grall
2d795ab1ea xen/intr: move x86 PIC interface to xen_arch_intr.c, introduce wrappers
The x86 PIC interface is very much x86-specific and not used by other
architectures.  Since most of xen_intr.c can be shared with other
architectures, the PIC interface needs to be broken off.

Introduce wrappers for calls into the architecture-dependent interrupt
layer.  All architectures need roughly the same functionality, but the
interface is slightly different between architectures.  Due to the
wrappers being so thin, all of them are implemented as inline in
arch-intr.h.

The original implementation was done by Julien Grall in 2015, but this
has required major updating.

Removal of PVHv1 meant substantial portions disappeared.  The original
implementation took care of moving interrupt allocation to
xen_arch_intr.c, but this has required massive rework and was broken
off.

In the original implementation the wrappers were normal functions.  Some
had empty stubs in xen_intr.c and were removed.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30909
2023-04-14 15:58:56 +02:00
Elliott Mitchell
373301019f xen/intr: remove type argument from xen_intr_alloc_isrc()
This value doesn't need to be set in xen_intr_alloc_isrc().  What is
needed is simply to ensure the allocated xenisrc won't appear as free,
even if xi_type is written non-atomically.  Since the type is no longer
used to indicate free or not, the calling function should take care of
all non-architecture initialization.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31188
2023-04-14 15:58:55 +02:00
Elliott Mitchell
d0a69069bb xen/x86: rework isrc allocation to use list instead of table scanning
Scanning the list of interrupts to find an unused entry is rather
inefficient.  Instead overlay a free list structure and use a list
instead.
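
The overlay idea can be sketched as follows. This is a generic illustration with hypothetical names (ex_slot, ex_alloc, ex_release), not the xenisrc code: a free slot reuses its own storage as the list link, so allocation pops the head instead of scanning the table.

```c
#include <stddef.h>

#define EX_NSLOTS	4

/* Hypothetical slot: while free, its storage doubles as a list link. */
union ex_slot {
	union ex_slot	*next_free;	/* valid only while the slot is free */
	int		 payload;	/* valid only while the slot is in use */
};

static union ex_slot ex_slots[EX_NSLOTS];
static union ex_slot *ex_free_head;

/* Thread every slot onto the free list. */
static void
ex_init(void)
{
	ex_free_head = NULL;
	for (size_t i = 0; i < EX_NSLOTS; i++) {
		ex_slots[i].next_free = ex_free_head;
		ex_free_head = &ex_slots[i];
	}
}

/* Pop the head of the free list; O(1), returns NULL when exhausted. */
static union ex_slot *
ex_alloc(void)
{
	union ex_slot *s = ex_free_head;

	if (s != NULL)
		ex_free_head = s->next_free;
	return (s);
}

/* Push a slot back onto the free list; O(1). */
static void
ex_release(union ex_slot *s)
{
	s->next_free = ex_free_head;
	ex_free_head = s;
}
```

A real implementation would also need locking around the list head, which this sketch leaves out.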

This also has the useful effect of removing the last use of evtchn_type
values outside of xen_intr.c.

Reviewed by: royger
[royger]
 - Make avail_list static.
2023-04-14 15:58:54 +02:00
Elliott Mitchell
d32d65276b xen/intr: move evtchn_type to intr-internal.h
The evtchn_type enum is only touched by the Xen interrupt code.  Other
event channel uses no longer need the value, so that has been moved to
restrict its use.

Copyright note.  The current evtchn_type was introduced at 76acc41fb7
by Justin T. Gibbs.  This in turn appears to have been heavily inspired
by 30d1eefe39 done by Kip Macy.

Reviewed by: royger
2023-04-14 15:58:53 +02:00
Julien Grall
ab7ce14b1d xen/intr: introduce dev/xen/bus/intr-internal.h
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
2023-04-14 15:58:53 +02:00
Elliott Mitchell
af610cabf1 xen/intr: adjust xen_intr_handle_upcall() to match driver filter
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
2023-04-14 15:58:52 +02:00
Elliott Mitchell
2794893ebf xen/intr: do full xenisrc initialization during binding
Keeping released xenisrcs in a known state simplifies allocation, but
forces the allocation function to maintain that state.  This turns into
a problem when trying to allow for interchangeable allocation functions.
Fix this issue by ensuring xenisrcs are always *fully* initialized
during binding.

Reviewed by: royger
2023-04-14 15:58:51 +02:00
Elliott Mitchell
ff73b1d69b xen/intr: split xen_intr_isrc_lock uses
There are actually several distinct locking domains in xen_intr.c, but
all were sharing the same lock.  Both xen_intr_port_to_isrc[] and the
x86 interrupt structures needed protection.  Split these two apart as a
precursor to splitting the architecture portions off the file.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:51 +02:00
Elliott Mitchell
834013dea2 xen/intr: rework xen_intr_alloc_isrc() locking
Locking for allocation was being done in xen_intr_bind_isrc(), but the
unlock was inside xen_intr_alloc_isrc().  While the lock acquisition at
the end of xen_intr_alloc_isrc() was to modify xen_intr_port_to_isrc[],
NOT allocation.  Fix this garbled (though working) locking scheme.

Now locking for allocation is strictly in xen_intr_alloc_isrc(), while
locking to modify xen_intr_port_to_isrc[] is in xen_intr_bind_isrc().

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:50 +02:00
Elliott Mitchell
09bd542d17 xen/intr: rework xen_intr_alloc_isrc() call structure
The call structure around xen_intr_alloc_isrc() was rather awful.
Notably finding a structure for reuse is part of allocation, but this
was done outside xen_intr_alloc_isrc().  Move this into
xen_intr_alloc_isrc() so the function handles all allocation steps.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:49 +02:00
Elliott Mitchell
149c581018 xen/intr: adjust xenisrc types, adjust format strings to match
As "CPUs", IRQs (vector) and virtual IRQs are always positive integers,
adjust the Xen code to use unsigned integers.  Several format strings
need adjustment to match.  Additionally single-bit bitfields are
boolean.

No functional change expected.

Reviewed by: royger
2023-04-14 15:58:49 +02:00
Elliott Mitchell
ecdcad6516 xen: remove CONFIG_XEN_COMPAT, purge Xen 3.0 compatibility
This overlaps the purpose of __XEN_INTERFACE_VERSION__.  Remove Xen 3.0.2
compatibility.  __XEN_INTERFACE_VERSION__ has compatibility to Xen 3.2.8
enabled.  As Xen 3.3 was released almost 15 years ago, it seems unlikely
anyone hasn't updated.

Reviewed by: royger
2023-04-14 15:58:48 +02:00
Elliott Mitchell
61ccede8cf xen: purge no longer used hypervisor functions
HYPERVISOR_poll(), HYPERVISOR_block(), and HYPERVISOR_crash() appear no
longer used.  Further get_system_time() appears to have disappeared at
some point in the past, so HYPERVISOR_poll() was broken anyway.

No functional change intended.

Reviewed by: royger
2023-04-14 15:58:47 +02:00
Elliott Mitchell
b2c50bb934 xen/efi: make Xen PV EFI clock optional
The present implementation is only for x86.  Other architectures need
adjustments for querying presence of EFI.

Xen's EFI support is also quite troublesome on non-x86.  This is being
slowly remedied, but until in better shape the EFI clock functionality
should be disabled.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31065
2023-04-14 15:58:47 +02:00
Julien Grall
28a78d860e xen: introduce XEN_CPUID_TO_VCPUID()/XEN_VCPUID()
Part of the series for allowing FreeBSD/ARM to run on Xen.  On ARM the
function is a trivial pass-through, other architectures need distinct
implementations.

While implementing XEN_VCPUID() as a call to XEN_CPUID_TO_VCPUID()
works, that involves multiple accesses to the PCPU region.  As such make
this a distinct macro.  Only callers in machine independent code have
been switched.

Add a wrapper for the x86 PIC interface to use matching the old
prototype.

Partially inspired by the work of Julien Grall <julien@xen.org>,
2015-08-01 09:45:06, but XEN_VCPUID() was redone by Elliott Mitchell on
2022-06-13 12:51:57.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 08:57:40
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 14:32:01
Differential Revision: https://reviews.freebsd.org/D29404
2023-04-14 15:58:46 +02:00
Elliott Mitchell
054073c283 xen/intr: xen_intr_bind_isrc() always set handle
Previously the upper layer handle was being set before the last
potential error condition.  The reasoning appears to have been it was
assumed invalid in case of an error being returned.  Now ensure it is
invalid until just before a successful return.

Fixes: 76acc41fb7 ("Implement vector callback for PVHVM and unify event channel implementations")
Fixes: 6d54cab1fe ("xen: allow to register event channels without handlers")
Reviewed by: royger
2023-04-14 15:58:45 +02:00
Randall Stewart
9903bf34f0 tcp: rack pacing has some caveats that need to be obeyed when LRO is missing
In further non-LRO testing I found a case where rack is supposed to be waking up but
it is not now. In this special case it sets the flag rc_ack_can_sendout_data. When that is
set we should not prohibit output.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39565
2023-04-14 09:33:36 -04:00
Kristof Provost
b0e38a1373 bridge: distinguish no vlan and vlan 1
The bridge treated no vlan tag as being equivalent to vlan ID 1, which
causes confusion if the bridge sees both untagged and vlan 1 tagged
traffic.

Use DOT1Q_VID_NULL when there's no tag, and fix up the lookup code by
using 'DOT1Q_VID_RSVD_IMPL' to mean 'any vlan', rather than vlan 0. Note
that we have to account for userspace expecting to use 0 as meaning 'any
vlan'.
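
A minimal sketch of the lookup semantics described above. The EX_ constants and helper names are assumptions made for illustration; they mirror the roles of DOT1Q_VID_NULL and DOT1Q_VID_RSVD_IMPL but are not the bridge code.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_VID_NULL	0x000u	/* frame carries no VLAN tag */
#define EX_VID_ANY	0xfffu	/* reserved value reused as a wildcard */

/* Userspace still passes 0 for "any vlan"; translate it on the way in. */
static uint16_t
ex_vid_from_user(uint16_t vid)
{
	return (vid == 0 ? EX_VID_ANY : vid);
}

/* Match an entry's VLAN against a lookup key that may be the wildcard. */
static bool
ex_vlan_match(uint16_t want, uint16_t have)
{
	return (want == EX_VID_ANY || want == have);
}
```

With this split, an untagged frame (EX_VID_NULL) and a frame tagged with VLAN 1 no longer compare equal.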

PR:		270559
Suggested by:	Zhenlei Huang <zlei@FreeBSD.org>
Reviewed by:	philip, zlei
Differential Revision:  https://reviews.freebsd.org/D39478
2023-04-14 13:17:02 +02:00
Zhenlei Huang
9af6f4268a bridge: Use the %D identifier to format MAC address
It is shorter and more readable.

No functional change intended.

Reviewed by:	kp
Fixes:		2d3614fb13 bridge: Log MAC address port flapping
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39542
2023-04-14 18:08:56 +08:00
Kajetan Staszkiewicz
39282ef356 pf: backport OpenBSD syntax of "scrub" option for "match" and "pass" rules
Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible,
pf.conf can be still written in FreeBSD-style.

Obtained from:	OpenBSD
MFC after:	never
Sponsored by:	InnoGames GmbH
Differential Revision:	https://reviews.freebsd.org/D38025
2023-04-14 09:04:06 +02:00
Gordon Bergling
26713ad9cf arm: Remove a double word in a comment in setjmp
- s/number number/number/

MFC after:	5 days
2023-04-13 20:37:25 +02:00
Gordon Bergling
c159f76713 kern: remove a double word in a KASSERT in subr_trap
- s/with with/with/

MFC after:	5 days
2023-04-13 20:03:37 +02:00
Henri Hennebert
71883128e5 rtsx: Add plug-and-play info
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D35074
2023-04-13 11:12:50 -03:00
Randall Stewart
25685b7537 TCP: Misc cleanups of tcp_subr.c
In going through all the TCP stacks I have found we have a few little bugs and niggles that need to
be cleaned up in tcp_subr.c including the following:

a) Set tcp_restoral_thresh to 450 (45%) not 550. This is a better proven value in my testing.
b) Let's track when we try to do pacing but fail via a counter for connections that do pace.
c) If a switch away from the default stack occurs and it fails we need to make sure the time
   scale is in the right mode (just in case the other stack changed it but then failed).
d) Use the TP_RXTCUR() macro when starting the TT_REXMT timer.
e) When we end a default flow let's log that in BBlogs as well as cleanup any t_acktime (disable).
f) When we respond with a RST let's make sure to update the log_end_status properly.
g) When starting a new pcb let's assure that all LRO features are off.
h) When discarding a connection let's make sure that any t_in_pkt's that might be there are freed properly.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39501
2023-04-13 09:29:05 -04:00
Ed Maste
2ef2c26f3f link_elf: fix SysV hash function overflow
Quoting from https://maskray.me/blog/2023-04-12-elf-hash-function:

The System V Application Binary Interface (generic ABI) specifies the
ELF object file format. When producing an output executable or shared
object needing a dynamic symbol table (.dynsym), a linker generates a
.hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH
tag is produced to hold the address of .hash.

The function is supposed to return a value no larger than 0x0fffffff.
Unfortunately, there is a bug. When unsigned long consists of more than
32 bits, the return value may be larger than UINT32_MAX. For instance,
elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns
0x100000002, which is clearly unintended, as the function should behave
the same way regardless of whether long represents a 32-bit integer or
a 64-bit integer.
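
The overflow can be reproduced with the sketch below. ex_elf_hash_wide() is the textbook function with a 64-bit accumulator standing in for a 64-bit unsigned long; ex_elf_hash_32() shows one way to avoid the problem by using a 32-bit accumulator, which is not necessarily the exact change made in this commit. Both names are hypothetical.

```c
#include <stdint.h>

/* Textbook form with a 64-bit accumulator, as on an LP64 platform. */
static uint64_t
ex_elf_hash_wide(const unsigned char *name)
{
	uint64_t h = 0, g;

	while (*name != '\0') {
		h = (h << 4) + *name++;
		if ((g = h & 0xf0000000) != 0)
			h ^= g >> 24;
		h &= ~g;	/* clears bits 28-31 only, never bit 32 and up */
	}
	return (h);
}

/* Same algorithm with a 32-bit accumulator: the shift cannot overflow it. */
static uint32_t
ex_elf_hash_32(const unsigned char *name)
{
	uint32_t h = 0, g;

	while (*name != '\0') {
		h = (h << 4) + *name++;
		if ((g = h & 0xf0000000) != 0)
			h ^= g >> 24;
		h &= ~g;
	}
	return (h);
}
```

For the input quoted above, the wide variant carries a bit into position 32 on the last character, while the 32-bit variant discards it and stays within 0x0fffffff.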

Reviewed by:	kib, Fangrui Song
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39517
2023-04-12 15:33:55 -04:00
John Baldwin
1ca12bd927 Remove the riscv64sf architecture.
Reviewed by:	jrtc27, arichardson, br, kp, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D39496
2023-04-12 11:09:27 -07:00
Michael Tuexen
2ba2849c82 tcp: fix typo in comment
Reported by:	cc
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2023-04-12 18:08:21 +02:00
Michael Tuexen
c687f21add tcp: make net.inet.tcp.functions_default vnet specific
Reviewed by:		cc, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39516
2023-04-12 18:04:27 +02:00
Randall Stewart
1073f41657 tcp_lro: When processing compressed acks let's support the new early wake feature for rack.
During compressed ack and mbuf queuing we determine if we need to wake up. A
new function was added that is optional to the tfb so that the stack itself can also
be asked if a wakeup should happen. This helps compensate for late hpts calls.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39502
2023-04-12 11:35:14 -04:00
Andrew Turner
421516f25e Create pmap_mask_set_locked on arm64
Create a locked version of pmap_mask_set. We will need this for BTI
support.

Sponsored by:	Arm Ltd
2023-04-12 13:10:13 +01:00
Michael Tuexen
73c48d9d8f tcp: fix deregistering stacks when vnets are used
This fixes a bug where stacks could not be deregistered when
end points in the non-default vnet are using it.

Reviewed by:		glebius, zlei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39514
2023-04-12 10:52:53 +02:00
Zhenlei Huang
c3c5e6c3e6 tarfs: Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.
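
A minimal sketch of why this kind of change is not functional: the combined flag is just the bitwise OR of the separate ones. The SK_ names and bit values below are illustrative stand-ins and an assumption of this sketch; the real definitions live in <sys/sysctl.h>.

```c
#include <stdint.h>

/* Illustrative stand-ins (assumed values), not the real sysctl flags. */
#define	SK_CTLFLAG_RD		0x80000000u
#define	SK_CTLFLAG_WR		0x40000000u
#define	SK_CTLFLAG_RW		(SK_CTLFLAG_RD | SK_CTLFLAG_WR)
#define	SK_CTLFLAG_TUN		0x00080000u
#define	SK_CTLFLAG_RDTUN	(SK_CTLFLAG_RD | SK_CTLFLAG_TUN)
#define	SK_CTLFLAG_RWTUN	(SK_CTLFLAG_RW | SK_CTLFLAG_TUN)

/* Returns non-zero when the two spellings produce the same flag word. */
static int
sk_same_flags(uint32_t separated, uint32_t combined)
{
	return (separated == combined);
}
```

Under these assumptions, sk_same_flags(SK_CTLFLAG_RW | SK_CTLFLAG_TUN, SK_CTLFLAG_RWTUN) is true, so replacing the separated spelling with the combined one leaves the flag word unchanged.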

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
deac4c7f07 iicbus(4): Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
8bd9afe9e1 bxe(4): Use CTLFLAG_RDTUN flag definition
sysctl variables rx_budget and max_aggregation_size are read-only loader
tunables. Mark them with the CTLFLAG_RD flag.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
5ff8018108 ice(4): Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
69cb72b872 cam iosched: Use the existing CTLFLAG_RDTUN and CTLFLAG_RWTUN flag definitions
Use them when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
dc1c5138c3 powerpc: Use the existing CTLFLAG_RDTUN and CTLFLAG_RWTUN flag definitions
Use them when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
John Baldwin
cd800d3c96 Enable -Warray-parameter for clang.
I fixed many of these previously for GCC 12 and make tinderbox passes
with this enabled.

Differential Revision:	https://reviews.freebsd.org/D39378
2023-04-11 13:47:59 -07:00
Richard Scheffenegger
2169f71277 tcp: use IPV6_FLOWLABEL_LEN
Avoid magic numbers when handling the IPv6 flow ID for
DSCP and ECN fields and use the named variable instead.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D39503
2023-04-11 18:53:51 +02:00
Konstantin Belousov
c53e990b8d DEBUG_VFS_LOCKS: restore diagnostic for the witness use case
Reviewed by:	jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39477
2023-04-11 15:59:55 +03:00
Konstantin Belousov
75fc6f86c3 Add witness_is_owned(9)
which returns an indicator if the current thread owns the specified
lock.

Reviewed by:	jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39477
2023-04-11 15:59:49 +03:00
Konstantin Belousov
afa8f8971b vn_start_write(): consistently set *mpp to NULL on error or after failed sleep
This ensures that *mpp != NULL iff vn_finished_write() should be
called, regardless of the returned error, except for V_NOWAIT.
The only exception that must be maintained is the case where
vn_start_write(V_NOWAIT) is called with the intent of later dropping
other locks and then doing vn_start_write(V_XSLEEP), which needs the mp
value calculated from the non-waitable call above it.

Also note that V_XSLEEP is not supported by vn_start_secondary_write().

Reviewed by:	markj, mjg (previous version), rmacklem (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39441
2023-04-11 15:59:46 +03:00
Konstantin Belousov
b2f3288747 vn_start_write(): minor style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39441
2023-04-11 15:59:39 +03:00
Eugene Grosbein
37f4cb29bd imgact_binmisc: unbreak module build outside of kernel build environment
MFC after:	3 days
2023-04-11 17:32:29 +07:00
domienschepers
61605e0ae5 net80211: fail for unicast traffic without unicast key
Falling back to the multicast key may cause unicast traffic to leak.
Instead fail when no key is found.

For more information see the 'Framing Frames: Bypassing Wi-Fi Encryption
by Manipulating Transmit Queues' paper.

[ I updated the commit message to reference the paper and the code
comment to record historic behaviour as discussed in private email. ]

Security:	CVE-2022-47522
2023-04-10 23:38:57 +00:00
Randall Stewart
a2b33c9a7a tcp: Rack - in the absence of LRO fixed rate pacing (loopback or interfaces with no LRO) does not work correctly.
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up when
LRO is not available. This is because LRO will hold off waking up the tcp connection to
process the inbound packets until the pacing timer is up. Without LRO the pacing only
sort-of works. Sometimes we pace correctly, other times not so much.

This set of changes will make it so pacing works properly in the absence of LRO.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39494
2023-04-10 16:33:56 -04:00
John Baldwin
e222461790 rack: mask and tclass are only used for INET6.
This fixes the LINT-NOINET6 build.
2023-04-10 12:21:03 -07:00
Joseph Koshy
0e9e9048ae procfs: Sync a documentation comment with the code.
Approved by:	gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D39488
2023-04-10 17:58:46 +00:00
John Baldwin
3b3762c34e sys: Enable -Wunused-but-set-variable for GCC.
It has been enabled for clang for a while now.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D39358
2023-04-10 10:36:33 -07:00
John Baldwin
8e9db62e74 zfs: Appease set but unused warnings for spl_fstrans_*mark stubs.
Use a void cast to mark the cookie value as used in spl_fstrans_unmark.
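
The void-cast idiom can be sketched as follows. The sk_ names are hypothetical stand-ins, not the real SPL stubs, and the macro shapes are an assumption of this sketch: if sk_unmark() expanded to nothing, the caller's cookie would be assigned but never read, which is what -Wunused-but-set-variable reports.

```c
/* Hypothetical stubs for illustration; not the real spl_fstrans_* code. */
typedef int sk_cookie_t;

/* Stub "mark": there is no state to save, so return a dummy cookie. */
#define	sk_mark()		((sk_cookie_t)0)

/* Stub "unmark": the void cast counts as a use of the cookie. */
#define	sk_unmark(cookie)	((void)(cookie))

static int
sk_caller(void)
{
	sk_cookie_t cookie;

	cookie = sk_mark();
	/* ... work that would run with the mark held ... */
	sk_unmark(cookie);	/* evaluates the cookie, so it is not "set but unused" */
	return (0);
}
```

The cast generates no code; it only tells the compiler that discarding the value is deliberate.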

Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39357
2023-04-10 10:36:14 -07:00
John Baldwin
5328efb3d0 if_mos: Remove set but unused variable.
Reviewed by:	hselasky
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39356
2023-04-10 10:35:48 -07:00
John Baldwin
4b6228906f libalias: Mark set but unused variables as unused.
This function is clearly a stub, but it seems better to leave the stub
bits in place than to remove the function entirely.

Differential Revision:	https://reviews.freebsd.org/D39355
2023-04-10 10:35:29 -07:00
John Baldwin
16df72a9a2 udf: Remove set but unused variable from udf_getattr.
Reviewed by:	emaste
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39354
2023-04-10 10:31:45 -07:00
John Baldwin
3a9e6624eb rtw88: Silence unused but set warnings from GCC for debug.c.
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D39353
2023-04-10 10:31:26 -07:00