Commit Graph

147403 Commits

Author SHA1 Message Date
Gleb Smirnoff
c3c20de3b2 tcp: move HPTS/LRO flags out of inpcb to tcpcb
These flags are TCP specific.  While here, make also several LRO
internal functions to pass tcpcb pointer instead of inpcb one.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39698
2023-04-25 12:19:48 -07:00
Gleb Smirnoff
c2a69e846f tcp_hpts: move HPTS related fields from inpcb to tcpcb
This makes inpcb lighter and allows future cache line optimizations
of tcpcb.  The reason why HPTS originally used inpcb is the compressed
TIME-WAIT state (see 0d7445193a), that used to free a tcpcb, while the
associated connection is still on the HPTS ring.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39697
2023-04-25 12:18:33 -07:00
Gleb Smirnoff
144259f673 tcp: purge the input queue from tcp_discardcb()
The purge was intentionally removed in a540cdca31.  My assumption
was that the stacks that use the input queue always call the
tcp_handle_orphaned_packets() in their tfb_tcp_fb_fini method.
However, rack will skip doing that if t_fb_ptr is NULL and there are
scenarios when it is NULL, e.g. close(2) on a socket (but some
special close(2)).  Instead of working out all possible scenarios
let's put this safebelt back.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39696
2023-04-25 12:18:19 -07:00
Justin Hibbits
f46a05b5d3 al_eth: Finish conversion to IfAPI
Reviewed by:	zlei
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38955
2023-04-25 14:25:31 -04:00
Justin Hibbits
cfaab41c95 irdma: Convert to IfAPI
Mostly mechanical changes, with some reworking in irdma_cm for iterating
over interfaces and addresses.  Further rework by Bartosz Sobczak.

Reviewed by:	bartosz.sobczak_intel.com
Tested by:	mateusz.moga_intel.com
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38960
2023-04-25 14:25:31 -04:00
Mateusz Guzik
95e4f5ef7c x86: whack pmspcv from GENERIC
The driver is enormous and rarely used.

      text      data       bss        dec         hex   filename
  23076646   1870505   4415872   29363023   0x1c00b4f   kernel.before
  20017433   1870305   4416000   26303738   0x1915cfa   kernel.after

People using the driver will need to add pmspcv_load="YES" to
their loader.conf.

Reviewed by:	jhb
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39816
2023-04-25 18:09:44 +00:00
Dimitry Andric
9ea31d78f0 al_eth: make function definitions consistent with declarations
The declarations for al_eth_lm_retimer_ds25_signal_detect() and
al_eth_lm_retimer_ds25_cdr_lock() say that these functions return
'al_bool', but the definitions actually return 'boolean_t'.

Make the definitions match the declarations.

Reviewed by:	jhb, emaste
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39759
2023-04-25 20:02:04 +02:00
Dimitry Andric
3029b0b0e9 boolean_t: change to unsigned int to avoid signed bitfield warnings
This is the final part, which actually makes boolean_t unsigned. Note
that we do not change its size, nor do we try to change it directly to
bool, since that results in a lot of regressions.

Converting the remaining instances of boolean_t to plain C99 bool can
now be done in a piecemeal fashion, after which boolean_t may hopefully
be retired.

MFC after:	1 week
Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D39753
2023-04-25 19:58:18 +02:00
Dimitry Andric
f74be55e30 vm: fix a number of functions to match the expected prototypes
Noticed while attempting to make boolean_t unsigned: some vm-related
function declarations and defintions were using boolean_t where they
should have used int, and vice versa.

MFC after:	1 week
Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D39753
2023-04-25 19:58:18 +02:00
Dimitry Andric
64b5d74fff zfs: make zfs_vfs_held() definition consistent with declaration
Noticed while attempting to change boolean_t into an actual bool: in
include/sys/zfs_ioctl_impl.h, zfs_vfs_held() is declared to return a
boolean_t, but in module/os/freebsd/zfs/zfs_ioctl_os.c it is defined to
return an int. Make the definition match the declaration.

Obtained from:	https://github.com/openzfs/zfs/commit/62cc9d4f6
Reviewed by:	jhb
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D39753
2023-04-25 19:58:17 +02:00
Mark Johnston
47cf1b37f4 vmm: Expose some more AVX512 CPUID bits to guests
This is required to announce support for some accelerated AES
operations.  AVX512BW indicates support for the AVX512-FP16 extension
and AVX512VL indicates support for the use of AVX512 instructions with
vector lengths smaller than 512 bits.

VAES and VPCLMULQDQ extensions indicate that VEX-prefixed AES-NI and
pclmulqdq instructions are supported.

All of these bits are needed for OpenSSL to use VAES to accelerate
AES-GCM transforms.

Reviewed by:	corvink, kib, jhb
MFC after:	2 weeks
Sponsored by:	Stormshield
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D39781
2023-04-25 13:35:14 -04:00
Dimitry Andric
bab8274c09 Use bool for one-bit wide bit-fields
A signed one-bit wide bit-field can take only the values 0 and -1. Clang
16 introduced a warning that "implicit truncation from 'int' to a
one-bit wide bit-field changes value from 1 to -1". Fix the warnings by
using C99 bool.

Reported by:	Clang 16
Reviewed by:	emaste, jhb
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39705
2023-04-25 19:26:03 +02:00
Mateusz Guzik
b0695c6abe zfs: Revert "Fix data race between zil_commit() and zil_suspend()"
This reverts commit 4c856fb333.

To quote a pending upstream PR:
This reverts commit 4c856fb to resolve a newly introduced deadlock which
in practice is more disruptive that the issue this commit intended to
address.

Causes deadlocks described in https://github.com/openzfs/zfs/issues/14775

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-25 16:01:22 +00:00
Allan Jude
c98ecfceb3 Add support for zpool user properties
Usage:

    zpool set org.freebsd:comment="this is my pool" poolname

Tests are based on zfs_set's user property tests.

Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN.

Reviewed by:	markj
Approved by:	markj
Co-authored-by:	Mateusz Piotrowski <0mp@FreeBSD.org>
Obtained from:	OpenZFS 8eae2d214c Add support for zpool user properties
Sponsored by:	Beckhoff Automation GmbH & Co. KG.
Sponsored by:	Klara Inc.
Differential Revision: https://reviews.freebsd.org/D39657
2023-04-25 17:23:07 +02:00
Mateusz Guzik
b4d3dd8615 zfs: fix up bogus checksums with blake3 in face of cpu migration
This is a temporary measure until a better fix is sorted out.

Upstream report: https://github.com/openzfs/zfs/issues/14785
Reported by:	Evgeniy Khramtsov
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-25 15:20:19 +00:00
Alexander V. Chernikov
04349d3094 netlink: remove now-unused rtnl_iface_find_cloner_locked(). 2023-04-25 15:04:11 +00:00
Alexander V. Chernikov
9e81e2c452 netlink: fix powerpc build. 2023-04-25 14:59:04 +00:00
Mark Johnston
722b2e2f9a dtrace: Sync dis_tables.c with illumos
This brings in the following commits:

    commit 584b574a3b16c6772c8204ec1d1c957c56f22a87
    12174 i86pc: variable may be used uninitialized
    Author: Toomas Soome <tsoome@me.com>
    Reviewed by: John Levon <john.levon@joyent.com>
    Reviewed by: Andrew Stormont <astormont@racktopsystems.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

    commit a25e615d76804404e5fc63897a9196d4f92c3f5e
    12371 dis x86 EVEX prefix mishandled
    12372 dis EVEX encoding SIB mishandled
    12373 dis support for EVEX vaes instructions
    12374 dis support for EVEX vpclmulqdq instructions
    12375 dis support for gfni instructions
    Author: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>

    commit c1e9bf00765d7ac9cf1986575e4489dd8710d9b1
    12369 dis WBNOINVD support
    Author: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Hans Rosenfeld <hans.rosenfeld@joyent.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

    commit e4f6ce7088a7dd335b9edf4774325f888692e5fb
    10893 Need support for new Cascade Lake Instructions
    Author: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Hans Rosenfeld <hans.rosenfeld@joyent.com>
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Reviewed by: Richard Lowe <richlowe@richlowe.net>
    Approved by: Gordon Ross <gwr@nexenta.com>

    commit cff040f3ef42d16ae655969398f5a5e6e700b85e
    10226 Need support for new EPYC ISA extensions
    Author: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Hans Rosenfeld <hans.rosenfeld@joyent.com>
    Reviewed by: Jason King <jason.king@joyent.com>
    Reviewed by: Richard Lowe <richlowe@richlowe.net>
    Approved by: Dan McDonald <danmcd@joyent.com>

    commit d242cdf5288b86d9070d88791c8ee696612becdc
    8492 AVX512 dis - legacy logical instructions
    Author: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

    commit 81b505b772ab015c588c56bb116239ee549b6eee
    8384 AVX512 dis - EVEX prefix support
    8385 32-bit avx dis test mishandles EVEX prefix
    8386 32-bit bound dis is incorrect
    Author: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

    commit 92381362ae635a3bea638d87b7119f1623b6212e
    8319 dis support for new xsave instructions
    Author: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

    commit a4e73d5d60e566669c550027fae2b1d87b4be2b4
    8240 AVX512 dis - opmask instruction support
    Author: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>

    959b2dfd39979fe8a9a315a52741d009eb168822
    7825 want avx dis tests
    7826 PCLMULQDQ psuedo-ops aren't properly described in dis
    7827 dis tests for f16c, movbe, cpuid, msr, tsc, fence instrs
    7828 sysenter and sysexit dis should be allowed in 64-bit x86
    Author: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

MFC after:	2 weeks
2023-04-25 10:46:10 -04:00
Boris Lytochkin
fc727ad63d ipfw: add [fw]mark implementation for ipfw
Packet Mark is an analogue to ipfw tags with O(1) lookup from mbuf while
regular tags require a single-linked list traversal.
Mark is a 32-bit number that can be looked up in a table
[with 'number' table-type], matched or compared with a number with optional
mask applied before comparison.
Having generic nature, Mark can be used in a variety of needs.
For example, it could be used as a security group: mark will hold a security
group id and represent a group of packet flows that shares same access
control policy.

Reviewed By: pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D39555
MFC after:	1 month
2023-04-25 12:40:23 +00:00
Alexander V. Chernikov
089104e0e0 netlink: add netlink interfaces to if_clone
This change adds netlink create/modify/dump interfaces to the `if_clone.c`.
The previous attempt with storing the logic inside `netlink/route/iface_drivers.c`
 did not quite work, as, for example, dumping interface-specific state
 (like vlan id or vlan parent) required some peeking into the private interfaces.

The new interfaces are added in a compatible way - callers don't have to do anything
unless they are extended with Netlink.

Reviewed by:	kp
Differential Revision: https://reviews.freebsd.org/D39032
MFC after:	1 month
2023-04-25 12:34:46 +00:00
Alexander V. Chernikov
acc65df45a netlink: require proper privileges when adding neighbor.
MFC after:	3 days
2023-04-25 12:28:22 +00:00
Alexander V. Chernikov
896e22fbc6 netlink: fix neighbour deleting for IPv6.
MFC after:	2 weeks
2023-04-25 12:27:02 +00:00
Alexander V. Chernikov
e83f23eb5e netlink: enable extended error reporting in snl(3).
MFC after:	2 weeks
2023-04-25 11:21:03 +00:00
Alexander V. Chernikov
5af9ad5359 netlink: add snl(3) support for dumping nexthops and neighbors
MFC after:	2 weeks
2023-04-25 11:14:12 +00:00
Alexander V. Chernikov
b32cf15d86 netlink: add support for dumping kernel nexthops.
MFC after:	2 weeks
2023-04-25 11:12:18 +00:00
Alexander V. Chernikov
a2728a9a5b netlink: allow creation of temporary lle entries.
MFC after:	2 weeks
2023-04-25 11:08:47 +00:00
Alexander V. Chernikov
ca1850478f lltable: properly set expire time to 0 for static IPv4 entries.
MFC after:	2 weeks
2023-04-25 10:59:50 +00:00
Alexander V. Chernikov
fab828b455 netlink: fix parameters in snl_attr_get_flag()
MFC after:	2 weeks
2023-04-25 10:57:59 +00:00
Alexander V. Chernikov
70810dc817 netlink: add nlattr_get_uint8() function to pack u8 attributes.
MFC after:	2 weeks
2023-04-25 10:56:42 +00:00
Alexander V. Chernikov
34066d0008 routing: add iterator-based nhop traversal KPI.
MFC after:	2 weeks
2023-04-25 10:55:16 +00:00
Alexander V. Chernikov
fd1aa866eb routing: add rt_tables_get_rnh_safe() that doesn't panic when af/fib is
incorrect.

MFC after:	2 weeks
2023-04-25 10:53:51 +00:00
Andrew Turner
6a9c2e63be Add padding for future use on arm64
Allow new features to be supported without changing the size of
existing structures.

Reviewed by:	kib
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39777
2023-04-25 10:23:15 +01:00
Vladimir Kondratyev
19c804b74f bcm5974(4): Make Magic Trackpad 2 support endian-safe.
While here make touch orientation event matching with Linux

MFC after:	1 month
2023-04-25 12:20:53 +03:00
Val Packett
ef8397c28e bcm5974(4): add Magic Trackpad 2 (USB only) support
The MT2 uses a compact report format, but otherwise is similar in many
ways to the internal trackpads, it even uses the same mode switching
commands.

Reviewed by:	wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D34437
2023-04-25 12:20:53 +03:00
Stefan Eßer
88a795e80c sys/fs: do not report blocks allocated for synthetic file systems
The pseudo file systems (devfs, fdescfs, procfs, etc.) report total
and available blocks and inodes despite being synthetic with no
underlying storage device to which those values could be applied.

The current code of these file systems tends to report a fixed number
of total blocks but no free blocks, and in the case of procfs,
libprocfs, linsysfs also no free inodes.

This can be irritating in e.g. the "df" output, since 100% of the
resources seem to be in use, but it can also create warnings in
monitoring tools used for capacity management.

This patch makes these file systems return the same value for the
total and free parameters, leading to 0% in use being displayed by
"df". Since there is no resource that can be exhausted, this appears
to be a sensible result.

Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D39442
2023-04-25 09:59:15 +02:00
Stefan Eßer
0728695c63 fs/msdosfs: Fix potential panic and size calculations
Some combinations of FAT12 file system parameters could cause a kernel
panic due to an unmapped access if the size of the FAT was larger than
the CPU page size. The reason is that FAT12 uses 3 bytes to store
2 FAT pointers, leading to partial FAT pointers at the end of buffers
of a size that is not a multiple of 3.

With a typical page size of 4 KB, this caused the FAT entry at byte
offsets 4095 and 4096 to cross the page boundary, with only the first
page mapped. This was fixed by adjusting the mapping to always cover
both bytes of each FAT entry.

Testing revealed 2 other inconsistencies that are fixed by this commit:

1) The calculation of the size of the data area did not take into
   account the fact that the first two data block numbers are reserved
   and that the data area starts with block 2. This could cause a
   FAT12 file system created with the maximum supported number of
   blocks to be incorrectly identified as FAT16.

2) The root directory does not take up space in the data area of a
   FAT12 or FAT16 file system, since it is placed into a reserved
   area outside of that data area. This commits makes stat() report
   the logical size of the root directory, but with 0 blocks allocated
   from the data area.

PR:		270587
Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D39386
2023-04-25 09:58:29 +02:00
Konstantin Belousov
04d815f115 netipsec/key.c: use designated initializers for arrays
Also de-expand nitems() use in related asserts, and fix maxsize array
name in the assert message.

Sponsored by:	NVidia networking
2023-04-25 09:41:24 +03:00
Konstantin Belousov
fcc7aabdca netipsec: some style
Sponsored by:	NVidia networking
2023-04-25 09:39:51 +03:00
Cheng Cui
1f782fcc0c
Remove unused fields in siftr_stats. Thus, update the man page as well.
Summary: Remove unused fields in siftr_stats. Thus, update the man page as well.

Test Plan: Tested in Emulab testbed.

Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39776
2023-04-24 15:31:15 -04:00
Cheng Cui
8aa2be695e
Correct the value of macro TF2_TCP_ACCOUNTING.
Summary: Make sure the values are in order.

Reviewers: rscheff, tuexen, #transport!
Approved by: rscheff, tuexen, glebius
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39716
2023-04-24 15:28:41 -04:00
Olivier Certner
6a5e614015 vn_open_vnode(): fix locking around VOP_CLOSE() on advisory lock error
In the case of a FIFO or if trying to open a file for writing, an
exclusive lock is necessary.

Reviewed by:	kib
MFC after:	1 week
2023-04-25 01:37:58 +03:00
Olivier Certner
faec42a3ef sys/dirent.h: comment update, 'd_off' is offset of next entry
This is the historical (and still current) behavior, as well as that of
NetBSD, OpenBSD, illumos and Linux (getdents()/getdents64()).

Reviewed by:	kib
MFC after:	3 days
2023-04-25 00:32:10 +03:00
Konstantin Belousov
a718431c30 lookup(): ensure that openat("/", "..", O_RESOLVE_BENEATH) fails
PR:	269780
Reported by:	Dan Gohman <dev@sunfishcode.online>
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39773
2023-04-25 00:32:10 +03:00
John Baldwin
9b02f2daf4 powerpc: Use valid prototypes for function declarations with no arguments.
Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D39733
2023-04-24 08:53:50 -07:00
Andrew Turner
c94e4d91da Clean up PCI DEN0115 driver probing
Rather than checking for the SMCCC version check if the PCI_VERSION
call returns a valid version.

Sponsored by:	Arm Ltd
2023-04-24 16:34:21 +01:00
Warner Losh
d4c78130f4 powerpc: syscalls.c is standard
No need to add it here, much less make it optional on ktr.

Sponsored by:		Netflix
2023-04-24 09:25:42 -06:00
Justin Hibbits
02f3b17fa5 Mechanically convert Xen netfront/netback(4) to IfAPI
Reviewed by:	zlei
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37800
2023-04-24 09:54:23 -04:00
Justin Hibbits
7814374b7c IfAPI: Hide the macros that touch ifnet members
Nothing should be directly touching the ifnet members, which are hidden
in <net/if_private.h>, so hide them in the same header to avoid errors
from users.

Sponsored by:	Juniper Networks, Inc.
2023-04-24 09:54:23 -04:00
Justin Hibbits
97583aa256 linuxkpi: Migrate to IfAPI
Summary:
Trivial changes for LinuxKPI to use IfAPI.  The 'bsdifp' looks unused,
so removed it instead of converting it to a pointer.

Bump __FreeBSD_version for change to struct net_device.

Reviewed by:	bz, hselasky
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39491
2023-04-24 09:54:22 -04:00
Andrew Turner
c3785c3eb0 Remove unneeded SMMU macros
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39186
2023-04-24 12:48:01 +01:00
Andrew Turner
cdd34e0038 Remove virtual addresses from smmu_pmap_remove_pages
This function needs to unmap all memory in a given SMMU context. Have
it iterate over all page table entries to find what has been mapped
rather than looking at virtual addresses.

While here use SMMU specific macros.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39185
2023-04-24 12:47:55 +01:00
Andrew Turner
b97e94d91e Move to a SMMU specific struct for the smmu pmap
This is not managed through the VM subsystem so only needs to hold the
data the SMMU driver needs.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39184
2023-04-24 12:47:50 +01:00
Andrew Turner
49ee1a7ef0 Create a common function to get the SMMU sid
Now the PCI drivers have a common interface to read the IOMMU xref
and SID create a common function to read it. This fixes an issue where
we will call into an ACPI specific function when booting with FDT when
both are enabled.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39183
2023-04-24 12:47:44 +01:00
Andrew Turner
913d04deed Add PCI_ID_OFW_IOMMU to the pci ecam ACPI driver
Teach the pci host generic ACPI attachment about PCI_ID_OFW_IOMMU. This
will be used by the arm64 smmu IOMMU driver to read the xref and ID
this interface provides in a bus-agnostic way.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39182
2023-04-24 12:47:38 +01:00
Andrew Turner
117beba8a8 arm64: Clean up smmu fdt xref handling
Use the xref from OF_xref_from_node for the smmu xref. We already have
a valid xref ID, there is no need to convert this to a memory address.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39181
2023-04-24 12:47:31 +01:00
Andrew Turner
8bc94f256e Remove redundant data from pci host generic
The bus tag and handle fields are already stored in the resource. Use
this with the bus_read/bus_write helper macros.

Sponsored by:	Arm Ltd
2023-04-24 12:33:50 +01:00
Andrew Turner
078a69abcb Use a uint64_t to store the arm64 mpidr
Use a single uint64_t to hole the mpidr register as we can break the
KBI on 14. Keep the macro so code can still be MFCd to 13.

Sponsored by:	Arm Ltd
2023-04-24 12:33:50 +01:00
Andrew Turner
c9a05c0722 Add a PCI driver that follows the Arm DEN0115 spec
Add a n attachment to the pci_host_generic driver for the Arm DEN0115
PCI Configuration Space Access Firmware Interface [1]. This can be used
when PCI controllers need to implement quirks in the PCI root bus.
To handle this the firmware implements a SMCCC interface the driver can
use to read and write the configuration register.

This has been tested on a Raspberry Pi 4 booting with EDK2.

[1] https://developer.arm.com/documentation/den0115/latest

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39228
2023-04-24 12:33:50 +01:00
Andrew Turner
7029f2c887 Allow pci_host_generic attachments to manage registers
To allow for attachments that don't use memory mapped registers add
a flag they can set when the base driver shouldn't map them.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39227
2023-04-24 12:33:50 +01:00
Andrew Turner
fb421e96c0 Make arm64 pcb padding explicit
There is padding between some fields. Mark those I have found so they
can be reused later if needed.

Sponsored by:	Arm Ltd
2023-04-24 12:33:50 +01:00
Andrew Turner
1bf4991bbc Remove unneeded masks from the arm64 KASAN shadow
When mapping the arm64 KASAN shadow map we use Ln_TABLE_MASK to align
physical addresses, however these should already be aligned either
by rounding to a greater alignment, or the VM subsystem is giving us
a correctly aligned page.

Remove these extra alignment masks.

Reviewed by:	kevans
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39752
2023-04-24 12:33:50 +01:00
Hu Shunchao
6ed3b9ca25 hid: fix typo in hid_is_collection
hid_input is equal to 0. It is leftover from NetBSD code.

Reviewed by:	hselasky, wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28149
2023-04-24 14:17:51 +03:00
Val Packett
176939bd36 bcm5974: fix wellspring9 pressure settings to handle force sensitivity
Reviewed by:	wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D34435
2023-04-24 12:41:52 +03:00
Val Packett
1f40866feb intelspi: add PCI attachment (Lynx/Wildcat/Sunrise Point)
Also adds fixups and cleanups:

- apply the child's mode/speed
- implement suspend/resume support
- use RF_SHAREABLE interrupts
- use bus_delayed_attach_children since the transfer can use interrupts
- add support for newly added spibus features (cs_delay and flags)

Operation tested on Broadwell (Wildcat Point) MacBookPro12,1.
Attachment also tested on Kaby Lake (Sunrise Point) Pixelbook.

Reviewed by:	wulf
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D29249
2023-04-24 12:41:52 +03:00
Val Packett
3c08673438 spibus: extend API: add cs_delay ivar, KEEP_CS and NO_SLEEP flags
These feature are required for an upcoming Apple MacBook topcase
(HID over SPI) driver:

A delay after toggling CS is required to avoid anomalies like an extra
junk byte in front of the message. Keeping CS asserted is required to
be able to read a status report after writing a command. (The device
won't return the status if CS was deasserted.)

Sleep is not allowed in the interrupt context where the Apple input
driver runs its transactions. Use a flag to tell the SPI driver to
avoid mtx_sleep.

Reviewed by:	manu (ok to SPI part of larger patch)
MFC afret:	1 month
Differential revision:	https://reviews.freebsd.org/D29534
2023-04-24 12:41:52 +03:00
Val Packett
b344bd3a7d ext2fs: extract crc16 into sys/crc16.h
deduplicate this as it might be needed for other drivers (e.g. Apple SPI-HID)

Sponsored by:	https://www.patreon.com/valpackett
Reviewed by:	chuck, imp
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32879
2023-04-24 12:41:52 +03:00
Bjoern A. Zeeb
da8fa4e37a ath10k: import ath10k driver
Import ISC-licensed ath10k driver assumed to be
based on Linux kvalo/ath.git master at
6bae9de622d3ef4805aba40e763eb4b0975c4f6d.

Import support to redirect fwlogs to kernel messages
from https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/389075

Complement the driver to make compile on FreeBSD
using LinuxKPI with changes covered by #ifdef (__FreeBSD__).
Further select updates were applied since the initial import
in order to keep compiling along with other LinuxKPI based
drivers.

Any other native driver using BUS_PROBE_DEFAULT will attach
ignoring this one by default given bsd_probe_return is set
to a lower priority.

Add the module build framework.

We only support PCI parts.

The firmware is provided by port net/wifi-firmware-ath10k-kmod.

Given the lack of full license texts on most files this is
imported under the draft policy for handling SPDX files (D29226). [1]

Approved by:	core (emaste, 2022-04-08) [1]
MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
ebacd8013f athk: import common code for ath1?k drivers
Import common ISC-licensed athk parts assumed to be
based on Linux kvalo/ath.git master at
6bae9de622d3ef4805aba40e763eb4b0975c4f6d.

The only modification should be for FreeBSD module
handling in main.c.

Add the module build framework unconnected to the
build for now.

These files will be shared by ath1?k drivers.

MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
06a1103fe3 ath10k: ath11k: add specific LinuxKPI support
Add files needed by ath1?k drivers to linuxkpi/linuxkpi_wlan.
This contain (skeleton) implementations of what is needed to
compile but specifically mhi/qmi/qrtr will need more work for
ath11k.

MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
3c4ba5f554 mt76: add module build framework and man pages
Add framework to build if_mt7915 and if_mt7921 with LinuxKPI
as well as initial man pages for the two mt76 chipset drivers.

MFC after:	2 months
2023-04-23 21:31:07 +00:00
Bjoern A. Zeeb
6c92544d7c mt76: import mediatek/mt76 driver
Import ISC-licensed driver parts of mediatek/mt76
assumed to be based on Linux wireless-testing at
a02411a5b98612c12be99349836d99f07db12a77 (tag: wt-2022-11-23).

Complement the driver and LinuxKPI with our own (dummy)
implementations of missing parts (util.h and soc/mediatek/)
as well as changes to make compile on FreeBSD with changes
covered by #ifdef (__FreeBSD__) conditions.
Further select updates were applied since the initial import
in order to keep compiling along with other LinuxKPI based
drivers.

For the moment we only target the mt7915 and mt7921 PCI parts.
More may follow in the future.

Firmware is provided by port net/wifi-firmware-mt76-kmod.

Given the lack of full license texts on non-local files this is
imported under the draft policy for handling SPDX files (D29226). [1]

Approved by:	core (emaste, 2022-04-08) [1]
MFC after:	2 months
2023-04-23 21:29:49 +00:00
Dimitry Andric
42162fb2fe kcsan: add __tsan_mem(cpy|move|set) aliases for clang >= 16
Summary:
After https://github.com/llvm/llvm-project/commit/b4257d3bf58c ("[tsan]
Replace mem intrinsics with calls to interceptors") intrinsic calls to
memcpy, memmove or memset will directly call sanitizer interceptors,
e.g. __tsan_memcpy, __tsan_memmove or __tsan_memset.

Building GENERIC-KCSAN with clang >= 16 would thus result in link errors
similar to:

  ld: error: undefined symbol: __tsan_memcpy
  >>> referenced by cam_compat.c:150 (/usr/src/sys/cam/cam_compat.c:150)
  >>>               cam_compat.o:(cam_compat_handle_0x17)
  >>> referenced by cam_compat.c:151 (/usr/src/sys/cam/cam_compat.c:151)
  >>>               cam_compat.o:(cam_compat_handle_0x17)
  >>> referenced by cam_compat.c:152 (/usr/src/sys/cam/cam_compat.c:152)
  >>>               cam_compat.o:(cam_compat_handle_0x17)
  >>> referenced 1692 more times

Similar to subr_msan.c, add aliases from the existing kcsan_* versions
of these functions to __tsan_* names.

Reviewed by:	markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39772
2023-04-23 20:59:06 +02:00
Mark Johnston
4b39a12830 arm64: Disable PAC when booting on a Windows Dev Kit 2023
It appears that PAC registers are configured to trap upon access, but
since the kernel starts in EL1 on this platform it has no ability to
inspect or modify this configuration.  Simply disable PAC on this
platform for now, since the kernel otherwise hangs during boot.

PR:		270472
Reviewed by:	andrew, emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39748
2023-04-23 13:55:57 -04:00
Mark Johnston
ff13b92475 riscv: Implement bus_describe_intr() for nexus
Reviewed by:	mhorne
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39750
2023-04-23 13:55:57 -04:00
Mark Johnston
7623cc8f65 arm64: Implement bus_describe_intr() for nexus
Prompted by a compiler warning introduced by
e582d4a2b0 ("arm64: nexus code tidy-up").

Reviewed by:	mhorne, andrew
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39749
2023-04-23 13:55:57 -04:00
Mark Johnston
93998d166f inpcb: Fix some bugs in _in_pcbinshash_wild()
- In _in_pcbinshash_wild(), we should avoid returning v6 sockets unless
  no other matches are available.  This preserves pre-existing
  semantics.
- Fix an inverted test: when inserting a non-jailed PCB, we want to
  search for the first non-jailed PCB in the hash chain.
- Test the right PCB when searching for a non-jailed PCB.

While here, add a required locking assertion.

Fixes:	7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
2023-04-23 13:55:57 -04:00
Michael Tuexen
66d6fd5322 sctp: use constants from RFC 8260 to improve compliance
Keep the old constants for backwards compatibility.

MFC after:	1 week
2023-04-23 17:48:05 +02:00
Bjoern A. Zeeb
0e8953b94b LinuxKPI: pci.h: always initialize return value
In pcie_capability_read_*() always initialize the return value to
avoid warnings of uninitialized values in callers.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39721
2023-04-23 13:29:30 +00:00
Dimitry Andric
d142601887 powerpc: fix a few pmap related functions to return correct types
While experimenting with changing boolean_t to another type, I noticed
that several powerpc pmap related functions returned the wrong type:
boolean_t instead of int.

Fix several declarations and definitions to match the actual pmap
function types: pmap_dev_direct_mapped_t and pmap_ts_referenced_t.

MFC after:	3 days
2023-04-23 15:23:04 +02:00
Zhenlei Huang
b658c0fce1 ip_mroute: Delete unreachable code
As the flag M_WAITOK is passed to ip_encap_attach(), then the function
will never return NULL, and the following code within NULL check branch
will be unreachable.

No functional change intended.

Reviewed by:	kp
Fixes:		6d8fdfa9d5 Rework IP encapsulation handling code
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39746
2023-04-23 12:47:57 +08:00
Zhenlei Huang
c373e1d6ad if_stf: Delete unreachable code
As the flag M_WAITOK is passed to ip_encap_attach(), then the function
will never return NULL, and the following code within NULL check branch
will be unreachable.

No functional change intended.

Reviewed by:	kp
Fixes:		6d8fdfa9d5 Rework IP encapsulation handling code
MFC after:	1 week
Differential Revision:  https://reviews.freebsd.org/D39746
2023-04-23 12:47:57 +08:00
Dmitry Chagin
66e8f1f7d3 linux(4): Fix arm64 build after b7a6bcdd, missed chunk added
MFC after:		1 month
2023-04-23 01:41:12 +03:00
Dmitry Chagin
0ef30b5d6b linux(4): Bump osrelease to 5.15.0
Linux kernel version 5.15 named Trick or Treat is a 22nd LTS release.

Reviewed by:		trasz, emaste
Differential Revision:	https://reviews.freebsd.org/D39649
MFC after:		1 month
2023-04-22 22:18:08 +03:00
Dmitry Chagin
b7a6bcdd13 linux(4): Export the AT_MINSIGSTKSZ depending on the process osreldata
AT_MINSIGSTKSZ has appeared in the 5.13.0 Linux kernel first time.

Differential Revision:	https://reviews.freebsd.org/D39648
MFC after:		1 month
2023-04-22 22:17:52 +03:00
Dmitry Chagin
70eab81d6f linux(4): Export the AT_EXECFN depending on the process osreldata
AT_EXECFN has appeared in the 2.6.26 Linux kernel first time.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D39647
MFC after:		1 month
2023-04-22 22:17:36 +03:00
Dmitry Chagin
40c36c4674 linux(4): Export the AT_RANDOM depending on the process osreldata
AT_RANDOM has appeared in the 2.6.30 Linux kernel first time.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D39646
MFC after:		1 month
2023-04-22 22:17:17 +03:00
Dmitry Chagin
56c5230afd linux(4): Fix LINUX_AT_COUNT comments
Differential Revision:	https://reviews.freebsd.org/D39645
MFC after:		1 month
2023-04-22 22:16:43 +03:00
Dmitry Chagin
7d8c983983 linux(4): Deduplicate linux_copyout_auxargs()
Export default MINSIGSTKSZ value for the x86 until we do not preserve AVX
registers in the signal context.

Differential Revision:	https://reviews.freebsd.org/D39644
MFC after:		1 month
2023-04-22 22:16:02 +03:00
Justin Hibbits
0468e89cb3 zfs/powerpc64: Fix big-endian powerpc64 asm
The powerpc asm from openzfs assumes that big-endian is always ELFv1 and
ELFv2 is always little-endian, while FreeBSD uses ELFv2 everywhere.  Add
the necessary bits to the checksum asm to work on big-endian ELFv2.

This was also submitted upstream as PR#14779.

Tested by:	dbaio
2023-04-22 11:27:49 -04:00
Vladimir Kondratyev
06c844e175 LinuxKPI: Fix building on 32bit archs
Reported by:	Jenkins
Fixes:	e5cf9deb61 ("LinuxKPI: Add bitmap_to_arr32() to <linux/bitmap.h>")
2023-04-22 13:25:49 +03:00
Vladimir Kondratyev
af22da75a0 Bump __FreeBSD_version after LinuxKPI updates 2023-04-22 11:29:29 +03:00
Vladimir Kondratyev
53d821d651 LinuxKPI: Define noinline_for_stack compiler attribute
It is identical to noinline and used for documentation reasons.

Required by:	drm-kmod 5.15-lts
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D39553
2023-04-22 11:29:29 +03:00
Vladimir Kondratyev
e5cf9deb61 LinuxKPI: Add bitmap_to_arr32() to <linux/bitmap.h>
bitmap_to_arr32() copies contents of bitmap to a uint32_t array of bits

Required by:	drm-kmod 5.15-lts
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D39552
2023-04-22 11:29:29 +03:00
Vladimir Kondratyev
b14c03f808 LinuxKPI: define acpi_put_table() in <acpi/acpi.h>
FreeBSD ACPICA calls it AcpiPutTable()

Required by:	drm-kmod 5.15-lts
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D39551
2023-04-22 11:29:29 +03:00
Warner Losh
7b42f338d7 freebsd32: Regen
Need to regen freebsd32 as well when sys/kern/syscalls.master is
updated.

Sponsored by:		Netflix
2023-04-21 10:25:10 -06:00
Warner Losh
2c19beeed2 newvers: Use correct regexp
There's no need to quote the # here. Inside of regexp, it's not treated
like a comment from an awk perspective. And inside if '' it's not
treated as special by the shell. gawk also warns.

Sponsored by:		Netflix
2023-04-21 10:24:25 -06:00
Mark Johnston
92fa22c6a5 riscv: Compile instr_size.c into the kernel when DTrace is configured
Reported by:	Jenkins
Fixes:	080e56a6c9 ("dtrace: expose dtrace_instr_size() to userland and implement it for riscv")
2023-04-21 09:26:17 -04:00
Randall Stewart
01216268f8 tcp: hpts needs to still call output even after input.
The other stacks it turns out actually expect the output to be called and can become stuck if it is
not. This is because they run there timer code from there and the input routine does not always
assure a timer is running. The real longterm fix here might be to go into the other stacks (rack and bbr)
and make sure that a timer is running after input if you don't do output.. as well as call the timer functions.
This would cut down on calls from hpts. But I think its too dramatic of a change for the immediate time.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39738
2023-04-21 07:12:25 -04:00
Mariusz Zaborski
444c661545 mpr: don't use hardcoded value in debug branch
Pointed out by:	imp
Sponsored by:   Klara Inc.
2023-04-21 10:01:38 +02:00
Mariusz Zaborski
ea6597c38c mpr: fix copying of event_mask
Before the commit 6cc44223cb the
field event_mask was fully copied to the EventMasks field.
After this commit the event_mask (uint8_t) is 4 times casted to
EventMask (uint32_t). Because of that 24 bits of each event_mask array
is lost.

This commits brings back simple copying of field, and after words
converting 32 bits field to the requested endian.

I don't think we need more sophisticated method,
as the array is of size 4 (for 32 bits version).

Reviewed by:	imp
MFC after:	1 week
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D39562
2023-04-21 10:01:38 +02:00
Austin Shafer
3f686532c9 linuxkpi: Fix __sg_alloc_table_from_pages loop
Commit 3e0856b63f updated
__sg_alloc_table_from_pages to use the same API as linux, but modified
the loop condition when going over the pages in a sg list. Part of the
change included moving the sg_next call out of the for loop and into the
body, which causes an off by one error when traversing the list. Since
sg_next is called before the loop body it will skip the first element
and read one past the last element.

This caused panics when running PRIME with nvidia-drm as the off-by-one
issue causes a NULL dereference.

Reviewed by:	bz, hselasky
Differential Revision:	https://reviews.freebsd.org/D39628
Fixes:	3e0856b63f ("linuxkpi: Fix `sg_alloc_table_from_pages()` to have the same API as Linux")
2023-04-21 09:56:50 +02:00
Warner Losh
9abba78acc syscalls: regenerate
The 4.2 sigreturn was a bit of a enima so the 4.2 was remove. Regenerate
to cope the very minor changes in comments and one string.

Sponsored by:		Netflix
2023-04-20 23:39:23 -06:00
Warner Losh
602b575a88 syscall.master: Remove stray 4.2
Back in 4.3BSD, the system call table wasn't generated, and there was an
entry:
        "4.2 sigreturn",        /* 139 = old 4.2 sigreturn */
This got converted to
139     OBSOL   0 4.2 sigreturn
in 4.3 RENO. Since it was obsolete, nothing bad happened. In fact,
there was code in makeyscalls.sh to cope:
        {       comment = $4
                for (i = 5; i <= NF; i++)
                        comment = comment " " $i
                if (NF < 5)
                        $5 = $4
        }
so the generated comment in syscalls.c was almost correct:
        "obs_4.2",                      /* 139 = obsolete 4.2 sigreturn */
a bug that we have to this very day, despite makesyscalls.sh being
rewritten in lua.

However, this historical wart is the only place in our current
syscalls.master file where we have an extra field for the 'not
generated' class of system calls. Remove the historical wart so that the
re-write of makesyscalls.lua can be simpler (so, I hope, qemu's bsd-user
can large swathes of code automatically generated too). This should help
make things more understandable (changes to simplify makesyscalls.lue
aren't quite debugged, so have to wait for another day).

There's 3 different obsolete sigreturns (but only 1 that was ever in
FreeBSD 2.x and newer).

Sponsored by:		Netflix
2023-04-20 23:39:23 -06:00
Warner Losh
8fc68c1b34 Remove stray line
Forgot to remove this in 559b94a122.

Fixes:		559b94a122
Sponsored by:	Netflix
2023-04-20 23:39:23 -06:00
Navdeep Parhar
ca5391bd85 cxgbe(4): Update firmwares to version 1.27.3.0
These are the changes since the last update (copy-pasted from the
release notes for Chelsio Unified Wire v3.18.0.0):

====================
Version : 1.27.3.0
Date    : 04/07/2023

Fixes
-----
BASE:
- Fixed a hang if module eeprom reads gives invalid data.
- KR backlplane no-fec link problem fixed.
OFLD:
- iscsi ddp errors fixed.
- iwarp connection abort in rare cases causing NIC traffic hang fixed.

ENHANCEMENTS
------------
BASE:
- Cisco GLC-TE 1G modules support added.

====================
Version : 1.27.1.0
Date    : 12/02/2022

Fixes
-----
BASE:
- memwrite dsgl cannot be used for T5.
OFLD:
- Enabled FCoE in SO adapters.
- TOE-TLS crash fixed.
- iscsi hang fixed.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2023-04-20 20:57:38 -07:00
Warner Losh
559b94a122 syscall.master: Fix comments
Have more accruate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not.  And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).

Sponsored by:		Netflix
2023-04-20 16:18:02 -06:00
Warner Losh
c1e987e062 makesyscalls.lua: Minor fluff removal
luacheck pointed out two minor issues: line isn't declared as a global,
so declare it local. Also remove an unused parameter.

Suggested by:		kevans
Sponsored by:		Netflix
2023-04-20 16:17:58 -06:00
Warner Losh
8341a74afe makesyscalls.lua: Use "sysxxx" consistently
Find the few places where we use 'sysxxx' and use "sysxxx" instead to be
more consistent.

Sponsored by:		Netflix
2023-04-20 16:17:25 -06:00
Warner Losh
1dd350fce0 makesyscalls.lua: Make more luaish
x["y"] can be written as x.y, which looks better and is a more typical
lua idiom.

Sponsored by:		Netflix
Reviewed by:		kevans
Differential Revision:	https://reviews.freebsd.org/D39709
2023-04-20 16:17:25 -06:00
Navdeep Parhar
2791335104 cxgbe(4): Dump the firmware log before falling back to a minimal config.
It might have errors that explain why the attempted configuration
failed.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-20 12:56:24 -07:00
Gleb Smirnoff
8e813d07c6 netstat: fix printing of TCP pcbs with -A
This change touches both kernel and netstat(1), but either of the changes
will fix printing pcb addresses with -A.

The thing is that historically netstat(1) treated TCP differently, and
printed tcpcb address instead of inpcb address.  This is not documented
anywhere!  With e68b379244 these two addresses became the same.  It is
highly likely they will be the same for a long time, but it might be they
will start to differ again in a far future.  My proposal is to stop
treating TCP differently with netstat(1) and right now is a good opportunity
to do that, since there will be no behavior change at all.  The kernel
change to tcp_inptoxtp() will go into stable/14 to make it compatible with
netstat(1) binary from stable/13.  We can drop it later, probably together
with in_ppcb pointer from inpcb.  The in_ppcb in xinpcb will stay for size
compatibility.

Reviewed by:		tuexen, rrs
Differential Revision:	https://reviews.freebsd.org/D39736
2023-04-20 12:42:42 -07:00
Dimitry Andric
4214005276 kern.mk: clang >= 16 already infers ELFv2 for powerpc64
There is no need to pass -mabi=elfv2 explicitly anymore, and with clang
16 in fact results in a "unused argument" warning.

MFC after:	3 days
2023-04-20 21:27:27 +02:00
Bjoern A. Zeeb
72ef722b2a dpaa2: add console support for FDT based systems
Add DPAA2 console support for MC and AIOP (latter untested) for FDT
systems.  ACPI systems are prepared but need some proper bus function
in order to get the address from MC (and likely a file splitup then).
This will come at a later stage once other ACPI/FDT bus parts are
cleared up.
The work was originally done in July 2022 and finally switched to
bus_space[1] lately to be ready for main.

Suggested by:	andrew [1]
Reviewed by:	dsl
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D38592
2023-04-20 18:59:03 +00:00
John Baldwin
bf04385521 arm: Use C89 function declaration for db_read_bytes. 2023-04-20 11:00:46 -07:00
John Baldwin
048606bec1 perfmon(4): Use a C89 function definition for a SYSINIT. 2023-04-20 11:00:46 -07:00
Christos Margiolis
1fef7abdc7 dtrace: add register bindings for RISC-V
Reviewed by:	mhorne, markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39611
2023-04-20 13:35:57 -04:00
Christos Margiolis
75081b9ed8 dtrace: use dtrace_instr_size() in the riscv dtrace_subr.c
No functional change intended.

Reviewed by:	mhorne, markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39652
2023-04-20 13:35:57 -04:00
Christos Margiolis
080e56a6c9 dtrace: expose dtrace_instr_size() to userland and implement it for riscv
dtrace_instr_size() is needed by the forthcoming RISC-V port of kinst,
as well as by libdtrace in D38825 for both amd64 and RISC-V.

Reviewed by:	markj, mhorne
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39489
2023-04-20 13:35:57 -04:00
Christos Margiolis
1a149d65ba dtrace: get rid of uchar_t types
Callers are specifying uint8_t anyway and this slightly reduces
dependencies on compatibility typedefs.  No functional change intended.

Reviewed by:	markj, mhorne
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39490
2023-04-20 13:35:56 -04:00
Dmitry Chagin
de4da6cd04 x86: Move i386 timerreg.h to x86
Reviewed by:		emaste, jhb
Differential Revision:	https://reviews.freebsd.org/D39656
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Dmitry Chagin
d1f4c44aa8 x86: Move i386 ppireg.h to x86
Differential Revision:	https://reviews.freebsd.org/D39655
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Mark Johnston
5fd1a67e88 inpcb: Release the inpcb cred reference before freeing the structure
Now that the inp_cred pointer is accessed only while the inpcb lock is
held, we can avoid deferring a crfree() call when freeing an inpcb.

This fixes a problem introduced when inpcb hash tables started being
synchronized with SMR: the credential reference previously could not be
released until all lockless readers have drained, and there is no
mechanism to explicitly purge cached, freed UMA items.  Thus, ucred
references could linger indefinitely, and since ucreds hold a jail
reference, the jail would linger indefinitely as well.  This manifests
as jails getting stuck in the DYING state.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38573
2023-04-20 12:13:06 -04:00
Mark Johnston
7b92493ab1 inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets.  To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial.  Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred.  This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d892129 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there.  This patch goes further to handle
lookups of unconnected sockets.  Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics.  This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups.  When a match is found, we try to lock the inpcb and
re-validate its connection info.  In the common case, this works well
and we can simply return the inpcb.  If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR.  In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38572
2023-04-20 12:13:06 -04:00
Mark Johnston
3e98dcb3d5 inpcb: Move inpcb matching logic into separate functions
These functions will get some additional callers in future revisions.

No functional change intended.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38571
2023-04-20 12:13:06 -04:00
Mark Johnston
fdb987bebd inpcb: Split PCB hash tables
Currently we use a single hash table per PCB database for connected and
bound PCBs.  Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb.  Now the database has one table each
for connected and unconnected sockets.

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use.  This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away.  There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.

I also made the "rehash" parameter of in(6)_pcbconnect() unused.  This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.

UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle.  To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number.  When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.

raw_ip (ab)uses the hash table without using the corresponding
accessors.  Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs.  This will be addressed in some
way in the future.

inp interators which specify a hash bucket will only visit connected
PCBs.  This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38569
2023-04-20 12:13:06 -04:00
Bjoern A. Zeeb
35f7fa4ac1 LinuxKPI: 802.11: improve assertion and tkip code
Move a KASSERT out of a function and make it a CTASSERT with
appropriate comments.

Skeleton implement two tkip functions, still left TODO, initializing
variables with dummy values to quiten compiler warnings.  It is
unclear to me if we should still ever properly implement TKIP
compat code at this point.  If so the current code gives a good
idea what needs to be done in addition to allocating references
to real state along with keyconf.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-20 16:07:50 +00:00
Bjoern A. Zeeb
7db7bfe1a7 iwlwifi: quieten more compiler warnings
Quieten some more (valid) gcc warnings and disable dead code.
There are more warnings, some probably a compiler problem, the
other related to firmware structs which I do not want to adjust
just locally.  Leave a comment to revisit after a next driver
update.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-20 16:07:05 +00:00
Bjoern A. Zeeb
74e908b3c6 LinuxKPI: fix READ_ONCE() -Wcast-equal warnings
Rather than using ACCESS_ONCE() in READ_ONCE() add a missing cast
to const in order to satisfy -Wcast-equal by gcc.
Sadly we cannot do the same to WRITE_ONCE() which still is very
noisy.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	hselasky
Differential Revision: https://reviews.freebsd.org/D39706
2023-04-20 11:51:22 +00:00
Alexander V. Chernikov
56d4550c4d ifnet: factor out interface renaming into a separate function.
This change is required to support interface renaming via Netlink.
No functional changes intended.

Reviewed by:	zlei
Differential Revision: https://reviews.freebsd.org/D39692
MFC after:	2 weeks
2023-04-20 10:23:37 +00:00
Mateusz Guzik
9c4e270822 zfs: fix up EINVAL from getdirentries on .zfs
PR:	270909
2023-04-20 08:38:28 +00:00
Mateusz Guzik
7ff3143809 zfs: add missing vn state transition for .zfs
Reported by:	des
2023-04-20 08:09:59 +00:00
Bjoern A. Zeeb
f369f10dd8 LinuxKPI: 802.11: fix a -Wenum-compare warning
We are asserting that two values from different enums are the same.
gcc warns about these.  Cast the values to (int) to avoid the warning.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-19 21:49:17 +00:00
Bjoern A. Zeeb
b2dcb84868 LinuxKPI: skbuff.h: fix -Warray-bounds warnings
Harmonize sk_buff_head and sk_buff further and fix -Warray-bounds
warnings reports by gcc.  At the same time simplify some code by
re-using other functions or factoring some code out.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-19 21:49:00 +00:00
Dimitry Andric
87f55ab0b4 ichiic: use bool for one-bit wide bit-fields
A one-bit wide bit-field can take only the values 0 and -1. Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1". Fix by using c99 bool.

Reported by:	Clang
Reviewed by:	emaste, wulf
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39665
2023-04-19 22:25:50 +02:00
Igor Ostapenko
0e0c47ecd6 vfs cache: fix vfs.cache.stats.* name typos
Two vfs.cache.stats names are fixed:
- s/.dotdothis/.dotdothits/
- s/.posszaps/.poszaps/

Signed-off-by: Igor Ostapenko <pm@igoro.pro>
[mjg: massaged the header a little bit]
2023-04-19 18:47:38 +00:00
Randall Stewart
4e8a20a764 tcp: rack the request level logging is a bit too noisy when doing point logging.
When doing request level BB logging the hybrid_bw_log() does not have proper screening to minimize logging
when point level logging is in use. Lets fix it properly so you have to have the proper knobs set to get the
more noisy logging.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39699
2023-04-19 14:02:12 -04:00
Randall Stewart
7a842346c3 tcp: Rack can crash with the new non-TSO fix..
Turns out the location of the check to see if we can do output is in the wrong place. We need
to jump off to the compressed acks before handling that case since th is NULL in the
compressed ack case which is handled differently anyway.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39690
2023-04-19 13:17:04 -04:00
Randall Stewart
303246dcdf We have a TCP_LOG_CONNEND log that should come out at the very last log of every connection. This
holds some nice stats about why/how the connection ended. Though with the current code it does not
come out without accounting due to the placement of the ifdefs. Also we need to make sure the stacks
fini has ran before calling in from tcp_subr so we get all logs the stack may make at its ending.

Reviewed by: rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39693
2023-04-19 12:54:25 -04:00
Navdeep Parhar
7adf138ba9 cxgbe/iw_cxgbe: debug routines to dump STAG (steering tag) entries.
t4_dump_stag to dump hw state for a known STAG.

t4_dump_all_stag to dump hw state for all valid STAGs.  This routine
walks the entire STAG region looking for valid entries and this can take
a while for some configurations.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-19 09:38:14 -07:00
Justin Hibbits
12e99b63d2 ofed: Fix a logic inversion from IfAPI conversion
Reported by:	bartosz.sobczak_intel.com
Fixes:		3e142e0767 ("ofed: Mechanically convert to IfAPI")
Sponsored by:	Juniper Networks, Inc.
2023-04-19 11:56:25 -04:00
Dmitry Salychev
4cd9661428
dpaa2: Avoid dpaa2_cmd race conditions
struct dpaa2_cmd is no longer malloc'ed, but can be allocated on stack
and initialized with DPAA2_CMD_INIT() on demand. Drivers stopped caching
their DPAA2 command objects (and associated tokens) in the software
contexts in order to avoid using them concurrently.

Reviewed by:		bz
Approved by:		bz (mentor)
MFC after:		3 weeks
Differential Revision:	https://reviews.freebsd.org/D39509
2023-04-19 17:39:05 +02:00
Bjoern A. Zeeb
f621b087c0 iwlwifi: rtw88: rtw89: fix gcc warnings
Fix -Wno-format and unused variables warnings with gcc by adopting
(to|the) FreeBSD-specific code.

Reported by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D39673
2023-04-19 12:21:40 +00:00
Randall Stewart
960985a209 tcp: bbr.c is non-capable of doing ECN and sets an INP flag to fend off ECN however our syncache is not aware of that flag.
We need to make the syncache aware of the flag and not do ECN if its set. Note that this
is not 100% full proof but the best we can do (i.e. its still possible that you can get in a
situation where the peer try's to do ecn).

Reviewed by: tuexen, glebius, rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39672
2023-04-18 12:21:56 -04:00
Kristof Provost
2e6cdfe293 pf: change pf_rules_lock and pf_ioctl_lock to per-vnet locks
Both pf_rules_lock and pf_ioctl_lock only ever affect one vnet, so
there's no point in having these locks affect other vnets.
(In fact, the only lock in pf that can affect multiple vnets is
pf_end_lock.)

That's especially important for the rules lock, because taking the write
lock suspends all network traffic until it's released. This will reduce
the impact a vnet running pf can have on other vnets, and improve
concurrency on machines running multiple pf-enabled vnets.

Reviewed by:	zlei
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39658
2023-04-19 09:50:52 +02:00
Konstantin Belousov
617a11eab6 x86: initialize use_xsave once
The explanation from https://reviews.freebsd.org/D39637 by stevek:
The "use_xsave" variable is a global and that is only supposed to be
initialized early before scheduling gets started. However, with the way
the ifuncs for "fpusave" and "fpurestore" are implemented, the value
could be changed at runtime when scheduling is active if "use_xsave"
was set to 0 by the tunable. This leaves a window of opportunity where
"use_xsave" gets re-initialized to 1 and a context switch could occur
with a thread that was not set up to be able to use xsave functionality.
This can lead to an "privileged instruction fault".

The fix is to protect "use_xsave" from being initialized more than once.

Reported and reviewed by:	stevek
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39660
2023-04-19 02:22:28 +03:00
Konstantin Belousov
93ca6ff295 umtx: allow to configure minimal timeout (in nanoseconds)
PR:	270785
Reviewed by:	markj, mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39584
2023-04-19 02:22:28 +03:00
Konstantin Belousov
7aeea73e30 syncer vnode: add VOP_GETWRITEMOUNT() definition explicitly
Since syncer vnode vector does not provide a fallback to the default
one, its VOP_GETWRITEMOUNT() implementation implicitly returned
EOPNOTSUPP, which means that syncer ignored suspension.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-04-19 02:21:40 +03:00
Konstantin Belousov
d8a096621b sync_vnode(): add assert to check vn_start_write() correctness
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-04-19 02:21:40 +03:00
Steve Kiernan
8deb442cf7 mac: Honor order when registering MAC modules.
Ensure MAC modules are inserted in order that they are registered.

Reviewed by:	markj
Obtained from:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39589
2023-04-18 15:36:27 -04:00
Mateusz Guzik
5e954b9216 tmpfs: add missing vop_fplookup ops to tmpfs_fifoop_entries
Reported by:	gbe
PR:	270917
2023-04-18 18:06:30 +00:00
Marius Strobl
8defc88c13 gem(4): Remove onboard-only Sun ERI and remnants of SBus support
These bits are obsolete since 58aa35d429.
This change reverts part of 9ba2b298df as
well as effectively bd3d9826d7, i. e. the
SBus-related modifications. This also gets rid of a nasty hack required
as bus_{read,write}_N(9) doesn't really fit bus_space_subregion(9).
2023-04-18 19:17:24 +02:00
Marius Strobl
bd15d31cef mmc(4): Don't call bridge driver for timings not requiring tuning
The original idea behind calling into the bridge driver was to have the
logic deciding whether tuning is actually required for a particular bus
timing in a given slot as well as doing the sanity checking only on the
controller layer which also generally is better suited for these due to
say SDHCI_SDR50_NEEDS_TUNING. On another thought, not every such driver
should need to check whether tuning is required at all, though, and not
everything is SDHCI in the first place.
Adjust sdhci{,_fsl_fdt}(4) accordingly, but keep sdhci_generic_tune() a
bit cautious still.
2023-04-18 19:17:24 +02:00
Randall Stewart
2ad584c555 tcp: Inconsistent use of hpts_calling flag
Gleb has noticed there were some inconsistency's in the way the inp_hpts_calls flag was being used. One
such inconsistency results in a bug when we can't allocate enough sendmap entries to entertain a call to
rack_output().. basically a timer won't get started like it should. Also in cleaning this up I find that the
"no_output" side of input needs to be adjusted to make sure we don't try to re-pace too quickly outside
the hpts assurance of 250useconds.

Another thing here is we end up with duplicate calls to tcp_output() which we should not. If packets go
from hpts for processing the input side of tcp will call the output side of tcp on the last packet if it is needed.
This means that when that occurs a second call to tcp_output would be made that is not needed and if pacing
is going on may be harmful.

Lets fix all this and explicitly state the contract that hpts is making with transports that care about the
flag.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39653
2023-04-17 17:10:26 -04:00
Steve Kiernan
fb5ff7384c arm64: Use FULLKERNEL instead of .ALLSRC in .bin target
Using .ALLSRC may get additional arguments that we may not want
and could cause the objcopy to fail.

Reviewed by:	emaste
Obtained from:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39639
2023-04-18 11:41:57 -04:00
Kristof Provost
af94d8cc17 pf: fix incorrect lock define
PF_TABLE_STATS_ASSERT() should be checking pf_table_stats_lock not
pf_rules_lock.

Fortunately the define is not yet used anywhere so this was harmless.
Fix it anyway, in case it does get used.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-18 15:51:05 +02:00
Hans Petter Selasky
1943c40cd6 mlx5en(4): Don't wait for receive queue to fill up with mbufs during open channels.
Failure to get mbufs may be transient.
Don't permanently fail to open the channels due to lack of mbufs.
This also makes modifying channel parameters faster.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
6bd4bb9bdb mlx5en(4): Explain why CQE zipping is off.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
80b4ef6d10 mlx5: Remove unused debugfs node pointers.
No functional change intended.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
aa7bbdabde mlx5: Implement diagostic counters as sysctl(8) nodes.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:07 +02:00
Hans Petter Selasky
95bf70a4bf mlx5: Don't give zero number of pages to the firmware.
Can happen when using virtual mlx5_core<N> functions, VFs.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
273bfac08f mlx5: Implement mlx5_core_modify_cq_by_mask().
Implement one CQ modify function supporting all firmware versions,
instead of having more variants of CQ modify.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
2f7e9a8a21 mlx5: Fix duplicate free of default flow rule in error case.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
b0b87d9151 mlx5: Make mlx5_del_flow_rule() NULL safe.
This change factors out repeated NULL checks.

No functional change intended.

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Hans Petter Selasky
3bb3e4768f mlx5: Make MLX5_COMP_EQ_SIZE tunable.
When using hardware pacing, this value can be increased, because more SQ's
means more EQ events aswell. Make it tunable, hw.mlx5.comp_eq_size .

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-18 15:01:06 +02:00
Randall Stewart
37229fed38 tcp: Blackbox logging and tcp accounting together can cause a crash.
If you currently turn BB logging on and in combination have TCP Accounting on we can get a
crash where we have no NULL check and we run out of memory. Also lets make sure we
don't do a divide by 0 in calculating any BB ratios.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39622
2023-04-17 13:52:00 -04:00
Alexander V. Chernikov
28abf63277 netlink: sync interface IFLA attributes
MFC after:	2 weeks
2023-04-18 12:34:05 +00:00
Gordon Bergling
105e397eb6 kern_sysctl: Remove double words in source code comments
- s/on on/on/

MFC after:	5 days
2023-04-18 07:14:57 +02:00
Gordon Bergling
93e4914816 net80211: Remove double words in source code comments
- s/we we/we/

MFC after:	5 days
2023-04-18 07:14:50 +02:00
Stephen J. Kiernan
76735c7439 flash: Add "n25q64" to mx25l driver
This is for 64Mb Micron N25Q serial NOR flash memory

Obtained from:	Juniper Networks, Inc.
2023-04-18 00:21:17 -04:00
Jason A. Harmening
0c01203e47 vfs_lookup(): re-check v_mountedhere on lock upgrade
The VV_CROSSLOCK handling logic may need to upgrade the covered
vnode lock depending upon the requirements of the filesystem into
which vfs_lookup() is walking.  This may involve transiently
dropping the lock, which can allow the target mount to be unmounted.

Tested by:	pho
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
93fe61afde unionfs_mkdir(): handle dvp reclamation
The underlying VOP_MKDIR() implementation may temporarily drop the
parent directory vnode's lock.  If the vnode is reclaimed during that
window, the unionfs vnode will effectively become unlocked because
the its v_vnlock field will be reset.  To uphold the locking
requirements of VOP_MKDIR() and to avoid triggering various VFS
assertions, explicitly re-lock the unionfs vnode before returning
in this case.

Note that there are almost certainly other cases in which we'll
similarly need to handle vnode relocking by the underlying FS; this
is the only one that's caused problems in stress testing so far.
A more general solution, such as that employed for nullfs in
null_bypass(), will likely need to be implemented.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision: https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
d711884e60 Remove unionfs_islocked()
The implementation is racy; if the unionfs vnode is not in fact
locked, vnode private data may be concurrently altered or freed.
Instead, simply rely upon the standard implementation to query the
v_vnlock field, which is type-stable and will reflect the correct
lower/upper vnode configuration for the unionfs node.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
a5d82b55fe Remove an impossible condition from unionfs_lock()
We hold the vnode interlock, so vnode private data cannot suddenly
become NULL.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
a18c403fbd unionfs: remove LK_UPGRADE if falling back to the standard lock
The LK_UPGRADE operation may have temporarily dropped the upper or
lower vnode's lock.  If the unionfs vnode was reclaimed during that
window, its lock field will be reset to no longer point at the
upper/lower vnode lock, so the lock operation will use the standard
lock stored in v_lock.  Remove LK_UPGRADE from the flags in this case
to avoid a lockmgr assertion, as this lock has not been previously
owned by the calling thread.

Reported by:	pho
Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Ed Maste
00172f3416 geom: use bool for one-bit wide bit-field
A one-bit wide bit-field can take only the values 0 and -1.  Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1".  Fix by using c99 bool.

Reported by:	Clang, via dim
Reviewed by:	dim
Sponsored by:	The FreeBSD Foundation
2023-04-17 15:43:00 -04:00
Gleb Smirnoff
3232b1f4a9 tcp: fix build
The recent 25685b7537 came in conflict with a540cdca31.  Remove the
code that cleans up the old style input queue.  Note that two lines
below we assert that the new style input queue is empty.  The TCP
stacks that use the queue are supposed to flush it in their
tfb_tcp_fb_fini method.
2023-04-17 10:24:20 -07:00
Gleb Smirnoff
a6b55ee6be net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.

This change not only reverts e87c494015 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch.  A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.

Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH.  But the tcp_lro_flush_all() asserts the presence
of network epoch.  Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.

Reviewed by:		zlei, gallatin, erj
Differential Revision:	https://reviews.freebsd.org/D39510
2023-04-17 09:08:35 -07:00
Gleb Smirnoff
a540cdca31 tcp_hpts: use queue(9) STAILQ for the input queue
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39574
2023-04-17 09:07:23 -07:00
Steve Kiernan
48ffacbc84 veriexec: Add function to get label associated with a file
Add mac_veriexec_metadata_get_file_label to avoid the need to
expose internals to other MAC modules.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:33 -04:00
Steve Kiernan
bd4742c970 veriexec: Rename old VERIEXEC_SIGNED_LOAD as VERIEXEC_SIGNED_LOAD32
We need to handle old ioctl from old binary.

Add some missing ioctls.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Steve Kiernan
d195f39d1d veriexec: Add option MAC_VERIEXEC_DEBUG
Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Simon J. Gerraty
8c3e263dc1 veriexec: mac_veriexec_syscall compat32 support
Some 32bit apps may need to be able to use
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL
MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL

Therefore compat32 support is required.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Steve Kiernan
8512d82ea0 veriexec: Additional functionality for MAC/veriexec
Ensure veriexec opens the file before doing any read operations.

When the MAC_VERIEXEC_CHECK_PATH_SYSCALL syscall is requested, veriexec
needs to open the file before calling mac_veriexec_check_vp. This is to
ensure any set up is done by the file system. Most file systems do not
explicitly need an open, but some (e.g. virtfs) require initialization
of access tokens (file identifiers, etc.) before doing any read or write
operations.

The evaluate_fingerprint() function needs to ensure it has an open file
for reading in order to evaluate the fingerprint. The ideal solution is
to have a hook after the VOP_OPEN call in vn_open. For now, we open the
file for reading, envaluate the fingerprint, and close the file. While
this leaves a potential hole that could possibly be taken advantage of
by a dedicated aversary, this code path is not typically visited often
in our use cases, as we primarily encounter verified mounts and not
individual files. This should be considered a temporary workaround until
discussions about the post-open hook have concluded and the hook becomes
available.

Add MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL and
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL to mac_veriexec_syscall so we can
fetch and check label contents in an unconstrained manner.

Add a check for PRIV_VERIEXEC_CONTROL to do ioctl on /dev/veriexec

Make it clear that trusted process cannot be debugged. Attempts to debug
a trusted process already fail, but the failure path is very obscure.
Add an explicit check for VERIEXEC_TRUSTED in
mac_veriexec_proc_check_debug.

We need mac_veriexec_priv_check to not block PRIV_KMEM_WRITE if
mac_priv_gant() says it is ok.

Reviewed by:	sjg
Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Mark Johnston
d95fbf4e1a riscv: save the thread pointer in both modes
The contents of frame->tf_tp are uninitialized if accessed by DTrace (in
probe context), resulting in a panic when trying to access the memory
pointed to by tp. This saves the thread pointer to the trap frame when
handling both userland and kernel exceptions.

Reviewed by:	markj, mhorne
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39582
2023-04-17 09:49:52 -04:00
Alexander V. Chernikov
f656a96020 tests: make ktest build on ppc.
MFC after:	2 weeks
2023-04-17 13:47:07 +00:00
Alexander V. Chernikov
9742519b22 netlink: fix operations with link-local routes/gateways.
MFC after:	3 days
2023-04-17 12:04:43 +00:00
Alexander V. Chernikov
b8da3b62a5 tests: add ktest modules to build
MFC after:	2 weeks
2023-04-17 10:46:05 +00:00
Pawel Jakub Dawidek
068913e4ba zfs: Add vfs.zfs.bclone_enabled sysctl.
Keep block cloning disabled by default for now, but allow to enable and
use it after setting vfs.zfs.bclone_enabled to 1, so people can easily
try it.

Approved by:	oshogbo
Reviewed by:	mm, oshogbo
Differential Revision:	https://reviews.freebsd.org/D39613
2023-04-17 03:38:30 -07:00
Zhenlei Huang
401f03445e lagg(4): Correctly define some sysctl variables
939a050ad9 virtualized lagg(4), but the corresponding sysctl of some
virtualized global variables are not marked with CTLFLAG_VNET. A try to
operate on those variables via sysctl will effectively go to the 'master'
copies and the virtualized ones are not read or set accordingly. As a
side effect, on updating the 'master' copy, the virtualized global
variables of newly created vnets will have correct values.

PR:		270705
Reviewed by:	kp
Fixes:		939a050ad9 Virtualize lagg(4) cloner
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39467
2023-04-17 18:24:35 +08:00
Zhenlei Huang
a7acce3491 vnet: Fix a typo in a source code comment
- s/form/from/

MFC after:	3 days
2023-04-17 18:24:35 +08:00
Pawel Jakub Dawidek
1959e122d9 zfs: Merge https://github.com/openzfs/zfs/pull/14739
The zfs_log_clone_range() function is never called from the
zfs_clone_range_replay() function, so I assumed it is safe to assert
that zil_replaying() is never TRUE here. It turns out zil_replaying()
also returns TRUE when the sync property is set to disabled.

Fix the problem by just returning if zil_replaying() returns TRUE.

Reported by: Florian Smeets
Signed-off-by: Pawel Jakub Dawidek pawel@dawidek.net

Approved by: oshogbo, mm
2023-04-17 02:22:56 -07:00
Pawel Jakub Dawidek
e0bb199925 zfs: cherry-pick openzfs/zfs@c71fe7164
Fix data corruption when cloning embedded blocks

Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392
Closes #14739

Approved by: oshogbo, mm
2023-04-17 02:19:49 -07:00
Stephen J. Kiernan
88a3358ea4 veriexec: Add SPDX-License-Identifier 2023-04-16 21:23:00 -04:00
Stephen J. Kiernan
894bcc876d sys/modules/Makefile: conditionally add MAC/veriexec modules
Only build MAC/veriexec modules when MK_VERIEXEC is yes or we
are building all modules.

Add VERIEXEC knob to kernel __DEFAULT_NO_OPTIONS

Reviewed by:	sjg
Obtained from:	Juniper Networks, Inc.
2023-04-16 20:24:54 -04:00
Stephen J. Kiernan
8050e0a429 sys/modules/Makefile: add MAC/veriexec modules into the build
Build the MAC/veriexec module and the SHA2, SHA256, SHA384, and
SHA512 fingerprint modules.

Obtained from:	Juniper Networks, Inc.
2023-04-16 19:18:55 -04:00
Simon J. Gerraty
6ae8d57652 mac_veriexec: add mac_priv_grant check for NODEV
Allow other MAC modules to override some veriexec checks.

We need two new privileges:
PRIV_VERIEXEC_DIRECT	process wants to override 'indirect' flag
			on interpreter
PRIV_VERIEXEC_NOVERIFY	typically associated with PRIV_VERIEXEC_DIRECT
			allow override of O_VERIFY

We also need to check for PRIV_VERIEXEC_NOVERIFY override
for FINGERPRINT_NODEV and FINGERPRINT_NOENTRY.
This will only happen if parent had PRIV_VERIEXEC_DIRECT override.

This allows for MAC modules to selectively allow some applications to
run without verification.

Needless to say, this is extremely dangerous and should only be used
sparingly and carefully.

Obtained from:	Juniper Networks, Inc.

Reviewers: sjg
Subscribers: imp, dab

Differential Revision: https://reviews.freebsd.org/D39537
2023-04-16 19:14:40 -04:00
Stephen J. Kiernan
4819e5aeda Add new privilege PRIV_KDB_SET_BACKEND
Summary:
Check for PRIV_KDB_SET_BACKEND before allowing a thread to change
the KDB backend.

Obtained from:	Juniper Networks, Inc.
Reviewers: sjg, emaste
Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D39538
2023-04-16 14:37:58 -04:00
Val Packett
77f0e198d9 procctl: add state flags to PROC_REAP_GETPIDS reports
For a process supervisor using the reaper API to track process subtrees,
it is very useful to know the state of the processes on the list.

Sponsored by:   https://www.patreon.com/valpackett
Reviewed by:    kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D39585
2023-04-16 13:48:20 +03:00
Stephen J. Kiernan
b1a00c2b13 Quiet compiler warnings for fget_noref and fdget_noref
Summary:
Typecasting both parts of the comparison to u_int quiets compiler
warnings about signed/unsigned comparison and takes care of positive
and negative numbers for the file descriptor in a single comparison.

Obtained from:	Juniper Netwowrks, Inc.

Reviewers: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D39593
2023-04-15 23:50:54 -04:00
Warner Losh
214909d669 Revert "cam: fix up world compilation after previous"
This reverts commit 1d35493e46. It was the wrong fix. 757fc6666b has
the proper fix to include stdbool for userland.

Sponsored by:		Netflix
2023-04-15 18:25:55 -06:00