Commit Graph

147132 Commits

Author SHA1 Message Date
John Baldwin
0b672df914 iwlwifi: Silence unused but set warnings from GCC for iwl-debug.c.
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D39352
2023-04-10 10:31:07 -07:00
John Baldwin
677e70e0c4 ipmi: Remove some dead code for unsupported BMCs.
Reviewed by:	emaste
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39351
2023-04-10 10:30:54 -07:00
Alan Somers
9309a460b2 Implement GEOM::rotation_rate for gmirror
If all of the mirror's children have the same rotation rate, report
that.  But if they have mixed rotation rates, or if any child has an
unknown rotation rate, report "Unknown".

MFC after:	2 weeks
Sponsored by:	Axcient
Reviewed by:	imp
Differential Revision: https://reviews.freebsd.org/D39458
2023-04-10 10:27:10 -06:00
Christos Margiolis
38594ff9c0 ofw: fix memory leak in ofwbus_attach()
PR:		269509
Reported by:	Jaroslaw Pelczar <jarek@jpelczar.com>
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D38903
2023-04-10 12:14:12 -04:00
Christos Margiolis
0388a0887a dtrace: handle NOP instructions in the riscv invop handler
This will be used by a forthcoming port of the kinst provider.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39481
2023-04-10 12:14:11 -04:00
Mark Johnston
d862b165a6 bridge: Add support for emulated netmap mode
if_bridge receives packets via a special interface, if_bridge_input,
rather than by if_input.  Thus, netmap's usual hooking of ifnet routines
does not work as expected.  Instead, modify bridge_input() to pass
packets directly to netmap when it is enabled.  This applies to both
locally delivered packets and forwarded packets.

When a netmap application transmits a packet by writing it to the host
TX ring, the mbuf chain is passed to if_input, which ordinarily points
to ether_input().  However, when transmitting via if_bridge,
bridge_input() needs to see the packet again in order to decide whether
to deliver or forward.  Thus, introduce a new protocol flag,
M_BRIDGE_INJECT, which 1) causes the packet to be passed to
bridge_input() again after Ethernet processing, and 2) avoids passing
the packet back to netmap.  The source MAC address of the packet is used
to determine the original "receiving" interface.

Reviewed by:	vmaffione
MFC after:	2 months
Sponsored by:	Zenarmor
Sponsored by:	OPNsense
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38066
2023-04-10 12:14:11 -04:00
Kristof Provost
c69ae84197 if_epair: also remove vlan metadata from mbufs
We already remove mbuf tags from packets transitting an if_epair, but we
didn't remove vlan metadata.
In certain configurations this could lead to unexpected vlan tags
turning up on the rx side.

PR:		270736
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39482
2023-04-10 15:55:35 +02:00
Konstantin Belousov
7b6fe2428a DEBUG_VFS_LOCKS: use witness if available
The assert_vop_locked messages are ignored, and file/line information
is not too useful. Fixing this without changing both witness and VFS
asserts KPIs is not possible.

Reviewed by:	markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39464
2023-04-10 00:34:12 +03:00
Alexander V. Chernikov
cc3793b1c5 netlink: improve source ifa selection algorithm when adding routes.
Use route destination sockaddr when the gateway is eiter AF_LINK or
 has the different family (IPv4 over IPv6). This change ensures
 the nexthop IFA has the same family as the destination.

Reported by:	Dmitriy Smirnov <fox@sage.su>
Tested by:	Dmitriy Smirnov <fox@sage.su>
MFC after:	3 days
2023-04-09 13:33:22 +00:00
Alexander V. Chernikov
0d4038e301 netlink: set prefix-related flags to the created nexthop.
This fixes incorrect flag combinations when adding IPv4/IPv6 host
routes.

MFC after:	3 days
2023-04-09 09:26:12 +00:00
Alexander V. Chernikov
75379ea2e4 netlink: do not print "unknown sa family" warnings at the default debug
level.

MFC after:	2 weeks
2023-04-08 19:40:32 +00:00
Alexander V. Chernikov
39c0036d88 netlink: fix !INET6 warning
Reported by:	Gary Jennejohn <garyj@gmx.de>
MFC after:	2 weeks
2023-04-08 19:39:37 +00:00
Mateusz Guzik
c17eb99a66 Bump __FreeBSD_version to 1400086 for vn_lock_pair arg change
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-08 17:31:01 +00:00
Hans Petter Selasky
9b077d72bc usb(4): Separate the fast path and the slow path to avoid races and use-after-free for the USB FS interface.
Bad behaving user-space USB applicatoins may crash the kernel by issuing
USB FS related ioctl(2)'s out of their expected order. By default
the USB FS ioctl(2) interface is only available to the
administrator, root, and driver applications like webcamd(8) needs
to be hijacked in order for this to happen.

The issue is the fast-path code does not always see updates made
by the slow-path code, and may then work on freed memory.

This is easily fixed by using an EPOCH(9) type of synchronization
mechanism. A SX(9) lock will be used as a substitute for EPOCH(9),
due to the need for sleepability. In addition most calls going into
the fast-path originate from a single user-space process and the
need for multi-thread performance is not present.

Differential Revision:	https://reviews.freebsd.org/D39373
Reviewed by:	markj@
Reported by:	C Turt <ecturt@gmail.com>
admbugs:	994
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-08 17:11:31 +02:00
Hans Petter Selasky
03a2e432d5 usb(4): Code refactoring as a pre-step for adding missing synchronization mechanism.
Move code in switch cases into own functions to make later changes easier to track.

No functional change, except for removing a superfluous break statement when
range checking USB_FS_MAX_FRAMES, in the USB_FS_OPEN case.
It should not have been there at all.

Suggested by:	emaste@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-08 16:52:20 +02:00
Konstantin Belousov
182b21d462 openzfs: adopt to the new vn_lock_pair() interface
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-04-08 02:37:56 +03:00
Konstantin Belousov
bb24eaea49 vn_lock_pair(): allow to request shared locking
If either of vnodes is shared locked, lock must not be recursed.

Requested by:	rmacklem
Reviewed by:	markj, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39444
2023-04-08 01:58:26 +03:00
Mateusz Guzik
d6e2490134 zfs: disable kernel fpu usage on arm and aarc64
It is not implemented and causes panics on boot.

This is a temporary measure until someone(tm) sorts it out.

Reported by:	many
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-07 21:44:49 +00:00
Mateusz Guzik
02e6e8d218 vfs: extend vn_printf with vop vector 2023-04-07 20:39:06 +00:00
Mateusz Guzik
26b9648750 vfs: more informative panic for missing fplookup ops 2023-04-07 20:39:06 +00:00
Mateusz Guzik
4032c38814 ufs: add missing vop_fplookup ops to fifo vectors
Reported-by: syzbot+a324b64ef9a933659c1c@syzkaller.appspotmail.com
2023-04-07 20:39:05 +00:00
Rick Macklem
4adb28c0ab nfscl: Fix support for doing Null RPCs
Although the NFS client does not currently perform Null RPCs,
this fix is needed if/when it might do so.
Found during testing of experimental code that uses Null RPCs
to maintain/monitor TCP connections for "nconnect" mounts.

MFC after:	3 months
2023-04-07 12:57:26 -07:00
Rick Macklem
ff2f1f691c nfsd: Add support for the SP4_MACH_CRED case in ExchangeID
Commit f4179ad46f added support for operation bitmaps for
NFSv4.1/4.2.  This commit uses those to implement the SP4_MACH_CRED
case for the NFSv4.1/4.2 ExchangeID operation since the Linux
NFSv4.1/4.2 client is now using this for Kerberized mounts.
The Linux Kerberized NFSv4.1/4.2 mounts currently work without
support for this because Linux will fall back to SP4_NONE,
but there is no guarantee this fallback will work forever.

This commit only affects Kerberized NFSv4.1/4.2 mounts from
Linux at this time.

MFC after:	3 months
2023-04-07 12:49:23 -07:00
Gleb Smirnoff
66fbc19fbd tcp: pass tcpcb in the tfb_tcp_ctloutput() method instead of inpcb
Just matches rest of the KPI.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39435
2023-04-07 12:18:10 -07:00
Gleb Smirnoff
35bc0bcc51 tcp: reduce argument list to functions that pass a segment
The socket argument is superfluous, as a tcpcb always has one and
only one socket.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39434
2023-04-07 12:18:06 -07:00
Gleb Smirnoff
de4368dd84 tcp: retire tfb_tcp_hpts_do_segment()
Isn't in use anymore.  Correct comments that mention it.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39433
2023-04-07 12:18:02 -07:00
Mateusz Guzik
20be1b4fc4 zfs: try to fallback early if can't do optimized copy
Not complete, but already shaves on some locking.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-07 15:47:45 +00:00
Mateusz Guzik
d012836fb6 zfs: fix up EXDEV handling for clone_range
API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.

While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.

One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.

Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.

[note: technically the review is still in progress, but so is the
active breakage]

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-07 15:46:20 +00:00
Zhenlei Huang
2d3614fb13 bridge: Log MAC address port flapping
MAC flapping occurs when a bridge receives packets with the same source MAC
address on different member interfaces. The common reasons are:
 - user roams from one bridge port to another
 - user has wrong network setup, bridge loops e.g.
 - someone set duplicated ethernet address on his/her nic
 - some bad guy / virus / trojan send spoofed packets

if_bridge currently updates the bridge routing entry silently hence it is hard
to diagnose.

Emit logs when MAC address port flapping occurs to make it easier to diagnose.

Reviewed by:	kp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39375
2023-04-07 22:25:41 +08:00
Randall Stewart
945f9a7cc9 tcp: misc cleanup of options for rack as well as socket option logging.
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack
has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also
there is a t_maxpeak_rate that needs to be removed (its un-used).

Reviewed by: tuexen, cc
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39427
2023-04-07 10:15:29 -04:00
Ganbold Tsagaankhuu
37d97b10ff Fix style. 2023-04-07 03:18:42 +00:00
Ganbold Tsagaankhuu
4720afaffe Improve RK3568 pcie phy handling codes a bit.
Move phy bifurcation code to a separate function
that can be called during the attach phase.
Also initialize both pcie lanes accordingly.
2023-04-07 02:54:13 +00:00
Andrew Turner
04b4655997 Mark EENTRY as .text
To allow it to be used before ENTRY we need to ensure the symbol is
in the .text section. It also needs to be aligned correctly.

While here mark the symbol type as a function as in the ENTRY macro.

Reported by:	jrtc27
Sponsored by:	Arm Ltd
2023-04-06 16:50:54 +01:00
Mateusz Guzik
f87a9f51ef vfs: validate that a mount point with FPLOOKUP has vop_fplookup ops 2023-04-06 15:20:41 +00:00
Mateusz Guzik
e237e2ba5f vfs: only allow doomed vnodes to return EOPNOTSUPP for fplookup vops
This helps asserting that they are provided by filesystems indicating
they do it.
2023-04-06 15:20:41 +00:00
Mateusz Guzik
5f6df17775 vfs: validate that vop vectors provide all or none fplookup vops
In order to prevent later susprises.
2023-04-06 15:20:41 +00:00
Mateusz Guzik
c7f6c2a50b deadfs: consistently return EOPNOTSUPP for fplookup vops 2023-04-06 15:20:41 +00:00
Mateusz Guzik
0baef43ed0 vfs: add missing vop_fplookup ops to syncer 2023-04-06 15:20:41 +00:00
Mateusz Guzik
8495fa49ea vfs: whack spurious comments from syncer's vop_vector 2023-04-06 15:20:40 +00:00
Mateusz Guzik
f3c81b9738 ufs: add missing vop_fplookup ops 2023-04-06 15:20:40 +00:00
Mateusz Guzik
e2d997d1cb zfs: add missing vop_fplookup_vexec assignments
This happens to be a nop right now.
2023-04-06 15:20:40 +00:00
Mark Johnston
5f6d37787f netmap: Handle packet batches in generic mode
ifnets are allowed to pass batches of multiple packets to if_input,
linked by the m_nextpkt pointer.  iflib_rxeof() sometimes does this, for
example.  Netmap's generic mode did not handle this and would only
deliver the first packet in the batch, leaking the rest.

PR:		270636
Reviewed by:	vmaffione
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39426
2023-04-05 17:07:48 -04:00
Mark Johnston
ce12afaa6f netmap: Fix queue stalls with generic interfaces
In emulated mode, the FreeBSD netmap port attempts to perform zero-copy
transmission.  This works as follows: the kernel ring is populated with
mbuf headers to which netmap buffers are attached.  When transmitting,
the mbuf refcount is initialized to 2, and when the counter value has
been decremented to 1 netmap infers that the driver has freed the mbuf
and thus transmission is complete.

This scheme does not generalize to the situation where netmap is
attaching to a software interface which may transmit packets among
multiple "queues", as is the case with bridge or lagg interfaces.  In
that case, we would be relying on backing hardware drivers to free
transmitted mbufs promptly, but this isn't guaranteed; a driver may
reasonably defer freeing a small number of transmitted buffers
indefinitely.  If such a buffer ends up at the tail of a netmap transmit
ring, further transmits can end up blocked indefinitely.

Fix the problem by removing the zero-copy scheme (which is also not
implemented in the Linux port of netmap).  Instead, the kernel ring is
populated with regular mbuf clusters into which netmap buffers are
copied by nm_os_generic_xmit_frame().  The refcounting scheme is
preserved, and this lets us avoid allocating a fresh cluster per
transmitted packet in the common case.  If the transmit ring is full, a
callout is used to free the "stuck" mbuf, avoiding the queue deadlock
described above.

Furthermore, when recycling mbuf clusters, be sure to fully reinitialize
the mbuf header instead of simply re-setting M_PKTHDR.  Some software
interfaces, like if_vlan, may set fields in the header which should be
reset before the mbuf is reused.

Reviewed by:	vmaffione
MFC after:	1 month
Sponsored by:	Zenarmor
Sponsored by:	OPNsense
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38065
2023-04-05 12:12:30 -04:00
Zhenlei Huang
da4068c4e1 mlx5ib(4): Mark driver knows net epoch
This driver has already been EPOCH(9) aware since e48813009c.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39406
2023-04-06 00:08:23 +08:00
Zhenlei Huang
fc6c93b6a5 infiniband: Opt-in for net epoch
This is counterpart to e87c494015, which did the same for ethernet.

Suggested by:	hselasky
Reviewed by:	hselasky, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39405
2023-04-06 00:08:23 +08:00
Mark Johnston
03276e338a netisr: Remove the now-unused NETISR_EPAIR queue index
No functional change intended.

Fixes:		3dd5760aa5 ("if_epair: rework")
MFC after:	1 week
Sponsored by:	Klara, Inc.
2023-04-05 11:46:42 -04:00
Mark Johnston
82bbdde4eb bridge: Try to make the GRAB_OUR_PACKETS macro a bit more readable
- Let the compiler use constant folding to eliminate conditionals.
- Fix some inconsistent whitespace.

No functional change intended.

Reviewed by:	zlei
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38410
2023-04-05 10:37:00 -04:00
Mateusz Guzik
c67eb393fa tcp_hpts: plug a compiler warn
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-05 14:32:13 +00:00
Martin Matuska
eb1feadc20 zfs: fix null ap->a_fsizetd NULL pointer derefernce
Submitted by:		rmacklem
Reported by:		cy
Tested by:		cy, mm
Reviewed by:		pjd, mm
Differential revision:	https://reviews.freebsd.org/D39418
2023-04-05 09:33:32 +02:00
Konstantin Belousov
54579376c0 Change kqueue1() to be compatible with NetBSD
by making it accept some open(2) flags.  More precisely, only
O_CLOEXEC is supported, the flag is translated into the KQUEUE_CLOEXEC flag
for kqueuex(2), and O_NONBLOCK is silently ignored.

Reported and tested by:	vishwin
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39377
2023-04-05 06:29:49 +03:00
Gleb Smirnoff
84b42df834 rack: fix build on powerpc 2023-04-04 16:35:36 -07:00
Dmitry Chagin
71bc17803e linux(4): Implement close_range over native
Handling of the CLOSE_RANGE_UNSHARE flag is not implemented due to
difference in fd unsharing mechanism in the Linux and FreeBSD.

Reviewed by:		mjg
Differential revision:	https://reviews.freebsd.org/D39398
MFC after:		2 weeks
2023-04-04 23:24:04 +03:00
Dmitry Chagin
50111714f5 linux(4): Regen for close_range syscall
MFC after:		2 weeks
2023-04-04 23:23:37 +03:00
Dmitry Chagin
1c27dce1f8 linux(4): Modify close_range syscall to match Linux
MFC after:		2 weeks
2023-04-04 23:23:24 +03:00
Randall Stewart
030434acaf Update rack to the latest code used at NF.
There have been many changes to rack over the last couple of years, including:
     a) Ability when switching stacks to have one stack query another.
     b) Internal use of micro-second timers instead of ticks.
     c) Many changes to pacing in forms of
        1) Improvements to Dynamic Goodput Pacing (DGP)
        2) Improvements to fixed rate paciing
        3) A new feature called hybrid pacing where the requestor can
           get a combination of DGP and fixed rate pacing with deadlines
           for delivery that can dynamically speed things up.
     d) All kinds of bugs found during extensive testing and use of the
        rack stack for streaming video and in fact all data transferred
        by NF

Reviewed by: glebius, gallatin, tuexen
Sponsored By: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D39402
2023-04-04 16:05:46 -04:00
Gleb Smirnoff
2ff8187efd tcp_hpts: remove dead code tcp_drop_in_pkts()
Should have gone in f971e79139.
2023-04-04 12:55:27 -07:00
Eric van Gyzen
ecaeac805b mlxfw: fix potential NULL pointer dereference
Reported by:	Coverity (an internal run at Dell)
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D39348
2023-04-04 15:27:46 -05:00
Martin Matuska
91ef6f14f2 Revert "zfs: fall back if block_cloning feature is disabled"
This reverts commit 8ee579abe0.
2023-04-04 16:34:34 +02:00
Konstantin Belousov
11cdffc603 Regen 2023-04-04 16:19:08 +03:00
Konstantin Belousov
dac3102488 Rename kqueue1(2) to kqueuex(2) to avoid compat issues with NetBSD
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39377
2023-04-04 16:19:08 +03:00
Randall Stewart
73ee5756de Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features.
So stack switching as always been a bit of a issue. We currently use a break before make setup which means that
if something goes wrong you have to try to get back to a stack. This patch among a lot of other things changes that so
that it is a make before break. We also expand some of the function blocks in prep for new features in rack that will allow
more controlled pacing. We also add other abilities such as the pathway for a stack to query a previous stack to acquire from
it critical state information so things in flight don't get dropped or mis-handled when switching stacks. We also add the
concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds and
of course this even gives us a pathway to go to nanosecond timekeeping if we need to (something for the data center to consider
for sure).

Once all this lands I will then update rack to begin using all these new features.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39210
2023-04-01 01:46:38 -04:00
Emmanuel Vadot
63b113af57 linuxkpi: Add a few more dummy includes
Needed by drm-kmod.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 14:11:32 +02:00
Martin Matuska
8ee579abe0 zfs: fall back if block_cloning feature is disabled
If block_cloning is disabled, or other errors from zfs_clone_range()
return an EXDEV we should fall back to vn_generic_copy_file_range().

This fixes issues when copying files on the same dataset with
block_cloning disabled.

Upstreamed as pull request to OpenZFS.

Reviewed by:	Mateusz Guzik <mjguzik@gmail.com>
OpenZFS pull request:	14713
2023-04-04 13:43:34 +02:00
Emmanuel Vadot
44312c28fe linuxkpi: Add linux/agp_backend.h
It declares the structs needed by drm code for AGP.

Obtained from:	drm-kmod
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 11:49:48 +02:00
Emmanuel Vadot
7c7419f60c linuxkpi: Add linux/stddef.h
It simply include sys/stdef.h

Obtained from:	drm-kmod
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 11:39:27 +02:00
Emmanuel Vadot
0bf561351b linuxkpi: Include linux/types.h in linux/mod_devicetable.h
It's done like this in linux too.

Sponsored by:   Beckhoff Automation GmbH & Co. KG
2023-04-04 10:48:28 +02:00
Emmanuel Vadot
3f39ff2420 linuxkpi: Include linux/math64.h in linux/time.h
It's done like this in linux too.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 10:48:27 +02:00
Ganbold Tsagaankhuu
396b6914c9 Fix pcie phy enabling codes for RK3568 SoC.
Handle data-lanes property for pcie phy and set it accordingly.
This makes devices attached to pcie3 work properly.
For some RK3568 based boards, RTL8125B based device is
connected it. So with this, realtek-re-kmod driver attaches
and works.

Partially obtained from OpenBSD.
Tested on NanoPI-R5S, FireFly Station P2 boards.
2023-04-04 02:50:29 +00:00
Martin Matuska
2a58b312b6 zfs: merge openzfs/zfs@431083f75
Notable upstream pull request merges:
  #12194 Fix short-lived txg caused by autotrim
  #13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
  #13392 Implementation of block cloning for ZFS
  #13741 SHA2 reworking and API for iterating over multiple implementations
  #14282 Sync thread should avoid holding the spa config write lock
         when possible
  #14283 txg_sync should handle write errors in ZIL
  #14359 More adaptive ARC eviction
  #14469 Fix NULL pointer dereference in zio_ready()
  #14479 zfs redact fails when dnodesize=auto
  #14496 improve error message of zfs redact
  #14500 Skip memory allocation when compressing holes
  #14501 FreeBSD: don't verify recycled vnode for zfs control directory
  #14502 partially revert PR 14304 (eee9362a7)
  #14509 Fix per-jail zfs.mount_snapshot setting
  #14514 Fix data race between zil_commit() and zil_suspend()
  #14516 System-wide speculative prefetch limit
  #14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
  #14519 Do not hold spa_config in ZIL while blocked on IO
  #14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
  #14524 Ignore too large stack in case of dsl_deadlist_merge
  #14526 Use .section .rodata instead of .rodata on FreeBSD
  #14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
  #14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
  #14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
  #14544 icp: Prevent compilers from optimizing away memset()
         in gcm_clear_ctx()
  #14546 Revert zfeature_active() to static
  #14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
         patch
  #14563 Optimize the is_l2cacheable functions
  #14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
  #14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
  #14567 spl: Add cmn_err_once() to log a message only on the first call
  #14568 Fix incremental receive silently failing for recursive sends
  #14569 Restore ASMABI and other Unify work
  #14576 Fix detection of IBM Power8 machines (ISA 2.07)
  #14577 Better handling for future crypto parameters
  #14600 zcommon: Refactor FPU state handling in fletcher4
  #14603 Fix prefetching of indirect blocks while destroying
  #14633 Fixes in persistent error log
  #14639 FreeBSD: Remove extra arc_reduce_target_size() call
  #14641 Additional limits on hole reporting
  #14649 Drop lying to the compiler in the fletcher4 code
  #14652 panic loop when removing slog device
  #14653 Update vdev state for spare vdev
  #14655 Fix cloning into already dirty dbufs
  #14678 Revert "Do not hold spa_config in ZIL while blocked on IO"

Obtained from:	OpenZFS
OpenZFS commit:	431083f75b
2023-04-03 16:49:30 +02:00
Ganbold Tsagaankhuu
b98fbf3781 Fix driver name.
Submitted by:	Tyuryukanov S.Y.
2023-04-03 14:20:28 +00:00
Andrew Turner
41236539d8 Add non-posted device memory to the arm64 mem map
Add VM_MEMATTR_DEVICE_NP to the arm64 vm.pmap.kernel_maps sysctl.

Reviewed by:	markj
Sponsored by:	Arm Ltd
 Differential Revision:	https://reviews.freebsd.org/D39371
2023-04-03 12:59:11 +01:00
Dmitry Chagin
7ae0972c7b linsysfs(4): Reimplement listnics() using ifAPI
Handle if arrival/departure events and VNETs.

Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI, pseudofs
2023-04-03 11:22:16 +03:00
Bjoern A. Zeeb
cfccc7f30a LinuxKPI: 802.11: remove extra spaces
Remove two extra spaces.  No functional change.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-02 22:25:28 +00:00
Zhenlei Huang
5f3d0399e9 lagg(4): Tap traffic after protocol processing
Different lagg protocols have different means and policies to process incoming
traffic. For example, for failover protocol, by default received traffic is only
accepted when they are received through the active port. For lacp protocol, LACP
control messages are tapped off, also traffic will be dropped if they are
received through the port which is not in collecting state or is not joined to
the active aggregator. It confuses if user dump and see inbound traffic on
lagg(4) interfaces but they are actually silently dropped and not passed into
the net stack.

Tap traffic after protocol processing so that user will have consistent view of
the inbound traffic, meanwhile mbuf is set with correct receiving interface and
bpf(4) will diagnose the right direction of inbound packets.

PR:		270417
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39225
2023-04-03 01:01:51 +08:00
Zhenlei Huang
90820ef121 infiniband: Widen NET_EPOCH coverage
From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx)
do not enter net epoch before lagg_input_infiniband(). If IPoIB interface is a
member of lagg(4) interface, and after returning from lagg_input_infiniband()
the receiving interface of mbuf is set to lagg(4) interface, then when
concurrently destroying the lagg(4) interface, there is a small window that the
interface gets destroyed and becomes invalid before infiniband_input() re-enter
net epoch, thus leading use-after-free.

Widen NET_EPOCH coverage to prevent use-after-free.

Thanks hselasky@ for testing with mlx5 devices.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39275
2023-04-03 00:51:49 +08:00
Alexander V. Chernikov
3091d980f5 netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it still possible have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-02 15:27:21 +00:00
Emmanuel Vadot
4fd9e20671 linuxkpi: hdmi: Remove wrong dependency on wlan
Copy-paste mistake.

Reported by:	Alastair Hogge <agh@riseup.net>
Fixes:	f1d7ae31d4 ("linuxkpi: Add hdmi helpers")
2023-04-02 16:56:23 +02:00
Alexander V. Chernikov
c35a43b261 netlink: allow exact-match route lookups via RTM_GETROUTE.
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
 request assumes exact-prefix lookup instead of the
 longest-prefix-match.

MFC after:	2 weeks
2023-04-02 13:47:10 +00:00
Alexander V. Chernikov
4aeb939ecf netlink: fix NULL check in the default route snl(3) parser.
CID:		1506959
MFC after:	2 weeks
2023-04-02 12:44:20 +00:00
Alexander V. Chernikov
27cbc1a7fe netlink: fix snl_read_reply_multi().
CID:		1506956
MFC after:	2 weeks
2023-04-02 12:41:53 +00:00
Dmitry Chagin
a32ed5ec05 pseudofs: Simplify pfs_visible_proc
Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39383
MFC after:		1 month
2023-04-02 11:24:10 +03:00
Dmitry Chagin
405c0c04ed pseudofs: Allow vis callback to be called for a named node
This will be used later in the linsysfs module to filter out VNETs.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39382
MFC after:		1 month
2023-04-02 11:21:15 +03:00
Dmitry Chagin
7f72324346 pseudofs: Microoptimize struct pfs_node
Since 81167243b the size of struct pfs_node is 280 bytes, so the kernel
memory allocator takes memory from 384 bytes sized bucket. However, the
length of the node name is mostly short, e.g., for Linux emulation layer
it is up to 16 bytes. The size of struct pfs_node w/o pfs_name is 152
bytes, i.e., we have 104 bytes left to fit the node name into the 256
bytes-sized bucket.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39381
MFC after:		1 month
2023-04-02 11:20:07 +03:00
Navdeep Parhar
9f354cd3d0 cxgbe(4): Allow tracing filters on loopback ports.
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port.  This change allows
tracing filters to be installed in this loopback path.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 17:50:46 -07:00
Navdeep Parhar
531ef35241 cxgbe/iw_cxgbe: Always set a vnet around calls to IN_LOOPBACK.
This is catch up with efe58855f3.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 16:19:10 -07:00
Rick Macklem
f4179ad46f nfscommon: Add support for an NFSv4 operation bitmap
NFSv4.1/4.2 uses operation bitmaps for various operations,
such as the SP4_MACH_CRED case for ExchangeID.
This patch adds support for operation bitmaps so that
support for SP4_MACH_CRED can be added to the NFSv4.1/4.2
server in a future commit.

This commit should not change any NFSv4.1/4.2 semantics.

MFC after:	3 months
2023-04-01 14:22:26 -07:00
黃清隆
285d85f4f9 arcmsr(4): Fix reading buffer empty length error.
MFC after:	2 weeks
2023-03-31 22:43:43 -07:00
Bjoern A. Zeeb
8ac540d3b8 LinuxKPI: 802.11: adjust locking
Split up the lhw lock and the scan lock.  The latter is a mtx
while the former changes from mtx to sx as mac80211 downcalls may
sleep (and the ic lock is not usable in that case either and a larger
project to fix).
This will also enforce some lookups under lock (mostly scan) as well
as general protection for more compat code and avoid a possible
deadlock with one of the upcoming callbacks from driver into the
compat code.

Sponsored by:	The FreeBSD Foundation
MFC after:	7 days
2023-03-31 19:59:50 +00:00
Joseph Mingrone
6f9cba8f8b
libpcap: Update to 1.10.3
Local changes:

- In contrib/libpcap/pcap/bpf.h, do not include pcap/dlt.h.  Our system
  net/dlt.h is pulled in from net/bpf.h.
- sys/net/dlt.h: Incorporate changes from libpcap 1.10.3.
- lib/libpcap/Makefile: Update for libpcap 1.10.3.

Changelog:	https://git.tcpdump.org/libpcap/blob/95691ebe7564afa3faa5c6ba0dbd17e351be455a:/CHANGES
Reviewed by:	emaste
Obtained from:	https://www.tcpdump.org/release/libpcap-1.10.3.tar.gz
Sponsored by:	The FreeBSD Foundation
2023-03-31 16:02:22 -03:00
John Baldwin
cb750f7f5a fuse: Remove set but unused cr_gid variable.
Reviewed by:	asomers
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39350
2023-03-31 10:57:13 -07:00
John Baldwin
ad83dd2b2b LinuxKPI: Appease -Wunused-but-set-variable warnings from GCC.
- Mark assert dummy variables as __unused.

- Use a dummy (void) cast of the flags argument passed to
  spin_unlock_irqrestore so it gets treated as used.

Reviewed by:	manu, hselasky
Differential Revision:	https://reviews.freebsd.org/D39349
2023-03-31 10:56:33 -07:00
Zhenlei Huang
5a8abd0a29 lacp: Use C99 bool for boolean return value
This improves readability.

No functional change intended.

MFC after:	1 week
2023-04-01 01:48:36 +08:00
Mitchell Horne
3462c371c2 arm64/gicv3: correct the size of the distributor resource
Use the GICD_SIZE macro (0x10000), which is half the size of the current
fixed-sized mapping (128 * 1024 == 0x20000).

In ARM64 Hyper-V instances, it seems the Distributor's registers are
located immediately preceding a range of physical memory in the bus
address space. Thus, when ram0 is attaching and attempts to reserve
SYS_RES_MEMORY resources corresponding to its physmem ranges, it fails,
because the first 0x10000 bytes of this range are already owned by gic0.

PR:		270415
Reported by:	whu
Tested by:	whu
Differential Revision:	https://reviews.freebsd.org/D39260
2023-03-31 13:26:22 -03:00
Mark Johnston
1a3cb489e5 arm64: Move the initial kernel stack out of the init_pagetables section
init_pagetables is mapped into the segment containing the BSS, but does
not get zeroed by locore.  It is used for bootstrap page table pages.

It happens that the bootstrap kernel stack is also placed in that
section, but there's no reason it shouldn't live in the BSS, so move it
there.  No functional change intended.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39367
2023-03-31 11:57:26 -04:00
Andrew Turner
47ff149afa Move arm64 EENTRY uses before ENTRY
The ENTRY macro adds instructions to the start of a function but not
EENTRY. To use these instructions in both functions move the EENTRY
use before the ENTRY use.

Sponsored by:	Arm Ltd
2023-03-31 16:45:31 +01:00
Mark Johnston
a54370f4ab arm64: Ensure that thread0's PCB flags are initialized
On arm64, the PCB is stored at the top of the thread stack.  For thread0
this comes from the static "initstack" region, which is placed in the
.init_pagetable section, which is not part of the BSS and thus doesn't
get zeroed by locore.  (See the comment in ldscript.arm64.)  It is thus
possible for the pcb_flags field to be uninitialized, which can result
in PCB_SINGLE_STEP being set.

Fix this by simply initializing the field.  A separate commit will move
initstack out of the .init_pagetable section, since it has no reason to
be there, but it is preferable to explicitly initialize PCB fields
anyway.  In particular, regular kernel stacks are not zeroed upon
allocation, so we should be consistent here.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39343
2023-03-31 09:50:34 -04:00
Dmitry Chagin
960562652c linux(4): Fix opt_netlink.h inclusion
Add opt_netlink.h to the linux_common module, on i386, where we don't
uses linux_common module, move opt_netlink.h inclusion under
i386 condition.

MFC after:		2 weeks
2023-03-31 14:56:59 +03:00
Dmitry Chagin
126df352f5 linux(4): Move inclusion of i386-specific files under common condition 2023-03-31 14:56:29 +03:00
Dmitry Chagin
f94b5734bc Revert "linsysfs(4): Reimplement listnics() using ifAPI"
This reverts commit 0b56641cfc.

As it poorly interacts with vnet subsystem
2023-03-31 14:54:33 +03:00
Kristof Provost
28921c4f7d carp: allow commands to use interface name rather than index
Get/set commands can now choose to provide the interface name rather
than the interface index. This allows userspace to avoid a call to
if_nametoindex().

Suggested by:	melifaro
Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39359
2023-03-31 11:29:58 +02:00
Andrew Turner
3a0cc6fe61 Handle the arm64 unknown exception separately
Rather than falling through to the default case handle the unknown
exception with its own panic message. As ESR_EL1 is zero for this
exception stop printing it.

Sponsored by:	Arm Ltd
2023-03-31 10:15:45 +01:00
Bjoern A. Zeeb
3f0083c4e3 LinuxKPI: 802.11: use ic_printf more consistently
Rather than printing ic_name ourselves (or not at all) use ic_printf()
as a common function from net80211 where possible.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-03-31 07:14:32 +00:00
Ed Maste
d33cdf16df fs/cd9660: add header include guards
Diff reduction against NetBSD files in sys/fs/cd9660/ and OpenBSD files
in usr.sbin/makefs/cd9660/.

Sponsored by:	The FreeBSD Foundation
2023-03-30 19:20:54 -04:00
Konstantin Belousov
7170774e2a ifcapnv: cap_bit in ifcap2_nv_bit_names[] is bit, not index
Sponsored by:	Nvidia networking
2023-03-31 02:08:15 +03:00
John Baldwin
47d1e67874 sys: Disable errors for -Wunused-function on GCC.
This matches the handling of this warning on clang.
2023-03-30 15:43:48 -07:00
Navdeep Parhar
21b778fbeb cxgbe(4): Remove dead code.
Fixes:	e7e0844422 cxgbe(4): Replace T4_PKT_TIMESTAMP with something slightly less hackish.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-03-30 14:13:07 -07:00
Mark Johnston
fd02d0bc14 graid3: Pre-allocate the timeout event structure
As in commit 2f1cfb7f63 ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.

Reported by:	graid3 regression tests
MFC after	2 weeks
2023-03-30 13:38:15 -04:00
Mark Johnston
cab1056105 kdb: Modify securelevel policy
Currently, sysctls which enable KDB in some way are flagged with
CTLFLAG_SECURE, meaning that you can't modify them if securelevel > 0.
This is so that KDB cannot be used to lower a running system's
securelevel, see commit 3d7618d8bf.  However, the newer mac_ddb(4)
restricts DDB operations which could be abused to lower securelevel
while retaining some ability to gather useful debugging information.

To enable the use of KDB (specifically, DDB) on systems with a raised
securelevel, change the KDB sysctl policy: rather than relying on
CTLFLAG_SECURE, add a check of the current securelevel to kdb_trap().
If the securelevel is raised, only pass control to the backend if MAC
specifically grants access; otherwise simply check to see if mac_ddb
vetoes the request, as before.

Add a new secure sysctl, debug.kdb.enter_securelevel, to override this
behaviour.  That is, the sysctl lets one enter a KDB backend even with a
raised securelevel, so long as it is set before the securelevel is
raised.

Reviewed by:	mhorne, stevek
MFC after:	1 month
Sponsored by:	Juniper Networks
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37122
2023-03-30 10:45:00 -04:00
Alexander V. Chernikov
b755f1a009 netlink: Fix adding routes with nexthops on p2p interfaces.
Use full-featured ifa_ifwithroute() to guess route ifa/ifp
 instead of ifa_ifwithnet(). This change makes the route addition
 logic closer to the rt_getifa_fib() used by rtsock.

Reported by:	glebius
Tested by:	glebius
Differential Revision: https://reviews.freebsd.org/D39335
MFC after:	2 weeks
2023-03-30 09:53:50 +00:00
Mateusz Guzik
f5a365e51f inet6: protect address manipulation with a lock
This is a total hack/bare minimum which follows inet4.

Otherwise 2 threads removing the same address can easily crash.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39317
2023-03-30 08:46:38 +00:00
Kirk McKusick
fe5e6e2cc5 Improvement in UFS/FFS directory placement when doing mkdir(2).
The algorithm for laying out new directories was devised in the 1980s
and markedly improved the performance of the filesystem. In those days
large disks had at most 100 cylinder groups and often as few as 10-20.
Modern multi-terrabyte disks have thousands of cylinder groups. The
original algorithm does not handle these large sizes well. This change
attempts to expand the scope of the original algorithm to work well
with these much larger disks while still retaining the properties
of the original algorithm for small disks.

The filesystem implementation is divided into policy routines and
implementation routines. The policy routines can be changed in any
way desired without risk of corrupting the filesystem. The policy
requests are handled by the implementation layer. If the policy
asks for an available resource, it is granted. But if it asks for
an already in-use resource, then the implementation will provide
an available one nearby the request. Thus it is impossible for a
policy to double allocate. This change is limited to the policy
implementation.

This change updates the ffs_dirpref() routine which is responsible
for selecting the cylinder group into which a new directory should
be placed. If we are near the root of the filesystem we aim to
spread them out as much as possible. As we descend deeper from the
root we cluster them closer together around their parent as we
expect them to be more closely interactive. Higher-level directories
like usr/src/sys and usr/src/bin should be separated while the
directories in these areas are more likely to be accessed together
so should be closer. And directories within commands or kernel
subsystems should be closer still.

We pick a range of cylinder groups around the cylinder group of the
directory in which we are being created. The size of the range for
our search is based on our depth from the root of our filesystem.
We then probe that range based on how many directories are already
present. The first new directory is at 1/2 (middle) of the range;
the second is in the first 1/4 of the range, then at 3/4, 1/8, 3/8,
5/8, 7/8, 1/16, 3/16, 5/16, etc.

It is desirable to store the depth of a directory in its on-disk
inode so that it is available when we need it. We add a new field
di_dirdepth to track the depth of each directory. Because there are
few spare fields left in the inode, we choose to share an existing
field in the inode rather than having one of our own. Specifically
we create a union with the di_freelink field. The di_freelink field
is used to track inodes that have been unlinked but remain referenced.
It is not needed until a rmdir(2) operation has been done on a
directory. At that point, the directory has no contents and even
if it is kept active as a current directory is no longer able to
have any new directories or files created in it. Thus the use of
di_dirdepth and di_freelink will never coincide.

Reported by:  Timo Voelker
Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39246
2023-03-29 21:13:27 -07:00
Alexander V. Chernikov
badcb3fd57 routing: fix panic when adding an interface route to the p2p interface
without and inet/inet6 addresses attached.

MFC after:      3 days
2023-03-29 20:28:24 +00:00
Konstantin Belousov
cd137909c3 amd64 wakeup: recalculate mitigations after APICs are woken
APICs are needed to broadcast IPIs for MSR writes.

PR:	270489
Reviewed by:	dchagin, emaste, jhb
Tested by:	dchagin, manu
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39302
2023-03-29 21:45:20 +03:00
Zhenlei Huang
d4a80d21b3 lagg(4): Do not enter net epoch recursively
This saves a little resources.

No functional change intended.

Reviewed by:	kp
Fixes:		b8a6e03fac Widen NET_EPOCH coverage
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39267
2023-03-30 00:29:51 +08:00
Zhenlei Huang
dbe86dd5de lagg(4): Refactor out some lagg protocol input routines into a default one
Those input routines are identical.

Also inline two fast paths.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39251
2023-03-30 00:22:13 +08:00
Zhenlei Huang
fcac5719a1 lagg(4): Make lagg_list and lagg_detach_cookie static
They are used internally only.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39250
2023-03-30 00:14:44 +08:00
Mateusz Guzik
80cf427b8d proc: shave a lock trip on exit if possible
... which happens to be vast majority of the time
2023-03-29 09:19:03 +00:00
Mateusz Guzik
7c31de1a3c ufs: stop doing refcount_init on made up creds
creds are not using the refcount API for a long time now, but this
previously failed to fail to compile because the type remained int.

Now it broke due to conversion to long.
2023-03-29 09:19:03 +00:00
Joseph Koshy
57014ab776
pmc: Add a reminder to maintain documentation.
Approved by:	gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D39298
2023-03-29 10:12:08 +01:00
Elliott Mitchell
b94341afcb xen/intr: rework xen_intr_resume() for in-place remapping
The prior implementation of xen_intr_resume() was wiping
xen_intr_port_to_isrc[] and then rebuilding from the x86 interrupt
table.  Rework to instead wipe the channel numbers (->xi_port) and then
scan the table for sources with invalid channels.

This will be slower due to scanning the whole table, but this removes
the dependency on the x86 interrupt code.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30599
[royger]
Split line over 80 characters.
2023-03-29 09:51:45 +02:00
Elliott Mitchell
e45c8ea31c xen/intr: merge parts of resume functionality into new function
The portions of xen_rebind_ipi() and xen_rebind_virq() were already
near-identical.  While xen_rebind_ipi() should panic() on
single-processor, still having the functionality to invoke seems
harmless.

Meanwhile much of the loop from xen_intr_resume() seemed to want to be
closer to this same code.  This pushes related bits closer together.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30598
2023-03-29 09:51:44 +02:00
Julien Grall
910bd069f8 xen/intr: remove x86 APIC headers from xen_intr.c
Remove these no longer needed headers.  Key for making xen_intr.c
machine-independent as they don't exist on other architectures.

Originally this was part of a much larger commit, but was broken off
for submission to the FreeBSD project.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
MFC after: 1 week
2023-03-29 09:51:43 +02:00
Elliott Mitchell
40ad9aaa88 xen/intr: stop passing shared_info_t to xen_intr_active_ports()
There is only a single global HYPERVISOR_shared_info pointer, so
directly use the global pointer.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:43 +02:00
Elliott Mitchell
9f3be3a6ec xen: switch to using core atomics for synchronization
Now that the atomic macros are always genuinely atomic on x86, they can
be used for synchronization with Xen.  A single core VM isn't too
unusual, but actual single core hardware is uncommon.

Replace an open-coding of evtchn_clear_port() with the inline.

Substantially inspired by work done by Julien Grall <julien@xen.org>,
2014-01-13 17:40:58.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:42 +02:00
Elliott Mitchell
49ca3167b7 xen/intr: add check for intr_register_source() errors
While unusual, intr_register_source() can return failure.  A likely
cause might be another device grabbing from Xen's interrupt range.
This should NOT happen, but could happen due to a bug.  As such check
for this and fail if it occurs.

This theoretical situation also effects xen_intr_find_unused_isrc().
There, .is_pic must be tested to ensure such an intrusion doesn't cause
misbehavior.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31995
2023-03-29 09:51:41 +02:00
Elliott Mitchell
1797ff9627 xen/intr: cleanup event channel number use
Consistently use ~0 instead of 0 when clearing xenisrc structures.
0 is a valid event channel number, even though it is reserved by Xen.
Whereas ~0 is guaranteed invalid.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:41 +02:00
Elliott Mitchell
2b2415bafa xen/intr: fix corruption of event channel table
In xen_intr_release_isrc(), the isrc should only be removed if it is
assigned to a valid port.  This had been mitigated by using 0 for not
having a port, but this is actually corrupting the table.  Fix this bug
as modifying the code would cause this bug to manifest as kernel memory
corruption.  Similar issue for the vCPU bitmap masks.

The KASSERT() doesn't need lock protection.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:40 +02:00
Elliott Mitchell
0ebf9bb42d xen/intr: fix overflow of Xen interrupt range
The comparison was wrong.  Hopefully this never occurred in the wild,
but now ensure the error message will occur before damage is caused.
This appears non-exploitable as exploitation would require a guest to
force Domain 0 to allocate all event channels, which a guest shouldn't
be able to do.

Adjust the error message to better describe what has occurred.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:39 +02:00
Elliott Mitchell
2d5e325303 xen/intr: always set xi_close in xen_intr_bind_isrc()
Appears errors are uncommon since calling xen_intr_release_isrc() on a
xenisrc with xi_close in an undefined state could be bad.  Fix this
problematic lurking nasty.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:39 +02:00
Stefan Eßer
c33db74b53 fs/msdosfs: add tracking of free root directory entries
This update implements tallying of free directory entries during
create, delete,	or rename operations on FAT12 and FAT16 file systems.

Prior to this change, the total number of root directory entries
was reported as number of inodes, but 0 as the number of free
inodes, causing system health monitoring software to warn about
a suspected disk full issue.

The FAT12 and FAT16 file systems provide a limited number of
root directory entries, e.g. 512 on typical hard disk formats.
The valid range of values is 1 to 65535, but the msdosfs code
will effectively round up "odd" values to the next multiple of 16
(e.g. 513 would allow for 528 root directory entries).

This update implements tracking of directory entries during create,
delete, or rename operations, with initial values determined by
scanning the directory when the file system is mounted.

Total and free directory entries are reported in the f_files and
f_ffree elements of struct statfs, despite differences in semantics
of these values:

- There is no limit on the number of files and directories that can
  be created on a FAT file system. Only the root directory of FAT12
  and FAT16 file systems is limited, any number of files can still be
  created in sub-directories, even when 0 free "inodes" are reported.

- A single file can require 1 to 21 directory entries, depending on
  the character set, structure, and length of the name. The DOS 8.3
  style file name takes up 1 entry, and if the name does not comply
  with the syntax of a DOS 8.3 file name, 1 additional entry is used
  for each 13 characters of the file name. Since all these entries
  have to be contiguous, it is possible that a file or directory with
  a long name can not be created, despite a sufficient total number of
  free directory entries.

- Renaming a file can require more directory entries than currently
  allocated to store its long name, which may prevent an in-place
  update of the name if more entries are needed. This may cause a
  rename operation to fail if no contiguous range of free entries for
  the new name can be found.

- The volume label is stored in a directory entry. An empty FAT file
  system with a volume label will therefore show 1 used "inode" in
  df.

- The perceentage of free inodes shown in df or monitoring tools does
  only represent the state of the root directory of a FAT12 or FAT16
  file system. Neither does a reported value of 0% free inodes does
  prevent files from being created in sub-directories, nor does a
  value of 50% free inodes guarantee that even a single file with
  a "long" name can be created in the root directory (if every other
  directory entry is occupied and there are no 2 contiguous entries).

The statfs(2) and df(1) man pages have been updated with a notice
regarding the possibly different semantics of values reported as
total and free inodes for non-Unix file systems.

PR:		270053
Reported by:	Ben Woods <woodsb02@freebsd.org>
Approved by:	mckusick
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D38987
2023-03-29 08:46:01 +02:00
Mateusz Guzik
37337709d3 cred: convert the refcount from int to long
On 64-bit platforms this sorts out worries about mitigating bugs which
overflow the counter, all while not pessimizng anything -- most notably
it avoids whacking per-thread operation in favor of refcount(9) API.

The struct already had two instances of 4 byte padding with 256 bytes in
size, cr_flags gets moved around to avoid growing it.

32-bit platforms could also get the extended counter, but I did not do
it as one day(tm) the mutex protecting centralized operation should be
replaced with atomics and 64-bit ops on 32-bit platforms remain quite
penalizing.

While worries of counter overflow are addressed, the following is not
(just like it would not be with conversion to refcount(9)):
- counter *underflows*
- buffer overruns from adjacent allocations
- UAF due to stale cred pointer
- .. and other goodies

As such, while lipstick was placed, the pig should not be participating
in any beauty pageants.

Prodded by:	emaste
Differential Revision:	https://reviews.freebsd.org/D39220
2023-03-29 05:02:32 +00:00
Mateusz Guzik
21d29c5192 cred: make the refcount signed
There are asserts on the count being > 0, but they are less useful than
they can be because the type itself is unsigned.

The kernel is compiled with -frapv, making wraparound perfectly defined.

Differential Revision:	https://reviews.freebsd.org/D39220
2023-03-29 05:02:04 +00:00
Rick Macklem
695d87bae1 nfscl: Make coverity happy
Coverity does not like code that checks a function's
return value sometimes.  Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

A recent commit deleted "(void)"s in front of nfsm_fhtom().
This commit puts them back in.

Reported by:	emaste
MFC after:	3 months
2023-03-28 17:08:45 -07:00
Bartosz Sobczak
35105900c6
irdma(4): Upgrade the driver to 1.1.11-k
Summary of changes:
- postpone mtu size assignment during load to avoid race condition
- refactor some of the debug prints
- add request reset handler
- refactor flush scheduler to increase efficiency and avoid racing
- put correct vlan_tag for UD traffic with PFC
- suspend QP before going to ERROR state to avoid CQP timout
- fix arithmetic error on irdma_debug_bugf
- allow debug flag to be settable during driver load
- introduce meaningful default values for DCQCN algorithm
- interrupt naming convention improvements
- skip unsignaled completions in poll_cmpl

Signed-off-by: Bartosz Sobczak bartosz.sobczak@intel.com
Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	hselasky@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D39173
2023-03-28 14:29:07 -07:00
Simon J. Gerraty
a02d9cad77 tarfs_mount allow control of vfs_mountedfrom
We default to passing the path of the tar file to vfs_mountedfrom
so we can tell where a filesystem was mounted from.
However this can make the output of mount(8) hard to read.

Allow things like:

	mount -t tarfs -o as=`basename $tar` $tar /mnt

so "as" is recorded instead of $tar

Reviewed by:	des
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39273
2023-03-28 10:57:26 -07:00
Emmanuel Vadot
51d07956cf linuxkpi: Add alderlake defines in intel-family
Needed by drm-kmod.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-03-28 10:57:13 +02:00
Emmanuel Vadot
1cebc9298c Bump __FreeBSD_version after linuxkpi updates.
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-03-28 09:19:53 +02:00
Emmanuel Vadot
aaf6129c9d linuxkpi: Add devm_add_action and devm_add_action_or_reset
Those adds a new devres and will exectute some function on release.

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D39142
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-03-28 09:19:05 +02:00
Emmanuel Vadot
f1d7ae31d4 linuxkpi: Add hdmi helpers
This is a direct port of the Linux code as the licence allows it, so
style(9) isn't respected to allow applying directly the upstream commits.
Do not add it to linuxkpi directly but add a new linuxkpi_hdmi module
that drm modules will require later, no need to bloat linuxkpi more.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D39122
2023-03-28 09:11:06 +02:00
Richard Scheffenegger
f858eb916f tcp: send SACK rescue retransmission also mid-stream
Previously, SACK rescue retransmissions would only happen
on a loss recovery at the tail end of the send buffer.

This extends the mechanism such that partial ACKs without SACK
mid-stream also trigger a rescue retransmission to try avoid
an otherwise unavoidable retransmission timeout.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D39274
2023-03-28 04:47:01 +02:00
Gleb Smirnoff
78e6c3aacc tcp: update error counter when dropping a packet due to bad source
Use the same counter that ip_input()/ip6_input() use for bad destination
address.  For IPv6 this is already heavily abused ip6s_badscope, which
needs to be split into several separate error counters.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D39234
2023-03-27 18:37:15 -07:00
Kristof Provost
ccff2078af carp: fix source MAC
When we're not in unicast mode we need to change the source MAC address.
The check for this was wrong, because IN_MULTICAST() assumes host
endianness and the address in sc_carpaddr is in network endianness.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-03-28 01:18:18 +02:00
Kristof Provost
27b23cdec9 pf: remove pd_refs from pfsync
It only served to complicate cleanup, and added no value.

While here drop packets in pfsync_defer_tmo() if we don't have a syncif,
rather than just leaving them on the queue.

Reviewed by:	markj
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39248
2023-03-28 01:18:07 +02:00
Kristof Provost
01194da28a pfsync: hold b_mtx for callout_stop(pd_tmo)
The pd_tmo callout has an associated mutex, which we must hold while
calling callout_stop().

Reported by:	markj
Reviewed by:	markj
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39223
2023-03-28 01:17:55 +02:00
Rick Macklem
1512579adc nfscl: Make coverity happy
Coverity does not like code that checks a function's
return value sometimes.  Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

Reported by:	emaste
MFC after:	3 months
2023-03-27 16:53:30 -07:00
Konstantin Belousov
6a0a634590 Regen 2023-03-28 02:39:26 +03:00
Konstantin Belousov
375732cc6e kqueue1(2): export the symbol from libc
Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39271
2023-03-28 02:39:26 +03:00
Konstantin Belousov
61194e9852 Add kqueue1() syscall
It takes the flags argument.  Immediate use is to provide the KQUEUE_CLOEXEC
flag for kqueue(2).

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39271
2023-03-28 02:39:26 +03:00
Alexander V. Chernikov
b894193501 netlink: fix linux module build w/ netlink.
Reported by:	Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after:	2 weeks
2023-03-27 18:21:26 +00:00
Alexander V. Chernikov
d3a49f62a2 netlink: fix 19e43c163c by adding miseed netlinkg_glue.c 2023-03-27 16:09:02 +00:00
Alexander V. Chernikov
19e43c163c netlink: add netlink KPI to the kernel by default
This change does the following:

Base Netlink KPIs (ability to register the family, parse and/or
 write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
  some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
  unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
  compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions
 (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in the netlink_glue.c,
 but their implementation calls a pointer to either the stub function
 or the actual function, depending on whether the module is loaded or not.

This approach allows to have only 1k LoC out of ~3.7k LoC (current
 sys/netlink implementation) in the kernel, which will not grow further.
It also allows for the generic netlink kernel customers to load
 successfully without requiring Netlink module and operate correctly
 once Netlink module is loaded.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39269
2023-03-27 13:55:44 +00:00
Mark Johnston
ad2f2ee015 arm64: Remove duplicated function prototypes for PAC
No functional change intended.

Sponsored by:	The FreeBSD Foundation
2023-03-27 08:56:22 -04:00
Yuri Pankov
6aa5b10d0c nvme: fix resv commands with nda device
- passing I/O commands through nda requires nsid field to be set (it was
  unused when going through nvme_ns_ioctl())
- ccb's status can be OR'ed with the flags, use CAM_STATUS_MASK

Reviewed by:	imp (cam)
Differential Revision:	https://reviews.freebsd.org/D37696
2023-03-27 14:53:24 +02:00
Yuri Pankov
1249680609 kern.post.mk: fix PORTSDIR handling
Using subshell's PORTSDIR variable (via $${PORTSDIR}}) seems to be
only working if PORTSDIR is specified directly on the make command
line.

Use ${PORTDIR} here instead so that setting the variable in
/etc/{make,src,src-env}.conf would work (also works for variable
being set on command line or in the environment).

PR:		268299
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D37868
2023-03-27 13:57:57 +02:00
Alexander V. Chernikov
eccccd657f netlink: make nlattr_add_in[6]_addr inline
MFC after:	2 weeks
2023-03-27 11:53:34 +00:00
Alexander V. Chernikov
6dc858d84c netlink: remove forgotten debug message in handle_rtm_getroute().
MFC after:	2 weeks
2023-03-27 10:49:40 +00:00
Alexander V. Chernikov
544f1492c0 netlink: ensure genetlink control family always registers under the same ID.
MFC after:	2 weeks
2023-03-27 10:48:24 +00:00
Corvin Köhne
e8988d60d2
pci: expose intel_graphics_stolen as sysctl
The Intel graphics stolen memory is used by the Intel GOP driver on
boot. When using bhyve with GPU passthrough, it's also used by the guest
GOP driver at guest boot. For that reason, bhyve needs to know the
address and size of this region to inform the guest about this region.
Exposing the variables as sysctl allows bhyve to easily read them.
2023-03-27 11:40:49 +02:00
Corvin Köhne
48d70503bc
pci: add tunable hw.pci.enable_mps_tune
If the tunable is set to 0, the tuning of the MPS (maximum payload size)
is disabled and the default MPS values set by the BIOS are used. In this
case the system may use a lower speed or operate in a less optimized
state, but it can resolve issues with stability and compatibility. With
specific devices the tuning of the mps, can lead to a complete freeze of
the system.

Reviewed by:		manu
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D38397
2023-03-27 11:28:27 +02:00
Kyle Evans
ec1bc53002 arm64: cpu_switch: don't zero out pcb_onfault
Previously this would zero out x18 in the pcb, now it's attacking the
innocent pcb_onfault -- drop it entirely.

This technically fixes
e605b87a9e ("Save only callee-saved registers in pcb"), but it's
harmless until the below commit trims down pcb_x.

Reported by:	mmel
Reviewed by:	andrew, mmel
Fixes:	1c1f31a5e5 ("Remove unused registes from the arm pcb")
Differential Revision:	https://reviews.freebsd.org/D39277
2023-03-26 13:48:22 -05:00
Alexander V. Chernikov
9a11f3dff9 netlink: add standrard ifaddr/neigh parsers to snl(3).
MFC after:	2 weeks
2023-03-26 09:04:41 +00:00
Alexander V. Chernikov
04f75b9802 netlink: allow netlink sockets in non-vnet jails.
This change allow to open Netlink sockets in the non-vnet jails, even for
 unpriviledged processes.
The security model largely follows the existing one. To be more specific:
* by default, every `NETLINK_ROUTE` command is **NOT** allowed in non-VNET
 jail UNLESS `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command
 handler.
* All notifications are **disabled** for non-vnet jails (requests to
 subscribe for the notifications are ignored). This will change to be more
 fine-grained model once the first netlink provider requiring this gets
 committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including**
 interfaces w/o any addresses attached to the jail). The value of this is
 questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the
 current approach - currently we list static ARP/ND entries belonging to the
 addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered
 to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only
 host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in non-VNET jail
 (as sub-families may be unrelated to network at all).
 It is the goal of the family author to implement the restriction if
 necessary.

Differential Revision: https://reviews.freebsd.org/D39206
MFC after:	1 month
2023-03-26 08:44:09 +00:00
Alexander V. Chernikov
2cda6a2fb0 routing: add public rt_is_exportable() version to check if
the route can be exported to userland when jailed.

Differential Revision: https://reviews.freebsd.org/D39204
MFC after:	2 weeks
2023-03-26 08:24:27 +00:00
Konstantin Belousov
28f957b8b3 vnode_pager_input: return runningbufspace back
Both vnode_pager_input_smlfs() and vnode_pager_generic_getpages()
increment runningbufspace, but also both delegate io completion handling
on the pbuf to either plain bdone() or filesystem-specific strategy
routine. Accidentally, for e.g. UFS it is g_vfs_strategy()/g_vfs_done().
The later calls bufdone() which handles runningbufspace reclamation.

For plain bdone() io done handler, nothing would return
accounted b_runningbufspace back. Do it in the new
helper vnode_pager_input_bdone(), as well as in
vnode_pager_generic_getpages_done() explicitly.

Note that potential multiple calls to runningbufwakeup() for the same
pbuf or buf completion are safe. runningbufwakeup() clears accounting
for the buffer, so second and later calls are nop.

The problem was found due to tarfs using small vnode pager input but not
g_vfs_strategy().

Reported by:	des
Reviewed by:	markj, sjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39263
2023-03-26 00:55:29 +02:00
Mateusz Guzik
0e71f4f77c vm: add unlocked page lookup before trying vm_fault_soft_fast
Shaves a read lock + tryupgrade trip most of the time.

Stats from doing a kernel build (counters not present in the tree):
vm.fault_soft_fast_ok: 262653
vm.fault_soft_fast_failed_other: 41
vm.fault_soft_fast_failed_no_page: 39595772
vm.fault_soft_fast_failed_page_busy: 1929
vm.fault_soft_fast_failed_page_invalid: 22183

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39268
2023-03-25 22:14:59 +00:00
Mateusz Guzik
22eb66d961 vfs cache: always assert on ndp->ni_resflags 2023-03-25 21:57:55 +00:00
Dmitry Chagin
f0fe68a965 i386: ansify
Reported by:	clang 15
2023-03-25 23:21:03 +03:00
Andrew Gallatin
abba58766f LRO: Add missing checks for invalid IP addresses
LRO bypasses normal ip_input()/tcp_input() and lacks several checks
that are present in the normal path.  Without these checks, it
is possible to trigger assertions added in b0ccf53f24

Reviewed by: glebius, rrs
Sponsored by: Netflix
2023-03-25 11:56:02 -04:00
Mateusz Guzik
138a5dafba vfs: trylock vnode requeue
The quasi-LRU still gets in the way for example when doing an
incremental bzImage build, with vnode_list lock being at the
top of the profile. Further damage control the problem by trylocking.

Note the entire mechanism desperately wants to be reaped out in favor
of something(tm) which both scales in a multicore setting and provides
sensible replacement policy.

With this change everything vfs almost disappears from the on CPU
flamegraph, what is left is tons of contention in the VM.
2023-03-25 13:42:27 +00:00
Mateusz Guzik
245767c278 vfs: flip deferred_inact to atomic
Turns out it is very rarely triggered, making a per-cpu
counter a waste.

Examples from real life boxes:
uptime		counter
135 days	847
138 days	2190
141 days	1
2023-03-25 13:42:27 +00:00
Mateusz Guzik
e5eb1d298f vfs: replace some spelled out VNASSERTs with VNPASS
nfc
2023-03-25 13:42:27 +00:00
Vico Chen
1f0c8bfd65 linsysfs(4): Keep Linux compatible sysfs the same as Ubuntu
By checking Ubuntu, there is no `/sys/subsystem' in sysfs. To compatible
with Ubuntu, delete the 'subsystem' creation in Linux compatible module.

On the other hand, the sysfs `/sys/subsystem' cause failure for some
Linux udev cases. In Linux udev source code, there is a function named
`scan_devices_all', and it will scan `/sys/subsystem' if it is existed,
but now there are nothing in /sys/subsystem `, and it returns empty
to cause some use cases failed.

Reviewed by:		dchagin
Differential Revision:	https://reviews.freebsd.org/D38885
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:41:04 +03:00
Dmitry Chagin
0b56641cfc linsysfs(4): Reimplement listnics() using ifAPI
Handle if arrival/departure events.

Reviewed by:		melifaro (early version)
Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:40:41 +03:00
John Baldwin
0f735657aa bhyve: Remove vmctx member from struct vm_snapshot_meta.
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel.  For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument.  As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38125
2023-03-24 11:49:06 -07:00
John Baldwin
7d9ef309bd libvmmapi: Add a struct vcpu and use it in most APIs.
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi.  For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server.  The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38124
2023-03-24 11:49:06 -07:00
Konstantin Belousov
13262b07a0 fdesc_lookup(): drop fdropped
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:22 +02:00
Konstantin Belousov
7dca8fd1cb fdesc_lookup(): the condition to use vn_vget_ino() is always true
The ix number for the fdescfs root is 1, while any fd vnode has the ix
value at least 3.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:16 +02:00
Konstantin Belousov
8d97282a39 fdesc_lookup(): no need to vhold the dvp vnode
It is already referenced by the VOP_LOOKUP() caller, otherwise vdrop()
after vn_lock() is invalid anyway.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:07 +02:00
Konstantin Belousov
51b8ffb95c fdesc_allocvp(): fix potential use after free
Just owning the interlock is not enough for vget() to operate on the
vnode race-free with vgone(), the vnode should be held.  Use
vget_prep()/vget_finish() to avoid vholding the vnode explicitly, and
drop LK_INTERLOCK.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:53 +02:00
Konstantin Belousov
fa3ea81b77 fdescfs: remove useless XXX comment, unwrap line
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:47 +02:00
Justin Hibbits
79aa96f9ca infiniband: Bring back M_ASSERTVALID() check in infiband_bpf_mtap()
Reported by:	rpokala
Fixes:		adf62e8363
2023-03-24 11:04:42 -04:00
Justin Hibbits
bb55bb1740 inet6: Include if_private.h in one more netstack file
ip6_input() and ip6_destroy() both directly reference ifnet members.
This file was missed in 3d0d5b21

Fixes:		3d0d5b21 ("IfAPI: Explicitly include <net/if_private.h>...")
Sponsored by:	Juniper Networks, Inc.
2023-03-24 10:25:35 -04:00
Justin Hibbits
727bfe3894 Mechanically convert qlnx(4) to IfAPI
Reviewed By:	zlei
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37856
2023-03-24 10:09:53 -04:00
Justin Hibbits
3e142e0767 ofed: Mechanically convert to IfAPI
Summary:
Because of the intricacies of this code it wasn't purely scripted, but
instead hand-mechanical.

Reviewed by:	hselasky
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38560
2023-03-24 10:04:33 -04:00
Zhenlei Huang
dcd7f0bd02 lagg: Various style fixes
MFC after:	1 week
2023-03-24 17:55:15 +08:00
Kristof Provost
ad729f8d50 pf: ignore ip6_output() return value in pf_refragment6()
We can't do anything if ip6_output() fails, other than discard the
packet which ip6_output() already does for us.
Mark the return value as ignored.

Reported by:	emaste, Coverity
Sponsored by:	Rubicon Communications, LLC (Netgate)
2023-03-24 08:08:19 +01:00
Eric Joyner
949d971f0b
ice(4): Restore old conditional overwritten by last update
Commit 8923de5905 ("ice(4): Update to 1.37.7-k", 2023-02-13)
unintentionally overwrote the change made in commit 52f45d8ace ("net:
iflib: let the drivers use isc_capenable", 2021-12-28).

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reported by:	jhibbits@
MFC after:	3 days
Sponsored by:	Intel Corporation
2023-03-24 00:05:26 -07:00
Kyle Evans
89c52f9d59 arm64: add KASAN support
This entails:
- Marking some obvious candidates for __nosanitizeaddress
- Similar trap frame markings as amd64, for similar reasons
- Shadow map implementation

The shadow map implementation is roughly similar to what was done on
amd64, with some exceptions.  Attempting to use available space at
preinit_map_va + PMAP_PREINIT_MAPPING_SIZE (up to the end of that range,
as depicted in the physmap) results in odd failures, so we instead
search the physmap for free regions that we can carve out, fragmenting
the shadow map as necessary to try and fit as much as we need for the
initial kernel map.  pmap_bootstrap_san() is thus after
pmap_bootstrap(), which still included some technically reserved areas
of the memory map that needed to be included in the DMAP.

The odd failure noted above may be a bug, but I haven't investigated it
all that much.

Initial work by mhorne with additional fixes from kevans and markj.

Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36701
2023-03-23 16:34:33 -05:00
Mitchell Horne
698dbd66fe arm64: add a GENERIC-KASAN config
Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36702
2023-03-23 16:34:33 -05:00
John Baldwin
d2dab20c2a ktls: Drop all the INET and INET6 compile-time guards.
Consistent with 9fd0d9b16e, KERN_TLS is
not supported on kernels without any INET support.

Reviewed by:	gallatin, hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39232
2023-03-23 14:29:07 -07:00
Gordon Bergling
f63aaffebc acpi(4): Fix a typo in a kernel message
- s/enitialization/initialization/

MFC afer:	5 days
2023-03-23 22:03:31 +01:00
Kirk McKusick
42c82aad32 Improve chance of finding an alternate superblock in sbsearch(3).
When requesting a superblock read for the sole purpose of getting
the parameters needed to find if backup parameters have been stored,
specify UFS_NOCSUM as only the base superblock is needed. This
change reduces the number of checks that the superblock must pass.

MFC after:    1 week
2023-03-23 13:04:52 -07:00
Mateusz Guzik
c16c4ea6d3 vfs cache: return ENOTDIR for not_a_dir/{.,..} lookups
Reported by:	Oliver Kiddle
PR:	270419
MFC:	3 days
2023-03-23 19:31:18 +00:00
Andrew Turner
6a4f5fdd19 Mark the arm64 PSR register fields with UL
These are for a 64 bit register. Make them 64 bit values on arm64.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Andrew Turner
ea3061526e Bump __FreeBSD_version for changing spsr on arm64
This changed a few structures, bump __FreeBSD_version for kgdb and
userspace consumers.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Andrew Turner
1c1f31a5e5 Remove unused registes from the arm pcb
These were kept for ABI reasons. Remove them and bump __FreeBSD_version
so debuggers can be updated to use the new layout.

Reviewed by:	jhb
Sponsored by:	Arm Ltd
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35378
2023-03-23 18:56:26 +00:00
Andrew Turner
4a06b28a15 Add compat support for struct reg on arm64
The size of the spsr field in struct reg has changed. Mask the bits
that userspace doesn't know about out as they may be invalid.

While here add a comment why we don't need compat support in set_regs.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Zachary Leaf
f4036a9234 arm64: add fault address to trapframe
It was previously possible for the fault address register to get
clobbered before it was saved. This small window occurred when an
additional exception was encountered inside the exception handler,
overwriting the previous value.

Commit f29942229d ("Read the arm64 far early in el0 exceptions")
patched this issue, but avoided changing the trapframe since this could
be considered a KBI change in FreeBSD 13.

Revert the above fix and save the fault address in the trapframe
instead. This saves the fault address even earlier in the exception
handling process, and is a more robust and simple fix.

Reviewed by:	andrew, jhb, jrtc27
Sponsored by: Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38984
2023-03-23 18:56:26 +00:00
Zachary Leaf
2ecbbcc7ca arm64: extend ESR/SPSR registers to 64b
For the Exception Syndrome Register, ESR_ELx, the upper 32b were
previously unused, but now may contain additional exception info as of
Armv8.7 (FEAT_LS64).

Extend ESR from u32->u64 in exception handling code to support this. In
addition, also extend Saved Program Status Register SPSR_ELx in the same
way to allow for future extensions.

Reviewed by:	andrew
Sponsored by: Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38983
2023-03-23 18:56:26 +00:00
Justin Hibbits
e2427c6917 IfAPI: Add iterator to complement if_foreach()
Summary:
Sometimes an if_foreach() callback can be trivial, or need a lot of
outer context.  In this case a regular `for` loop makes more sense.  To
keep things hidden in the new API, use an opaque `if_iter` structure
that can still be instantiated on the stack.  The current implementation
uses just a single pointer out of the 4 alotted to the opaque context,
and the cleanup does nothing, but may be used in the future.

Reviewed by:	melifaro
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39138
2023-03-23 09:39:26 -04:00