Commit Graph

147132 Commits

Author SHA1 Message Date
Gleb Smirnoff
84b42df834 rack: fix build on powerpc 2023-04-04 16:35:36 -07:00
Dmitry Chagin
71bc17803e linux(4): Implement close_range over native
Handling of the CLOSE_RANGE_UNSHARE flag is not implemented due to
difference in fd unsharing mechanism in the Linux and FreeBSD.

Reviewed by:		mjg
Differential revision:	https://reviews.freebsd.org/D39398
MFC after:		2 weeks
2023-04-04 23:24:04 +03:00
Dmitry Chagin
50111714f5 linux(4): Regen for close_range syscall
MFC after:		2 weeks
2023-04-04 23:23:37 +03:00
Dmitry Chagin
1c27dce1f8 linux(4): Modify close_range syscall to match Linux
MFC after:		2 weeks
2023-04-04 23:23:24 +03:00
Randall Stewart
030434acaf Update rack to the latest code used at NF.
There have been many changes to rack over the last couple of years, including:
     a) Ability when switching stacks to have one stack query another.
     b) Internal use of micro-second timers instead of ticks.
     c) Many changes to pacing in forms of
        1) Improvements to Dynamic Goodput Pacing (DGP)
        2) Improvements to fixed rate paciing
        3) A new feature called hybrid pacing where the requestor can
           get a combination of DGP and fixed rate pacing with deadlines
           for delivery that can dynamically speed things up.
     d) All kinds of bugs found during extensive testing and use of the
        rack stack for streaming video and in fact all data transferred
        by NF

Reviewed by: glebius, gallatin, tuexen
Sponsored By: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D39402
2023-04-04 16:05:46 -04:00
Gleb Smirnoff
2ff8187efd tcp_hpts: remove dead code tcp_drop_in_pkts()
Should have gone in f971e79139.
2023-04-04 12:55:27 -07:00
Eric van Gyzen
ecaeac805b mlxfw: fix potential NULL pointer dereference
Reported by:	Coverity (an internal run at Dell)
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D39348
2023-04-04 15:27:46 -05:00
Martin Matuska
91ef6f14f2 Revert "zfs: fall back if block_cloning feature is disabled"
This reverts commit 8ee579abe0.
2023-04-04 16:34:34 +02:00
Konstantin Belousov
11cdffc603 Regen 2023-04-04 16:19:08 +03:00
Konstantin Belousov
dac3102488 Rename kqueue1(2) to kqueuex(2) to avoid compat issues with NetBSD
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39377
2023-04-04 16:19:08 +03:00
Randall Stewart
73ee5756de Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features.
So stack switching as always been a bit of a issue. We currently use a break before make setup which means that
if something goes wrong you have to try to get back to a stack. This patch among a lot of other things changes that so
that it is a make before break. We also expand some of the function blocks in prep for new features in rack that will allow
more controlled pacing. We also add other abilities such as the pathway for a stack to query a previous stack to acquire from
it critical state information so things in flight don't get dropped or mis-handled when switching stacks. We also add the
concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds and
of course this even gives us a pathway to go to nanosecond timekeeping if we need to (something for the data center to consider
for sure).

Once all this lands I will then update rack to begin using all these new features.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39210
2023-04-01 01:46:38 -04:00
Emmanuel Vadot
63b113af57 linuxkpi: Add a few more dummy includes
Needed by drm-kmod.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 14:11:32 +02:00
Martin Matuska
8ee579abe0 zfs: fall back if block_cloning feature is disabled
If block_cloning is disabled, or other errors from zfs_clone_range()
return an EXDEV we should fall back to vn_generic_copy_file_range().

This fixes issues when copying files on the same dataset with
block_cloning disabled.

Upstreamed as pull request to OpenZFS.

Reviewed by:	Mateusz Guzik <mjguzik@gmail.com>
OpenZFS pull request:	14713
2023-04-04 13:43:34 +02:00
Emmanuel Vadot
44312c28fe linuxkpi: Add linux/agp_backend.h
It declares the structs needed by drm code for AGP.

Obtained from:	drm-kmod
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 11:49:48 +02:00
Emmanuel Vadot
7c7419f60c linuxkpi: Add linux/stddef.h
It simply include sys/stdef.h

Obtained from:	drm-kmod
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 11:39:27 +02:00
Emmanuel Vadot
0bf561351b linuxkpi: Include linux/types.h in linux/mod_devicetable.h
It's done like this in linux too.

Sponsored by:   Beckhoff Automation GmbH & Co. KG
2023-04-04 10:48:28 +02:00
Emmanuel Vadot
3f39ff2420 linuxkpi: Include linux/math64.h in linux/time.h
It's done like this in linux too.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 10:48:27 +02:00
Ganbold Tsagaankhuu
396b6914c9 Fix pcie phy enabling codes for RK3568 SoC.
Handle data-lanes property for pcie phy and set it accordingly.
This makes devices attached to pcie3 work properly.
For some RK3568 based boards, RTL8125B based device is
connected it. So with this, realtek-re-kmod driver attaches
and works.

Partially obtained from OpenBSD.
Tested on NanoPI-R5S, FireFly Station P2 boards.
2023-04-04 02:50:29 +00:00
Martin Matuska
2a58b312b6 zfs: merge openzfs/zfs@431083f75
Notable upstream pull request merges:
  #12194 Fix short-lived txg caused by autotrim
  #13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
  #13392 Implementation of block cloning for ZFS
  #13741 SHA2 reworking and API for iterating over multiple implementations
  #14282 Sync thread should avoid holding the spa config write lock
         when possible
  #14283 txg_sync should handle write errors in ZIL
  #14359 More adaptive ARC eviction
  #14469 Fix NULL pointer dereference in zio_ready()
  #14479 zfs redact fails when dnodesize=auto
  #14496 improve error message of zfs redact
  #14500 Skip memory allocation when compressing holes
  #14501 FreeBSD: don't verify recycled vnode for zfs control directory
  #14502 partially revert PR 14304 (eee9362a7)
  #14509 Fix per-jail zfs.mount_snapshot setting
  #14514 Fix data race between zil_commit() and zil_suspend()
  #14516 System-wide speculative prefetch limit
  #14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
  #14519 Do not hold spa_config in ZIL while blocked on IO
  #14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
  #14524 Ignore too large stack in case of dsl_deadlist_merge
  #14526 Use .section .rodata instead of .rodata on FreeBSD
  #14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
  #14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
  #14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
  #14544 icp: Prevent compilers from optimizing away memset()
         in gcm_clear_ctx()
  #14546 Revert zfeature_active() to static
  #14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
         patch
  #14563 Optimize the is_l2cacheable functions
  #14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
  #14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
  #14567 spl: Add cmn_err_once() to log a message only on the first call
  #14568 Fix incremental receive silently failing for recursive sends
  #14569 Restore ASMABI and other Unify work
  #14576 Fix detection of IBM Power8 machines (ISA 2.07)
  #14577 Better handling for future crypto parameters
  #14600 zcommon: Refactor FPU state handling in fletcher4
  #14603 Fix prefetching of indirect blocks while destroying
  #14633 Fixes in persistent error log
  #14639 FreeBSD: Remove extra arc_reduce_target_size() call
  #14641 Additional limits on hole reporting
  #14649 Drop lying to the compiler in the fletcher4 code
  #14652 panic loop when removing slog device
  #14653 Update vdev state for spare vdev
  #14655 Fix cloning into already dirty dbufs
  #14678 Revert "Do not hold spa_config in ZIL while blocked on IO"

Obtained from:	OpenZFS
OpenZFS commit:	431083f75b
2023-04-03 16:49:30 +02:00
Ganbold Tsagaankhuu
b98fbf3781 Fix driver name.
Submitted by:	Tyuryukanov S.Y.
2023-04-03 14:20:28 +00:00
Andrew Turner
41236539d8 Add non-posted device memory to the arm64 mem map
Add VM_MEMATTR_DEVICE_NP to the arm64 vm.pmap.kernel_maps sysctl.

Reviewed by:	markj
Sponsored by:	Arm Ltd
 Differential Revision:	https://reviews.freebsd.org/D39371
2023-04-03 12:59:11 +01:00
Dmitry Chagin
7ae0972c7b linsysfs(4): Reimplement listnics() using ifAPI
Handle if arrival/departure events and VNETs.

Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI, pseudofs
2023-04-03 11:22:16 +03:00
Bjoern A. Zeeb
cfccc7f30a LinuxKPI: 802.11: remove extra spaces
Remove two extra spaces.  No functional change.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-02 22:25:28 +00:00
Zhenlei Huang
5f3d0399e9 lagg(4): Tap traffic after protocol processing
Different lagg protocols have different means and policies to process incoming
traffic. For example, for failover protocol, by default received traffic is only
accepted when they are received through the active port. For lacp protocol, LACP
control messages are tapped off, also traffic will be dropped if they are
received through the port which is not in collecting state or is not joined to
the active aggregator. It confuses if user dump and see inbound traffic on
lagg(4) interfaces but they are actually silently dropped and not passed into
the net stack.

Tap traffic after protocol processing so that user will have consistent view of
the inbound traffic, meanwhile mbuf is set with correct receiving interface and
bpf(4) will diagnose the right direction of inbound packets.

PR:		270417
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39225
2023-04-03 01:01:51 +08:00
Zhenlei Huang
90820ef121 infiniband: Widen NET_EPOCH coverage
From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx)
do not enter net epoch before lagg_input_infiniband(). If IPoIB interface is a
member of lagg(4) interface, and after returning from lagg_input_infiniband()
the receiving interface of mbuf is set to lagg(4) interface, then when
concurrently destroying the lagg(4) interface, there is a small window that the
interface gets destroyed and becomes invalid before infiniband_input() re-enter
net epoch, thus leading use-after-free.

Widen NET_EPOCH coverage to prevent use-after-free.

Thanks hselasky@ for testing with mlx5 devices.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39275
2023-04-03 00:51:49 +08:00
Alexander V. Chernikov
3091d980f5 netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it still possible have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-02 15:27:21 +00:00
Emmanuel Vadot
4fd9e20671 linuxkpi: hdmi: Remove wrong dependency on wlan
Copy-paste mistake.

Reported by:	Alastair Hogge <agh@riseup.net>
Fixes:	f1d7ae31d4 ("linuxkpi: Add hdmi helpers")
2023-04-02 16:56:23 +02:00
Alexander V. Chernikov
c35a43b261 netlink: allow exact-match route lookups via RTM_GETROUTE.
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
 request assumes exact-prefix lookup instead of the
 longest-prefix-match.

MFC after:	2 weeks
2023-04-02 13:47:10 +00:00
Alexander V. Chernikov
4aeb939ecf netlink: fix NULL check in the default route snl(3) parser.
CID:		1506959
MFC after:	2 weeks
2023-04-02 12:44:20 +00:00
Alexander V. Chernikov
27cbc1a7fe netlink: fix snl_read_reply_multi().
CID:		1506956
MFC after:	2 weeks
2023-04-02 12:41:53 +00:00
Dmitry Chagin
a32ed5ec05 pseudofs: Simplify pfs_visible_proc
Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39383
MFC after:		1 month
2023-04-02 11:24:10 +03:00
Dmitry Chagin
405c0c04ed pseudofs: Allow vis callback to be called for a named node
This will be used later in the linsysfs module to filter out VNETs.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39382
MFC after:		1 month
2023-04-02 11:21:15 +03:00
Dmitry Chagin
7f72324346 pseudofs: Microoptimize struct pfs_node
Since 81167243b the size of struct pfs_node is 280 bytes, so the kernel
memory allocator takes memory from 384 bytes sized bucket. However, the
length of the node name is mostly short, e.g., for Linux emulation layer
it is up to 16 bytes. The size of struct pfs_node w/o pfs_name is 152
bytes, i.e., we have 104 bytes left to fit the node name into the 256
bytes-sized bucket.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39381
MFC after:		1 month
2023-04-02 11:20:07 +03:00
Navdeep Parhar
9f354cd3d0 cxgbe(4): Allow tracing filters on loopback ports.
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port.  This change allows
tracing filters to be installed in this loopback path.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 17:50:46 -07:00
Navdeep Parhar
531ef35241 cxgbe/iw_cxgbe: Always set a vnet around calls to IN_LOOPBACK.
This is catch up with efe58855f3.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 16:19:10 -07:00
Rick Macklem
f4179ad46f nfscommon: Add support for an NFSv4 operation bitmap
NFSv4.1/4.2 uses operation bitmaps for various operations,
such as the SP4_MACH_CRED case for ExchangeID.
This patch adds support for operation bitmaps so that
support for SP4_MACH_CRED can be added to the NFSv4.1/4.2
server in a future commit.

This commit should not change any NFSv4.1/4.2 semantics.

MFC after:	3 months
2023-04-01 14:22:26 -07:00
黃清隆
285d85f4f9 arcmsr(4): Fix reading buffer empty length error.
MFC after:	2 weeks
2023-03-31 22:43:43 -07:00
Bjoern A. Zeeb
8ac540d3b8 LinuxKPI: 802.11: adjust locking
Split up the lhw lock and the scan lock.  The latter is a mtx
while the former changes from mtx to sx as mac80211 downcalls may
sleep (and the ic lock is not usable in that case either and a larger
project to fix).
This will also enforce some lookups under lock (mostly scan) as well
as general protection for more compat code and avoid a possible
deadlock with one of the upcoming callbacks from driver into the
compat code.

Sponsored by:	The FreeBSD Foundation
MFC after:	7 days
2023-03-31 19:59:50 +00:00
Joseph Mingrone
6f9cba8f8b
libpcap: Update to 1.10.3
Local changes:

- In contrib/libpcap/pcap/bpf.h, do not include pcap/dlt.h.  Our system
  net/dlt.h is pulled in from net/bpf.h.
- sys/net/dlt.h: Incorporate changes from libpcap 1.10.3.
- lib/libpcap/Makefile: Update for libpcap 1.10.3.

Changelog:	https://git.tcpdump.org/libpcap/blob/95691ebe7564afa3faa5c6ba0dbd17e351be455a:/CHANGES
Reviewed by:	emaste
Obtained from:	https://www.tcpdump.org/release/libpcap-1.10.3.tar.gz
Sponsored by:	The FreeBSD Foundation
2023-03-31 16:02:22 -03:00
John Baldwin
cb750f7f5a fuse: Remove set but unused cr_gid variable.
Reviewed by:	asomers
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39350
2023-03-31 10:57:13 -07:00
John Baldwin
ad83dd2b2b LinuxKPI: Appease -Wunused-but-set-variable warnings from GCC.
- Mark assert dummy variables as __unused.

- Use a dummy (void) cast of the flags argument passed to
  spin_unlock_irqrestore so it gets treated as used.

Reviewed by:	manu, hselasky
Differential Revision:	https://reviews.freebsd.org/D39349
2023-03-31 10:56:33 -07:00
Zhenlei Huang
5a8abd0a29 lacp: Use C99 bool for boolean return value
This improves readability.

No functional change intended.

MFC after:	1 week
2023-04-01 01:48:36 +08:00
Mitchell Horne
3462c371c2 arm64/gicv3: correct the size of the distributor resource
Use the GICD_SIZE macro (0x10000), which is half the size of the current
fixed-sized mapping (128 * 1024 == 0x20000).

In ARM64 Hyper-V instances, it seems the Distributor's registers are
located immediately preceding a range of physical memory in the bus
address space. Thus, when ram0 is attaching and attempts to reserve
SYS_RES_MEMORY resources corresponding to its physmem ranges, it fails,
because the first 0x10000 bytes of this range are already owned by gic0.

PR:		270415
Reported by:	whu
Tested by:	whu
Differential Revision:	https://reviews.freebsd.org/D39260
2023-03-31 13:26:22 -03:00
Mark Johnston
1a3cb489e5 arm64: Move the initial kernel stack out of the init_pagetables section
init_pagetables is mapped into the segment containing the BSS, but does
not get zeroed by locore.  It is used for bootstrap page table pages.

It happens that the bootstrap kernel stack is also placed in that
section, but there's no reason it shouldn't live in the BSS, so move it
there.  No functional change intended.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39367
2023-03-31 11:57:26 -04:00
Andrew Turner
47ff149afa Move arm64 EENTRY uses before ENTRY
The ENTRY macro adds instructions to the start of a function but not
EENTRY. To use these instructions in both functions move the EENTRY
use before the ENTRY use.

Sponsored by:	Arm Ltd
2023-03-31 16:45:31 +01:00
Mark Johnston
a54370f4ab arm64: Ensure that thread0's PCB flags are initialized
On arm64, the PCB is stored at the top of the thread stack.  For thread0
this comes from the static "initstack" region, which is placed in the
.init_pagetable section, which is not part of the BSS and thus doesn't
get zeroed by locore.  (See the comment in ldscript.arm64.)  It is thus
possible for the pcb_flags field to be uninitialized, which can result
in PCB_SINGLE_STEP being set.

Fix this by simply initializing the field.  A separate commit will move
initstack out of the .init_pagetable section, since it has no reason to
be there, but it is preferable to explicitly initialize PCB fields
anyway.  In particular, regular kernel stacks are not zeroed upon
allocation, so we should be consistent here.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39343
2023-03-31 09:50:34 -04:00
Dmitry Chagin
960562652c linux(4): Fix opt_netlink.h inclusion
Add opt_netlink.h to the linux_common module, on i386, where we don't
uses linux_common module, move opt_netlink.h inclusion under
i386 condition.

MFC after:		2 weeks
2023-03-31 14:56:59 +03:00
Dmitry Chagin
126df352f5 linux(4): Move inclusion of i386-specific files under common condition 2023-03-31 14:56:29 +03:00
Dmitry Chagin
f94b5734bc Revert "linsysfs(4): Reimplement listnics() using ifAPI"
This reverts commit 0b56641cfc.

As it poorly interacts with vnet subsystem
2023-03-31 14:54:33 +03:00
Kristof Provost
28921c4f7d carp: allow commands to use interface name rather than index
Get/set commands can now choose to provide the interface name rather
than the interface index. This allows userspace to avoid a call to
if_nametoindex().

Suggested by:	melifaro
Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39359
2023-03-31 11:29:58 +02:00
Andrew Turner
3a0cc6fe61 Handle the arm64 unknown exception separately
Rather than falling through to the default case handle the unknown
exception with its own panic message. As ESR_EL1 is zero for this
exception stop printing it.

Sponsored by:	Arm Ltd
2023-03-31 10:15:45 +01:00
Bjoern A. Zeeb
3f0083c4e3 LinuxKPI: 802.11: use ic_printf more consistently
Rather than printing ic_name ourselves (or not at all) use ic_printf()
as a common function from net80211 where possible.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-03-31 07:14:32 +00:00
Ed Maste
d33cdf16df fs/cd9660: add header include guards
Diff reduction against NetBSD files in sys/fs/cd9660/ and OpenBSD files
in usr.sbin/makefs/cd9660/.

Sponsored by:	The FreeBSD Foundation
2023-03-30 19:20:54 -04:00
Konstantin Belousov
7170774e2a ifcapnv: cap_bit in ifcap2_nv_bit_names[] is bit, not index
Sponsored by:	Nvidia networking
2023-03-31 02:08:15 +03:00
John Baldwin
47d1e67874 sys: Disable errors for -Wunused-function on GCC.
This matches the handling of this warning on clang.
2023-03-30 15:43:48 -07:00
Navdeep Parhar
21b778fbeb cxgbe(4): Remove dead code.
Fixes:	e7e0844422 cxgbe(4): Replace T4_PKT_TIMESTAMP with something slightly less hackish.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-03-30 14:13:07 -07:00
Mark Johnston
fd02d0bc14 graid3: Pre-allocate the timeout event structure
As in commit 2f1cfb7f63 ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.

Reported by:	graid3 regression tests
MFC after	2 weeks
2023-03-30 13:38:15 -04:00
Mark Johnston
cab1056105 kdb: Modify securelevel policy
Currently, sysctls which enable KDB in some way are flagged with
CTLFLAG_SECURE, meaning that you can't modify them if securelevel > 0.
This is so that KDB cannot be used to lower a running system's
securelevel, see commit 3d7618d8bf.  However, the newer mac_ddb(4)
restricts DDB operations which could be abused to lower securelevel
while retaining some ability to gather useful debugging information.

To enable the use of KDB (specifically, DDB) on systems with a raised
securelevel, change the KDB sysctl policy: rather than relying on
CTLFLAG_SECURE, add a check of the current securelevel to kdb_trap().
If the securelevel is raised, only pass control to the backend if MAC
specifically grants access; otherwise simply check to see if mac_ddb
vetoes the request, as before.

Add a new secure sysctl, debug.kdb.enter_securelevel, to override this
behaviour.  That is, the sysctl lets one enter a KDB backend even with a
raised securelevel, so long as it is set before the securelevel is
raised.

Reviewed by:	mhorne, stevek
MFC after:	1 month
Sponsored by:	Juniper Networks
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37122
2023-03-30 10:45:00 -04:00
Alexander V. Chernikov
b755f1a009 netlink: Fix adding routes with nexthops on p2p interfaces.
Use full-featured ifa_ifwithroute() to guess route ifa/ifp
 instead of ifa_ifwithnet(). This change makes the route addition
 logic closer to the rt_getifa_fib() used by rtsock.

Reported by:	glebius
Tested by:	glebius
Differential Revision: https://reviews.freebsd.org/D39335
MFC after:	2 weeks
2023-03-30 09:53:50 +00:00
Mateusz Guzik
f5a365e51f inet6: protect address manipulation with a lock
This is a total hack/bare minimum which follows inet4.

Otherwise 2 threads removing the same address can easily crash.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39317
2023-03-30 08:46:38 +00:00
Kirk McKusick
fe5e6e2cc5 Improvement in UFS/FFS directory placement when doing mkdir(2).
The algorithm for laying out new directories was devised in the 1980s
and markedly improved the performance of the filesystem. In those days
large disks had at most 100 cylinder groups and often as few as 10-20.
Modern multi-terrabyte disks have thousands of cylinder groups. The
original algorithm does not handle these large sizes well. This change
attempts to expand the scope of the original algorithm to work well
with these much larger disks while still retaining the properties
of the original algorithm for small disks.

The filesystem implementation is divided into policy routines and
implementation routines. The policy routines can be changed in any
way desired without risk of corrupting the filesystem. The policy
requests are handled by the implementation layer. If the policy
asks for an available resource, it is granted. But if it asks for
an already in-use resource, then the implementation will provide
an available one nearby the request. Thus it is impossible for a
policy to double allocate. This change is limited to the policy
implementation.

This change updates the ffs_dirpref() routine which is responsible
for selecting the cylinder group into which a new directory should
be placed. If we are near the root of the filesystem we aim to
spread them out as much as possible. As we descend deeper from the
root we cluster them closer together around their parent as we
expect them to be more closely interactive. Higher-level directories
like usr/src/sys and usr/src/bin should be separated while the
directories in these areas are more likely to be accessed together
so should be closer. And directories within commands or kernel
subsystems should be closer still.

We pick a range of cylinder groups around the cylinder group of the
directory in which we are being created. The size of the range for
our search is based on our depth from the root of our filesystem.
We then probe that range based on how many directories are already
present. The first new directory is at 1/2 (middle) of the range;
the second is in the first 1/4 of the range, then at 3/4, 1/8, 3/8,
5/8, 7/8, 1/16, 3/16, 5/16, etc.

It is desirable to store the depth of a directory in its on-disk
inode so that it is available when we need it. We add a new field
di_dirdepth to track the depth of each directory. Because there are
few spare fields left in the inode, we choose to share an existing
field in the inode rather than having one of our own. Specifically
we create a union with the di_freelink field. The di_freelink field
is used to track inodes that have been unlinked but remain referenced.
It is not needed until a rmdir(2) operation has been done on a
directory. At that point, the directory has no contents and even
if it is kept active as a current directory is no longer able to
have any new directories or files created in it. Thus the use of
di_dirdepth and di_freelink will never coincide.

Reported by:  Timo Voelker
Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39246
2023-03-29 21:13:27 -07:00
Alexander V. Chernikov
badcb3fd57 routing: fix panic when adding an interface route to the p2p interface
without and inet/inet6 addresses attached.

MFC after:      3 days
2023-03-29 20:28:24 +00:00
Konstantin Belousov
cd137909c3 amd64 wakeup: recalculate mitigations after APICs are woken
APICs are needed to broadcast IPIs for MSR writes.

PR:	270489
Reviewed by:	dchagin, emaste, jhb
Tested by:	dchagin, manu
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39302
2023-03-29 21:45:20 +03:00
Zhenlei Huang
d4a80d21b3 lagg(4): Do not enter net epoch recursively
This saves a little resources.

No functional change intended.

Reviewed by:	kp
Fixes:		b8a6e03fac Widen NET_EPOCH coverage
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39267
2023-03-30 00:29:51 +08:00
Zhenlei Huang
dbe86dd5de lagg(4): Refactor out some lagg protocol input routines into a default one
Those input routines are identical.

Also inline two fast paths.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39251
2023-03-30 00:22:13 +08:00
Zhenlei Huang
fcac5719a1 lagg(4): Make lagg_list and lagg_detach_cookie static
They are used internally only.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39250
2023-03-30 00:14:44 +08:00
Mateusz Guzik
80cf427b8d proc: shave a lock trip on exit if possible
... which happens to be vast majority of the time
2023-03-29 09:19:03 +00:00
Mateusz Guzik
7c31de1a3c ufs: stop doing refcount_init on made up creds
creds are not using the refcount API for a long time now, but this
previously failed to fail to compile because the type remained int.

Now it broke due to conversion to long.
2023-03-29 09:19:03 +00:00
Joseph Koshy
57014ab776
pmc: Add a reminder to maintain documentation.
Approved by:	gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D39298
2023-03-29 10:12:08 +01:00
Elliott Mitchell
b94341afcb xen/intr: rework xen_intr_resume() for in-place remapping
The prior implementation of xen_intr_resume() was wiping
xen_intr_port_to_isrc[] and then rebuilding from the x86 interrupt
table.  Rework to instead wipe the channel numbers (->xi_port) and then
scan the table for sources with invalid channels.

This will be slower due to scanning the whole table, but this removes
the dependency on the x86 interrupt code.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30599
[royger]
Split line over 80 characters.
2023-03-29 09:51:45 +02:00
Elliott Mitchell
e45c8ea31c xen/intr: merge parts of resume functionality into new function
The portions of xen_rebind_ipi() and xen_rebind_virq() were already
near-identical.  While xen_rebind_ipi() should panic() on
single-processor, still having the functionality to invoke seems
harmless.

Meanwhile much of the loop from xen_intr_resume() seemed to want to be
closer to this same code.  This pushes related bits closer together.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30598
2023-03-29 09:51:44 +02:00
Julien Grall
910bd069f8 xen/intr: remove x86 APIC headers from xen_intr.c
Remove these no longer needed headers.  Key for making xen_intr.c
machine-independent as they don't exist on other architectures.

Originally this was part of a much larger commit, but was broken off
for submission to the FreeBSD project.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
MFC after: 1 week
2023-03-29 09:51:43 +02:00
Elliott Mitchell
40ad9aaa88 xen/intr: stop passing shared_info_t to xen_intr_active_ports()
There is only a single global HYPERVISOR_shared_info pointer, so
directly use the global pointer.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:43 +02:00
Elliott Mitchell
9f3be3a6ec xen: switch to using core atomics for synchronization
Now that the atomic macros are always genuinely atomic on x86, they can
be used for synchronization with Xen.  A single core VM isn't too
unusual, but actual single core hardware is uncommon.

Replace an open-coding of evtchn_clear_port() with the inline.

Substantially inspired by work done by Julien Grall <julien@xen.org>,
2014-01-13 17:40:58.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:42 +02:00
Elliott Mitchell
49ca3167b7 xen/intr: add check for intr_register_source() errors
While unusual, intr_register_source() can return failure.  A likely
cause might be another device grabbing from Xen's interrupt range.
This should NOT happen, but could happen due to a bug.  As such check
for this and fail if it occurs.

This theoretical situation also effects xen_intr_find_unused_isrc().
There, .is_pic must be tested to ensure such an intrusion doesn't cause
misbehavior.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31995
2023-03-29 09:51:41 +02:00
Elliott Mitchell
1797ff9627 xen/intr: cleanup event channel number use
Consistently use ~0 instead of 0 when clearing xenisrc structures.
0 is a valid event channel number, even though it is reserved by Xen.
Whereas ~0 is guaranteed invalid.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:41 +02:00
Elliott Mitchell
2b2415bafa xen/intr: fix corruption of event channel table
In xen_intr_release_isrc(), the isrc should only be removed if it is
assigned to a valid port.  This had been mitigated by using 0 for not
having a port, but this is actually corrupting the table.  Fix this bug
as modifying the code would cause this bug to manifest as kernel memory
corruption.  Similar issue for the vCPU bitmap masks.

The KASSERT() doesn't need lock protection.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:40 +02:00
Elliott Mitchell
0ebf9bb42d xen/intr: fix overflow of Xen interrupt range
The comparison was wrong.  Hopefully this never occurred in the wild,
but now ensure the error message will occur before damage is caused.
This appears non-exploitable as exploitation would require a guest to
force Domain 0 to allocate all event channels, which a guest shouldn't
be able to do.

Adjust the error message to better describe what has occurred.

Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
2023-03-29 09:51:39 +02:00
Elliott Mitchell
2d5e325303 xen/intr: always set xi_close in xen_intr_bind_isrc()
Appears errors are uncommon since calling xen_intr_release_isrc() on a
xenisrc with xi_close in an undefined state could be bad.  Fix this
problematic lurking nasty.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:39 +02:00
Stefan Eßer
c33db74b53 fs/msdosfs: add tracking of free root directory entries
This update implements tallying of free directory entries during
create, delete,	or rename operations on FAT12 and FAT16 file systems.

Prior to this change, the total number of root directory entries
was reported as number of inodes, but 0 as the number of free
inodes, causing system health monitoring software to warn about
a suspected disk full issue.

The FAT12 and FAT16 file systems provide a limited number of
root directory entries, e.g. 512 on typical hard disk formats.
The valid range of values is 1 to 65535, but the msdosfs code
will effectively round up "odd" values to the next multiple of 16
(e.g. 513 would allow for 528 root directory entries).

This update implements tracking of directory entries during create,
delete, or rename operations, with initial values determined by
scanning the directory when the file system is mounted.

Total and free directory entries are reported in the f_files and
f_ffree elements of struct statfs, despite differences in semantics
of these values:

- There is no limit on the number of files and directories that can
  be created on a FAT file system. Only the root directory of FAT12
  and FAT16 file systems is limited, any number of files can still be
  created in sub-directories, even when 0 free "inodes" are reported.

- A single file can require 1 to 21 directory entries, depending on
  the character set, structure, and length of the name. The DOS 8.3
  style file name takes up 1 entry, and if the name does not comply
  with the syntax of a DOS 8.3 file name, 1 additional entry is used
  for each 13 characters of the file name. Since all these entries
  have to be contiguous, it is possible that a file or directory with
  a long name can not be created, despite a sufficient total number of
  free directory entries.

- Renaming a file can require more directory entries than currently
  allocated to store its long name, which may prevent an in-place
  update of the name if more entries are needed. This may cause a
  rename operation to fail if no contiguous range of free entries for
  the new name can be found.

- The volume label is stored in a directory entry. An empty FAT file
  system with a volume label will therefore show 1 used "inode" in
  df.

- The perceentage of free inodes shown in df or monitoring tools does
  only represent the state of the root directory of a FAT12 or FAT16
  file system. Neither does a reported value of 0% free inodes does
  prevent files from being created in sub-directories, nor does a
  value of 50% free inodes guarantee that even a single file with
  a "long" name can be created in the root directory (if every other
  directory entry is occupied and there are no 2 contiguous entries).

The statfs(2) and df(1) man pages have been updated with a notice
regarding the possibly different semantics of values reported as
total and free inodes for non-Unix file systems.

PR:		270053
Reported by:	Ben Woods <woodsb02@freebsd.org>
Approved by:	mckusick
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D38987
2023-03-29 08:46:01 +02:00
Mateusz Guzik
37337709d3 cred: convert the refcount from int to long
On 64-bit platforms this sorts out worries about mitigating bugs which
overflow the counter, all while not pessimizng anything -- most notably
it avoids whacking per-thread operation in favor of refcount(9) API.

The struct already had two instances of 4 byte padding with 256 bytes in
size, cr_flags gets moved around to avoid growing it.

32-bit platforms could also get the extended counter, but I did not do
it as one day(tm) the mutex protecting centralized operation should be
replaced with atomics and 64-bit ops on 32-bit platforms remain quite
penalizing.

While worries of counter overflow are addressed, the following is not
(just like it would not be with conversion to refcount(9)):
- counter *underflows*
- buffer overruns from adjacent allocations
- UAF due to stale cred pointer
- .. and other goodies

As such, while lipstick was placed, the pig should not be participating
in any beauty pageants.

Prodded by:	emaste
Differential Revision:	https://reviews.freebsd.org/D39220
2023-03-29 05:02:32 +00:00
Mateusz Guzik
21d29c5192 cred: make the refcount signed
There are asserts on the count being > 0, but they are less useful than
they can be because the type itself is unsigned.

The kernel is compiled with -frapv, making wraparound perfectly defined.

Differential Revision:	https://reviews.freebsd.org/D39220
2023-03-29 05:02:04 +00:00
Rick Macklem
695d87bae1 nfscl: Make coverity happy
Coverity does not like code that checks a function's
return value sometimes.  Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

A recent commit deleted "(void)"s in front of nfsm_fhtom().
This commit puts them back in.

Reported by:	emaste
MFC after:	3 months
2023-03-28 17:08:45 -07:00
Bartosz Sobczak
35105900c6
irdma(4): Upgrade the driver to 1.1.11-k
Summary of changes:
- postpone mtu size assignment during load to avoid race condition
- refactor some of the debug prints
- add request reset handler
- refactor flush scheduler to increase efficiency and avoid racing
- put correct vlan_tag for UD traffic with PFC
- suspend QP before going to ERROR state to avoid CQP timout
- fix arithmetic error on irdma_debug_bugf
- allow debug flag to be settable during driver load
- introduce meaningful default values for DCQCN algorithm
- interrupt naming convention improvements
- skip unsignaled completions in poll_cmpl

Signed-off-by: Bartosz Sobczak bartosz.sobczak@intel.com
Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	hselasky@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D39173
2023-03-28 14:29:07 -07:00
Simon J. Gerraty
a02d9cad77 tarfs_mount allow control of vfs_mountedfrom
We default to passing the path of the tar file to vfs_mountedfrom
so we can tell where a filesystem was mounted from.
However this can make the output of mount(8) hard to read.

Allow things like:

	mount -t tarfs -o as=`basename $tar` $tar /mnt

so "as" is recorded instead of $tar

Reviewed by:	des
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39273
2023-03-28 10:57:26 -07:00
Emmanuel Vadot
51d07956cf linuxkpi: Add alderlake defines in intel-family
Needed by drm-kmod.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-03-28 10:57:13 +02:00
Emmanuel Vadot
1cebc9298c Bump __FreeBSD_version after linuxkpi updates.
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-03-28 09:19:53 +02:00
Emmanuel Vadot
aaf6129c9d linuxkpi: Add devm_add_action and devm_add_action_or_reset
Those adds a new devres and will exectute some function on release.

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D39142
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-03-28 09:19:05 +02:00
Emmanuel Vadot
f1d7ae31d4 linuxkpi: Add hdmi helpers
This is a direct port of the Linux code as the licence allows it, so
style(9) isn't respected to allow applying directly the upstream commits.
Do not add it to linuxkpi directly but add a new linuxkpi_hdmi module
that drm modules will require later, no need to bloat linuxkpi more.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D39122
2023-03-28 09:11:06 +02:00
Richard Scheffenegger
f858eb916f tcp: send SACK rescue retransmission also mid-stream
Previously, SACK rescue retransmissions would only happen
on a loss recovery at the tail end of the send buffer.

This extends the mechanism such that partial ACKs without SACK
mid-stream also trigger a rescue retransmission to try avoid
an otherwise unavoidable retransmission timeout.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D39274
2023-03-28 04:47:01 +02:00
Gleb Smirnoff
78e6c3aacc tcp: update error counter when dropping a packet due to bad source
Use the same counter that ip_input()/ip6_input() use for bad destination
address.  For IPv6 this is already heavily abused ip6s_badscope, which
needs to be split into several separate error counters.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D39234
2023-03-27 18:37:15 -07:00
Kristof Provost
ccff2078af carp: fix source MAC
When we're not in unicast mode we need to change the source MAC address.
The check for this was wrong, because IN_MULTICAST() assumes host
endianness and the address in sc_carpaddr is in network endianness.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-03-28 01:18:18 +02:00
Kristof Provost
27b23cdec9 pf: remove pd_refs from pfsync
It only served to complicate cleanup, and added no value.

While here drop packets in pfsync_defer_tmo() if we don't have a syncif,
rather than just leaving them on the queue.

Reviewed by:	markj
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39248
2023-03-28 01:18:07 +02:00
Kristof Provost
01194da28a pfsync: hold b_mtx for callout_stop(pd_tmo)
The pd_tmo callout has an associated mutex, which we must hold while
calling callout_stop().

Reported by:	markj
Reviewed by:	markj
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39223
2023-03-28 01:17:55 +02:00
Rick Macklem
1512579adc nfscl: Make coverity happy
Coverity does not like code that checks a function's
return value sometimes.  Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

Reported by:	emaste
MFC after:	3 months
2023-03-27 16:53:30 -07:00
Konstantin Belousov
6a0a634590 Regen 2023-03-28 02:39:26 +03:00
Konstantin Belousov
375732cc6e kqueue1(2): export the symbol from libc
Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39271
2023-03-28 02:39:26 +03:00
Konstantin Belousov
61194e9852 Add kqueue1() syscall
It takes the flags argument.  Immediate use is to provide the KQUEUE_CLOEXEC
flag for kqueue(2).

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39271
2023-03-28 02:39:26 +03:00
Alexander V. Chernikov
b894193501 netlink: fix linux module build w/ netlink.
Reported by:	Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after:	2 weeks
2023-03-27 18:21:26 +00:00
Alexander V. Chernikov
d3a49f62a2 netlink: fix 19e43c163c by adding miseed netlinkg_glue.c 2023-03-27 16:09:02 +00:00
Alexander V. Chernikov
19e43c163c netlink: add netlink KPI to the kernel by default
This change does the following:

Base Netlink KPIs (ability to register the family, parse and/or
 write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
  some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
  unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
  compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions
 (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in the netlink_glue.c,
 but their implementation calls a pointer to either the stub function
 or the actual function, depending on whether the module is loaded or not.

This approach allows to have only 1k LoC out of ~3.7k LoC (current
 sys/netlink implementation) in the kernel, which will not grow further.
It also allows for the generic netlink kernel customers to load
 successfully without requiring Netlink module and operate correctly
 once Netlink module is loaded.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39269
2023-03-27 13:55:44 +00:00
Mark Johnston
ad2f2ee015 arm64: Remove duplicated function prototypes for PAC
No functional change intended.

Sponsored by:	The FreeBSD Foundation
2023-03-27 08:56:22 -04:00
Yuri Pankov
6aa5b10d0c nvme: fix resv commands with nda device
- passing I/O commands through nda requires nsid field to be set (it was
  unused when going through nvme_ns_ioctl())
- ccb's status can be OR'ed with the flags, use CAM_STATUS_MASK

Reviewed by:	imp (cam)
Differential Revision:	https://reviews.freebsd.org/D37696
2023-03-27 14:53:24 +02:00
Yuri Pankov
1249680609 kern.post.mk: fix PORTSDIR handling
Using subshell's PORTSDIR variable (via $${PORTSDIR}}) seems to be
only working if PORTSDIR is specified directly on the make command
line.

Use ${PORTDIR} here instead so that setting the variable in
/etc/{make,src,src-env}.conf would work (also works for variable
being set on command line or in the environment).

PR:		268299
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D37868
2023-03-27 13:57:57 +02:00
Alexander V. Chernikov
eccccd657f netlink: make nlattr_add_in[6]_addr inline
MFC after:	2 weeks
2023-03-27 11:53:34 +00:00
Alexander V. Chernikov
6dc858d84c netlink: remove forgotten debug message in handle_rtm_getroute().
MFC after:	2 weeks
2023-03-27 10:49:40 +00:00
Alexander V. Chernikov
544f1492c0 netlink: ensure genetlink control family always registers under the same ID.
MFC after:	2 weeks
2023-03-27 10:48:24 +00:00
Corvin Köhne
e8988d60d2
pci: expose intel_graphics_stolen as sysctl
The Intel graphics stolen memory is used by the Intel GOP driver on
boot. When using bhyve with GPU passthrough, it's also used by the guest
GOP driver at guest boot. For that reason, bhyve needs to know the
address and size of this region to inform the guest about this region.
Exposing the variables as sysctl allows bhyve to easily read them.
2023-03-27 11:40:49 +02:00
Corvin Köhne
48d70503bc
pci: add tunable hw.pci.enable_mps_tune
If the tunable is set to 0, the tuning of the MPS (maximum payload size)
is disabled and the default MPS values set by the BIOS are used. In this
case the system may use a lower speed or operate in a less optimized
state, but it can resolve issues with stability and compatibility. With
specific devices the tuning of the mps, can lead to a complete freeze of
the system.

Reviewed by:		manu
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D38397
2023-03-27 11:28:27 +02:00
Kyle Evans
ec1bc53002 arm64: cpu_switch: don't zero out pcb_onfault
Previously this would zero out x18 in the pcb, now it's attacking the
innocent pcb_onfault -- drop it entirely.

This technically fixes
e605b87a9e ("Save only callee-saved registers in pcb"), but it's
harmless until the below commit trims down pcb_x.

Reported by:	mmel
Reviewed by:	andrew, mmel
Fixes:	1c1f31a5e5 ("Remove unused registes from the arm pcb")
Differential Revision:	https://reviews.freebsd.org/D39277
2023-03-26 13:48:22 -05:00
Alexander V. Chernikov
9a11f3dff9 netlink: add standrard ifaddr/neigh parsers to snl(3).
MFC after:	2 weeks
2023-03-26 09:04:41 +00:00
Alexander V. Chernikov
04f75b9802 netlink: allow netlink sockets in non-vnet jails.
This change allow to open Netlink sockets in the non-vnet jails, even for
 unpriviledged processes.
The security model largely follows the existing one. To be more specific:
* by default, every `NETLINK_ROUTE` command is **NOT** allowed in non-VNET
 jail UNLESS `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command
 handler.
* All notifications are **disabled** for non-vnet jails (requests to
 subscribe for the notifications are ignored). This will change to be more
 fine-grained model once the first netlink provider requiring this gets
 committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including**
 interfaces w/o any addresses attached to the jail). The value of this is
 questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the
 current approach - currently we list static ARP/ND entries belonging to the
 addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered
 to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only
 host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in non-VNET jail
 (as sub-families may be unrelated to network at all).
 It is the goal of the family author to implement the restriction if
 necessary.

Differential Revision: https://reviews.freebsd.org/D39206
MFC after:	1 month
2023-03-26 08:44:09 +00:00
Alexander V. Chernikov
2cda6a2fb0 routing: add public rt_is_exportable() version to check if
the route can be exported to userland when jailed.

Differential Revision: https://reviews.freebsd.org/D39204
MFC after:	2 weeks
2023-03-26 08:24:27 +00:00
Konstantin Belousov
28f957b8b3 vnode_pager_input: return runningbufspace back
Both vnode_pager_input_smlfs() and vnode_pager_generic_getpages()
increment runningbufspace, but also both delegate io completion handling
on the pbuf to either plain bdone() or filesystem-specific strategy
routine. Accidentally, for e.g. UFS it is g_vfs_strategy()/g_vfs_done().
The later calls bufdone() which handles runningbufspace reclamation.

For plain bdone() io done handler, nothing would return
accounted b_runningbufspace back. Do it in the new
helper vnode_pager_input_bdone(), as well as in
vnode_pager_generic_getpages_done() explicitly.

Note that potential multiple calls to runningbufwakeup() for the same
pbuf or buf completion are safe. runningbufwakeup() clears accounting
for the buffer, so second and later calls are nop.

The problem was found due to tarfs using small vnode pager input but not
g_vfs_strategy().

Reported by:	des
Reviewed by:	markj, sjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39263
2023-03-26 00:55:29 +02:00
Mateusz Guzik
0e71f4f77c vm: add unlocked page lookup before trying vm_fault_soft_fast
Shaves a read lock + tryupgrade trip most of the time.

Stats from doing a kernel build (counters not present in the tree):
vm.fault_soft_fast_ok: 262653
vm.fault_soft_fast_failed_other: 41
vm.fault_soft_fast_failed_no_page: 39595772
vm.fault_soft_fast_failed_page_busy: 1929
vm.fault_soft_fast_failed_page_invalid: 22183

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39268
2023-03-25 22:14:59 +00:00
Mateusz Guzik
22eb66d961 vfs cache: always assert on ndp->ni_resflags 2023-03-25 21:57:55 +00:00
Dmitry Chagin
f0fe68a965 i386: ansify
Reported by:	clang 15
2023-03-25 23:21:03 +03:00
Andrew Gallatin
abba58766f LRO: Add missing checks for invalid IP addresses
LRO bypasses normal ip_input()/tcp_input() and lacks several checks
that are present in the normal path.  Without these checks, it
is possible to trigger assertions added in b0ccf53f24

Reviewed by: glebius, rrs
Sponsored by: Netflix
2023-03-25 11:56:02 -04:00
Mateusz Guzik
138a5dafba vfs: trylock vnode requeue
The quasi-LRU still gets in the way for example when doing an
incremental bzImage build, with vnode_list lock being at the
top of the profile. Further damage control the problem by trylocking.

Note the entire mechanism desperately wants to be reaped out in favor
of something(tm) which both scales in a multicore setting and provides
sensible replacement policy.

With this change everything vfs almost disappears from the on CPU
flamegraph, what is left is tons of contention in the VM.
2023-03-25 13:42:27 +00:00
Mateusz Guzik
245767c278 vfs: flip deferred_inact to atomic
Turns out it is very rarely triggered, making a per-cpu
counter a waste.

Examples from real life boxes:
uptime		counter
135 days	847
138 days	2190
141 days	1
2023-03-25 13:42:27 +00:00
Mateusz Guzik
e5eb1d298f vfs: replace some spelled out VNASSERTs with VNPASS
nfc
2023-03-25 13:42:27 +00:00
Vico Chen
1f0c8bfd65 linsysfs(4): Keep Linux compatible sysfs the same as Ubuntu
By checking Ubuntu, there is no `/sys/subsystem' in sysfs. To compatible
with Ubuntu, delete the 'subsystem' creation in Linux compatible module.

On the other hand, the sysfs `/sys/subsystem' cause failure for some
Linux udev cases. In Linux udev source code, there is a function named
`scan_devices_all', and it will scan `/sys/subsystem' if it is existed,
but now there are nothing in /sys/subsystem `, and it returns empty
to cause some use cases failed.

Reviewed by:		dchagin
Differential Revision:	https://reviews.freebsd.org/D38885
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:41:04 +03:00
Dmitry Chagin
0b56641cfc linsysfs(4): Reimplement listnics() using ifAPI
Handle if arrival/departure events.

Reviewed by:		melifaro (early version)
Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:40:41 +03:00
John Baldwin
0f735657aa bhyve: Remove vmctx member from struct vm_snapshot_meta.
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel.  For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument.  As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38125
2023-03-24 11:49:06 -07:00
John Baldwin
7d9ef309bd libvmmapi: Add a struct vcpu and use it in most APIs.
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi.  For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server.  The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38124
2023-03-24 11:49:06 -07:00
Konstantin Belousov
13262b07a0 fdesc_lookup(): drop fdropped
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:22 +02:00
Konstantin Belousov
7dca8fd1cb fdesc_lookup(): the condition to use vn_vget_ino() is always true
The ix number for the fdescfs root is 1, while any fd vnode has the ix
value at least 3.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:16 +02:00
Konstantin Belousov
8d97282a39 fdesc_lookup(): no need to vhold the dvp vnode
It is already referenced by the VOP_LOOKUP() caller, otherwise vdrop()
after vn_lock() is invalid anyway.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:07 +02:00
Konstantin Belousov
51b8ffb95c fdesc_allocvp(): fix potential use after free
Just owning the interlock is not enough for vget() to operate on the
vnode race-free with vgone(), the vnode should be held.  Use
vget_prep()/vget_finish() to avoid vholding the vnode explicitly, and
drop LK_INTERLOCK.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:53 +02:00
Konstantin Belousov
fa3ea81b77 fdescfs: remove useless XXX comment, unwrap line
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:47 +02:00
Justin Hibbits
79aa96f9ca infiniband: Bring back M_ASSERTVALID() check in infiband_bpf_mtap()
Reported by:	rpokala
Fixes:		adf62e8363
2023-03-24 11:04:42 -04:00
Justin Hibbits
bb55bb1740 inet6: Include if_private.h in one more netstack file
ip6_input() and ip6_destroy() both directly reference ifnet members.
This file was missed in 3d0d5b21

Fixes:		3d0d5b21 ("IfAPI: Explicitly include <net/if_private.h>...")
Sponsored by:	Juniper Networks, Inc.
2023-03-24 10:25:35 -04:00
Justin Hibbits
727bfe3894 Mechanically convert qlnx(4) to IfAPI
Reviewed By:	zlei
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37856
2023-03-24 10:09:53 -04:00
Justin Hibbits
3e142e0767 ofed: Mechanically convert to IfAPI
Summary:
Because of the intricacies of this code it wasn't purely scripted, but
instead hand-mechanical.

Reviewed by:	hselasky
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38560
2023-03-24 10:04:33 -04:00
Zhenlei Huang
dcd7f0bd02 lagg: Various style fixes
MFC after:	1 week
2023-03-24 17:55:15 +08:00
Kristof Provost
ad729f8d50 pf: ignore ip6_output() return value in pf_refragment6()
We can't do anything if ip6_output() fails, other than discard the
packet which ip6_output() already does for us.
Mark the return value as ignored.

Reported by:	emaste, Coverity
Sponsored by:	Rubicon Communications, LLC (Netgate)
2023-03-24 08:08:19 +01:00
Eric Joyner
949d971f0b
ice(4): Restore old conditional overwritten by last update
Commit 8923de5905 ("ice(4): Update to 1.37.7-k", 2023-02-13)
unintentionally overwrote the change made in commit 52f45d8ace ("net:
iflib: let the drivers use isc_capenable", 2021-12-28).

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reported by:	jhibbits@
MFC after:	3 days
Sponsored by:	Intel Corporation
2023-03-24 00:05:26 -07:00
Kyle Evans
89c52f9d59 arm64: add KASAN support
This entails:
- Marking some obvious candidates for __nosanitizeaddress
- Similar trap frame markings as amd64, for similar reasons
- Shadow map implementation

The shadow map implementation is roughly similar to what was done on
amd64, with some exceptions.  Attempting to use available space at
preinit_map_va + PMAP_PREINIT_MAPPING_SIZE (up to the end of that range,
as depicted in the physmap) results in odd failures, so we instead
search the physmap for free regions that we can carve out, fragmenting
the shadow map as necessary to try and fit as much as we need for the
initial kernel map.  pmap_bootstrap_san() is thus after
pmap_bootstrap(), which still included some technically reserved areas
of the memory map that needed to be included in the DMAP.

The odd failure noted above may be a bug, but I haven't investigated it
all that much.

Initial work by mhorne with additional fixes from kevans and markj.

Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36701
2023-03-23 16:34:33 -05:00
Mitchell Horne
698dbd66fe arm64: add a GENERIC-KASAN config
Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36702
2023-03-23 16:34:33 -05:00
John Baldwin
d2dab20c2a ktls: Drop all the INET and INET6 compile-time guards.
Consistent with 9fd0d9b16e, KERN_TLS is
not supported on kernels without any INET support.

Reviewed by:	gallatin, hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39232
2023-03-23 14:29:07 -07:00
Gordon Bergling
f63aaffebc acpi(4): Fix a typo in a kernel message
- s/enitialization/initialization/

MFC afer:	5 days
2023-03-23 22:03:31 +01:00
Kirk McKusick
42c82aad32 Improve chance of finding an alternate superblock in sbsearch(3).
When requesting a superblock read for the sole purpose of getting
the parameters needed to find if backup parameters have been stored,
specify UFS_NOCSUM as only the base superblock is needed. This
change reduces the number of checks that the superblock must pass.

MFC after:    1 week
2023-03-23 13:04:52 -07:00
Mateusz Guzik
c16c4ea6d3 vfs cache: return ENOTDIR for not_a_dir/{.,..} lookups
Reported by:	Oliver Kiddle
PR:	270419
MFC:	3 days
2023-03-23 19:31:18 +00:00
Andrew Turner
6a4f5fdd19 Mark the arm64 PSR register fields with UL
These are for a 64 bit register. Make them 64 bit values on arm64.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Andrew Turner
ea3061526e Bump __FreeBSD_version for changing spsr on arm64
This changed a few structures, bump __FreeBSD_version for kgdb and
userspace consumers.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Andrew Turner
1c1f31a5e5 Remove unused registes from the arm pcb
These were kept for ABI reasons. Remove them and bump __FreeBSD_version
so debuggers can be updated to use the new layout.

Reviewed by:	jhb
Sponsored by:	Arm Ltd
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35378
2023-03-23 18:56:26 +00:00
Andrew Turner
4a06b28a15 Add compat support for struct reg on arm64
The size of the spsr field in struct reg has changed. Mask the bits
that userspace doesn't know about out as they may be invalid.

While here add a comment why we don't need compat support in set_regs.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Zachary Leaf
f4036a9234 arm64: add fault address to trapframe
It was previously possible for the fault address register to get
clobbered before it was saved. This small window occurred when an
additional exception was encountered inside the exception handler,
overwriting the previous value.

Commit f29942229d ("Read the arm64 far early in el0 exceptions")
patched this issue, but avoided changing the trapframe since this could
be considered a KBI change in FreeBSD 13.

Revert the above fix and save the fault address in the trapframe
instead. This saves the fault address even earlier in the exception
handling process, and is a more robust and simple fix.

Reviewed by:	andrew, jhb, jrtc27
Sponsored by: Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38984
2023-03-23 18:56:26 +00:00
Zachary Leaf
2ecbbcc7ca arm64: extend ESR/SPSR registers to 64b
For the Exception Syndrome Register, ESR_ELx, the upper 32b were
previously unused, but now may contain additional exception info as of
Armv8.7 (FEAT_LS64).

Extend ESR from u32->u64 in exception handling code to support this. In
addition, also extend Saved Program Status Register SPSR_ELx in the same
way to allow for future extensions.

Reviewed by:	andrew
Sponsored by: Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38983
2023-03-23 18:56:26 +00:00
Justin Hibbits
e2427c6917 IfAPI: Add iterator to complement if_foreach()
Summary:
Sometimes an if_foreach() callback can be trivial, or need a lot of
outer context.  In this case a regular `for` loop makes more sense.  To
keep things hidden in the new API, use an opaque `if_iter` structure
that can still be instantiated on the stack.  The current implementation
uses just a single pointer out of the 4 alotted to the opaque context,
and the cleanup does nothing, but may be used in the future.

Reviewed by:	melifaro
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39138
2023-03-23 09:39:26 -04:00
Bjoern A. Zeeb
59559d1354 LinuxKPI: remove now implemented dummy headers.
Both iommu.h and kconfig.h now exist in the common code.
There is no need for the duplicate (empty) headers anymore.
2023-03-23 01:15:53 +00:00
Bjoern A. Zeeb
700acdc7b5 rtw89: fix -Wunused-but-set-variable
Fix a -Wunused-but-set-variable warning by adding the field to the
debug logging as is done for other versions handler functions.

MFC after:	3 days
2023-03-23 00:29:38 +00:00
Mateusz Guzik
b5d43972e3 vfs: decouple freevnodes from vnode batching
In principle one cpu can keep vholding vnodes, while another vdrops
them. In this case it may be the local count will keep growing in an
unbounded manner. Roll it up after a threshold instead.

While here move it out of dpcpu into struct pcpu.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D39195
2023-03-22 23:57:25 +00:00
John Baldwin
dddb1aec4d sys: Retire OPENZFS_CWARNFLAGS now that it is empty.
Reviewed by:	markj, emaste
Differential Revision:	https://reviews.freebsd.org/D39217
2023-03-22 12:35:30 -07:00
John Baldwin
4ffeb3b88e sys: Stop enabling -Wnested-externs.
clang doesn't implement this warning, so violations are only caught by
GCC.  It is also no longer a common practice to use this as it was in
the original BSD code, so the need for the warning is not as important
as when it was used to do cleanups 20 years ago.  A recent commit
(c3179891f8) triggers this warning on
GCC, but that commit uses nested externs purposefully.

Reviewed by:	markj, emaste
Differential Revision:	https://reviews.freebsd.org/D39214
2023-03-22 12:35:09 -07:00
Brooks Davis
eb232cffc9 amd64: reduce header pollution in _stdint.h
In 38d1ac34ff SIGATOMIC_{MIN,MAX} were
defined in terms of LONG_{MIN,MAX}.  Later, they were switched to
__LONG_{MIN,MAX} in 78fe75bc28 where an
include of machine/_limits.h was added.  Switch to using fixed width
INT64_{MIN,MAX} and remove the header pollution.

No functional change.

Reviewed by:	theraven, emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D39196
2023-03-22 16:23:57 +00:00
Brooks Davis
de761318a4 riscv: Fix sig_atomic_t limit definitions
sig_atomic_t is defined as a long and thus is 64-bit on arm64.  For some
reason its limit was incorrectly specified as a 32-bit number.  This had
the unfortunate side effect of causing gnulib to override most of the
definitions in stdint.h.  On CheriBSD this breaks all software that uses
gnulib in annoying and hard to debug ways.

Technically updating the limits might be an ABI change, but these
defines are largely unused (the only use in tree is in the libc++ test
suite where it's use an assertion that will fail due to this bug).
Further, since the underlying type remains the same, we're just
increasing the range of values a paranoid program might use.

Reviewed by:	emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D39194
2023-03-22 16:23:22 +00:00
Brooks Davis
3d2837f3bd arm64: Fix sig_atomic_t limit definitions
sig_atomic_t is defined as a long and thus is 64-bit on arm64.  For some
reason its limit was incorrectly specified as a 32-bit number.  This had
the unfortunate side effect of causing gnulib to override most of the
definitions in stdint.h.  On CheriBSD this breaks all software that uses
gnulib in annoying and hard to debug ways.

Technically updating the limits might be an ABI change, but these
defines are largely unused (the only use in tree is in the libc++ test
suite where it's use an assertion that will fail due to this bug).
Further, since the underlying type remains the same, we're just
increasing the range of values a paranoid program might use.

Reviewed by:	andrew, emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D39193
2023-03-22 16:22:21 +00:00
Andrew Turner
787bf3bcd6 arm64: Use the new PCB macros in swtch.S
Rather than hard coding the location of these registers in the array
use the new macros to find the correct offset.

Sponsored by:	Arm Ltd
2023-03-22 15:08:03 +00:00
Andrew Turner
1c33a94ab0 Add macros for arm64 pcb register offsets
Add macros for offsets of macros we set in the arm64 pcb pcb_x array.
This will simplift reducing the size of this array in a later change.

Sponsored by:	Arm Ltd
2023-03-22 15:08:03 +00:00
Mark Johnston
0f5b6f9a04 fdescfs: Fix a file ref leak
In fdesc_lookup(), vn_vget_ino_gen() may fail without invoking the
callback, in which case the ref on fp is leaked.  This happens if the
fdescfs mount is being concurrently unmounted.  Moreover, we cannot
safely drop the ref while the dvp is locked.

So:
- Use a flag variable to indicate whether the ref is dropped.
- Reorganize things to handle the leak.

Reported by:	C Turt <ecturt@gmail.com>
Reviewed by:	mjg, kib
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39189
2023-03-22 09:19:27 -04:00
Warner Losh
ed52baf51b _endian.h: Include sys/ctypes.h for visibility macros
BYTE_ORDER, LITTLE_ENDIAN and BIG_ENDIAN will be required by the
forthcoming POSIX Issue 8. In addition, they are provided in the BSD
compilation environments. However, depending on the order includes
happend, sys/cdefs.h may or may not be included when endian.h is
included. Include it here so we can safely test __BSD_VISIBLE.  Add
visibility when we're compiling in the future for issue 8, but since the
date number for issue 8 hasn't been fixed, use strictly greater than the
issue 7 date.of 200809.

This had the side effect of sometimes (in the traditional BSD
compliation environment)
 #if BYTE_ORDER == LITTLE_ENDIAN
and
 #if BYTE_ORDER == BIG_ENDIAN
both being true because none of these were defined. This fixes
that. It also fixes including it after <stdio.h> but not before.

PR:			269249
MFC After:		1d (build related)
Reviewed by:		kib, emaste
Differential Revision:	https://reviews.freebsd.org/D39176

Sponsored by:		Netflix
2023-03-21 20:25:58 -06:00
Vincenzo Maffione
e2a431a0ff netmap: fix copyin/copyout of nmreq options list
The previous code unsuccesfully attempted to report a precise error for
each option in the user list. Moreover, commit 253b2ec199 broke some
ctrl-api-test (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260547).

With this patch we bail out as soon as an unrecoverable error is detected and
we properly check for copy boundaries. EOPNOTSUPP no longer immediately
returns an error, so that any other option in the list may be examined
by the caller code and a precise report of the (un)supported options can
be returned to the user.

With this patch, all ctrl-api-test unit tests pass again.

PR:			260547
Submitted by:		giuseppe.lettieri@unipi.it
Reviewed by:		vmaffione
MFC after:		14 days
2023-03-21 23:23:18 +00:00
Jean-Sébastien Pédron
83df72e5fb
linuxkpi: Define pat_enabled()
This new <asm/memtype.h> header is included from <linux/pci.h> because
that's how it is included in Linux too. DRM drivers depend on this.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39052
2023-03-21 23:36:40 +01:00
Mark Johnston
b4b33821fa ktls: Fix interlocking between ktls_enable_rx() and listen(2)
The TCP_TXTLS_ENABLE and TCP_RXTLS_ENABLE socket option handlers check
whether the socket is listening socket and fail if so, but this check is
racy.  Since we have to lock the socket buffer later anyway, defer the
check to that point.

ktls_enable_tx() locks the send buffer's I/O lock, which will fail if
the socket is a listening socket, so no explicit checks are needed.  In
ktls_enable_rx(), which does not acquire the I/O lock (see the review
for some discussion on this), use an explicit SOLISTENING() check after
locking the recv socket buffer.

Otherwise, a concurrent solisten_proto() call can trigger crashes and
memory leaks by wiping out socket buffers as ktls_enable_*() is
modifying them.

Also make sure that a KTLS-enabled socket can't be converted to a
listening socket, and use SOCK_(SEND|RECV)BUF_LOCK macros instead of the
old ones while here.

Add some simple regression tests involving listen(2).

Reported by:	syzkaller
MFC after:	2 weeks
Reviewed by:	gallatin, glebius, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38504
2023-03-21 16:04:00 -04:00
Jung-uk Kim
e415d255a6 OpenSSL: Regen an assembly file for arm
X-MFC with:	af19988f6c
2023-03-21 15:13:51 -04:00
Alexander V. Chernikov
a74998f38a netlink: reduce the default debugging levels
Reported by:	kp
MFC after:	2 weeks
2023-03-21 18:55:00 +00:00
Ed Maste
87bb53cb53 gvinum: correct assertions
Pointer addresses are always >= 0.  Assert that the value is >= 0
instead.

PR:		207855, 207856
Reviewed by:	imp
Reported by:	David Binderman
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37677
2023-03-21 10:03:12 -04:00
Justin Hibbits
379e14ba6c powerpc/pmap: Account for a potential NULL pmap in pmap_sync_icache
It's apparently possible for pcpu->pc_curpmap to be NULL at some point,
leading to a panic.  Account for this as is done with the other 64-bit
AIM pmap.

Reported by:	pkubaj
Tested by:	pkubaj
Fixes:		6f0b2a235a ("Add pmap_sync_icache() for radix pmap")
MFC after:	3 days
2023-03-21 09:56:26 -04:00
David E. O'Brien
daa0b64a22 pmap_mapdev_attr() doesn't use any of its arguments. 2023-03-20 23:09:34 -07:00
Zhenlei Huang
535946ce11 if_re: Drop redundant assignments for ifq_maxlen and ifq_drv_maxlen
Fixes:	4519a073c3 Mechanically convert if_re(4) to DrvAPI
2023-03-21 12:29:24 +08:00
Jean-Sébastien Pédron
d780c6a6ab
x86/pci_early_quirks: Support Intel 11th+ gen
Newer Intel CPUs/iGPUs use a new method to determine the base address of
the stolen memory. This code was ported from Linux.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39057
2023-03-20 21:47:36 +01:00
Jean-Sébastien Pédron
5fbfe9517b
linuxkpi: Define seq_has_overflowed() and single_open_size()
This required non-trivial changes to `linux_seq_file.c` to manage a new
`(struct seq_file)->size` field. This field is read directly by DRM
drivers, so we can't alias it to a call to sbuf_len(9).

`single_open_size()` also depended on the ability to allocate the sbuf
with a specified size instead of relying on `sbuf_new_auto()`.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39056
2023-03-20 21:47:36 +01:00
Jean-Sébastien Pédron
1b4e08b483
linuxkpi: Support non-NULL zero-size pointers
DRM drivers set some pointers to `ZERO_SIZE_PTR` directly (without
allocating anything), to treat pointers which were "initialized" (set to
`ZERO_SIZE_PTR`) with no memory allocation like really allocated
pointers. NULL isn't used because it represents a third state.

Reviewed by:	emaste, manu
Approved by:	emaste, manu
Differential Revision:	https://reviews.freebsd.org/D39055
2023-03-20 21:47:36 +01:00
Jean-Sébastien Pédron
eef905a859
linuxkpi: Add <linux/iommu.h>
It defines a small part of the IOMMU API of Linux. We don't implement
that yet.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39054
2023-03-20 21:47:36 +01:00
Jean-Sébastien Pédron
af19988f6c
linuxkpi: Define pcie_aspm_enabled()
This is not the same as querying the PCIE ASPM capability. The function
should return if the feature is actually enabled or not. It always
return false on FreeBSD.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39053
2023-03-20 21:47:36 +01:00
Jean-Sébastien Pédron
19a355436e
linuxkpi: Add default_groups field to struct kobj_type
We don't use it, but it is set by the DRM drivers.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39051
2023-03-20 21:47:35 +01:00
Jean-Sébastien Pédron
e91f5814b8
linuxkpi: Define device_iommu_mapped()
For now, it always return false.

Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39050
2023-03-20 21:47:35 +01:00
Jean-Sébastien Pédron
0777b000f1
linuxkpi: Define dev_WARN() and dev_WARN_ONCE()
Reviewed by:	manu
Approved by:	manu
Differential Revision:	https://reviews.freebsd.org/D39049
2023-03-20 21:47:28 +01:00
Mitchell Horne
99bd5c1fe3 x86: nexus code tidy-up
Make a pass at the various nexus implementations, fixing some very minor
style issues, obsolete comments, etc.

The method declaration section has become unwieldy in many respects.
Attempt to tame it by:
 - Using generated method typedefs
 - Grouping methods roughly by category, and then alphabetically.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D38495
2023-03-20 17:35:47 -03:00
Mitchell Horne
c514686aa0 powerpc: nexus code tidy-up
Make a pass at the various nexus implementations, fixing some very minor
style issues, obsolete comments, etc.

Update the top-level comment to be closer to other nexus
implementations.

The method declaration section has become unwieldy in many respects.
Attempt to tame it by:
 - Using generated method typedefs
 - Grouping methods roughly by category, and then alphabetically.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D38495
2023-03-20 17:35:47 -03:00
Mitchell Horne
abe3309e71 riscv: nexus code tidy-up
Make a pass at the various nexus implementations, fixing some very minor
style issues, obsolete comments, etc.

The method declaration section has become unwieldy in many respects.
Attempt to tame it by:
 - Using generated method typedefs
 - Grouping methods roughly by category, and then alphabetically.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D38495
2023-03-20 17:35:47 -03:00
Mitchell Horne
e582d4a2b0 arm64: nexus code tidy-up
Make a pass at the various nexus implementations, fixing some very minor
style issues, obsolete comments, etc.

The method declaration section has become unwieldy in many respects.
Attempt to tame it by:
 - Using generated method typedefs
 - Grouping methods roughly by category, and then alphabetically.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D38495
2023-03-20 17:35:47 -03:00
Mitchell Horne
c650e19495 arm: nexus code tidy-up
Make a pass at the various nexus implementations, fixing some very minor
style issues, obsolete comments, etc.

The method declaration section has become unwieldy in many respects.
Attempt to tame it by:
 - Using generated method typedefs
 - Grouping methods roughly by category, and then alphabetically.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D38495
2023-03-20 17:35:47 -03:00
Mitchell Horne
8965b3033e callout(9): adopt old references to timeout(9)
timeout(9) was removed a couple of years ago; all consumers now use the
callout(9) interface.

Explicitly do not bump .Dd anywhere, as this is not a content or
semantic change.

Reviewed by:	markj, jhb, Pau Amma <pauamma@gundo.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39136
2023-03-20 17:12:12 -03:00
Mitchell Horne
f6f8cbda8e kern_reboot(9): describe event handlers
Add more details about the execution and purpose of these shutdown
handlers. Make a point to mention the requirement that they can be run
in a normal or panic context. Add some simple examples.

Add a brief comment to the declaration in sys/eventhandler.h.

Reviewed by:	markj
Discussed with:	rpokala, Pau Amma <pauamma@gundo.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39135
2023-03-20 17:12:12 -03:00
Fedor Uporov
bb95e6fa98 Fix compilation issue, when DTRACE is not defined
PR:             270346
Reported by:    Michael Paepcke
MFC after:      2 week
2023-03-20 23:06:57 +03:00
Mark Johnston
c3179891f8 kerneldump: Inline dump_savectx() into its callers
The callers of dump_savectx() (i.e., doadump() and livedump_start())
subsequently call dumpsys()/minidumpsys(), which dump the calling
thread's stack when writing the dump.  If dump_savectx() gets its own
stack frame, that frame might be clobbered when its caller later calls
dumpsys()/minidumpsys(), making it difficult for debuggers to unwind the
stack.

Fix this by making dump_savectx() a macro, so that savectx() is always
called directly by the function which subsequently calls
dumpsys()/minidumpsys().

This fixes stack unwinding for the panicking thread from arm64
minidumps.  The same happened to work on amd64, but kgdb reports the
dump_savectx() calls as coming from dumpsys(), so in that case it
appears to work by accident.

Fixes:	c9114f9f86 ("Add new vnode dumper to support live minidumps")
Reviewed by:	mhorne, jhb
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39151
2023-03-20 14:16:28 -04:00
Kristof Provost
53247cdf12 pfsync: fix pfsync_undefer_state() locking
pfsync_undefer_state() takes the bucket lock, but could get called from
places (e.g. from pfsync_update_state() or pfsync_delete_state()) where
we already held the lock.

As it can also be called from places where we don't yet hold the lock
create new locked variant for use when the lock is already held. Keep
using pfsync_undefer_state() where the lock must still be taken.

PR:		268246
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
2023-03-20 16:39:14 +01:00
Kristof Provost
844ad2828a pfsync: add missing unlock in pfsync_defer_tmo()
The callout for pfsync_defer_tmo() is created with
CALLOUT_RETURNUNLOCKED, because while the callout framework takes care
of taking the lock we want to run a few operations outside of the lock,
so we unlock ourselves.

However, if `sc->sc_sync_if == NULL` we return without releasing the
lock, and leak the lock, causing later deadlocks.
Ensure we always release the bucket lock when we exit pfsync_defer_tmo()

PR:		268246
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
2023-03-20 16:39:14 +01:00
Kristof Provost
511a6d5ed3 carp: use if_name()
Reported by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-03-20 14:37:10 +01:00
Kristof Provost
137818006d carp: support unicast
Allow users to configure the address to send carp messages to. This
allows carp to be used in unicast mode, which is useful in certain
virtual configurations (e.g. AWS, VMWare ESXi, ...)

Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D38940
2023-03-20 14:37:09 +01:00
Dmitry Mikushin
ef9f49a21a arm64: Adding a missing include file
Adding a missing include file, which provides the definition of
SYSCTL_INT.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39149
2023-03-20 11:23:01 +00:00
Kristof Provost
40e0435964 carp: add netlink interface
Allow carp configuration information to be supplied and retrieved via
netlink.

Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39048
2023-03-20 10:52:27 +01:00
Konstantin Belousov
62b572694b ext2_dirbad(): fix !DTRACE build
Fixes:	3c2dc524c3
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2023-03-20 10:44:22 +02:00
Zhenlei Huang
082895ebec xhci(4): Describe Fresco Logic FL1009 USB 3.0 controller
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D38922
2023-03-20 12:04:14 +08:00
Kirk McKusick
191115cfb6 Fix syntax error in 0697670.
Reported by: Michael Tuexen
2023-03-18 17:03:32 -07:00
Michael Tuexen
48345048cd sctp: fix typo in assignment 2023-03-18 23:58:50 +01:00
Kirk McKusick
069767091e Do not panic in case of corrupted UFS/FFS directory.
Historically the system panic'ed when it encountered a corrupt
directory. This change recovers well enough to continue operations.
This change is made in response to a similar change made in the ext2
filesystem as described in the cited Differential Revision.

MFC after:    2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38503
2023-03-18 15:37:58 -07:00
Konstantin Belousov
2b4b3789f8 acpi_wakeup.c: apply the reviewer' editorial corrections to the comment text.
Fixes:	02904a06c7
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:47:19 +02:00
Konstantin Belousov
02904a06c7 amd64: properly recalculate mitigations knobs after resume
Revision r333125 AKA 986c4ca387 forced clear cpu_stdext_feature3
on suspend, since at that time microcode update was not reloaded
early on resume. Then, revision 050f5a8405 started re-reading
cpu_stdext_feature3 again. Since modern CPUs do not require mitigations
from the Skylake era, this went unnoticed for some time.

Keep zeroing cpu_stdext_feature3 on suspend, but re-read it in more
controlled way on resume after microcode is reloaded, and recalculate
active workarounds based on actual microcode capabilities.

Reported and tested by:	romain
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:40:05 +02:00
Alexander V. Chernikov
046acc2bfd netlink: add public ucred accessor for nlp.
MFC after:	2 weeks
2023-03-18 11:44:29 +00:00
Alexander V. Chernikov
568a645ba5 netlink: fix capped uncapped ack handling in snl(3).
Reviewed by:	kp
Differential Revision: https://reviews.freebsd.org/D39144
MFC after:	2 weeks
2023-03-18 11:35:56 +00:00
Wei Hu
8ea7fa16d9 uart: Don't change settings or throttle putc for Hyper-V
Azure setup does not like it when FreeBSD overrides the settings of the
UART device. When Hyper-V is detected, don't do this and also don't
throttle putc() output. This is a workaround for the early boot hang
of FreeBSD on Azure.

Tested on Azure, ESXi (VM with serial port), and SG-8200

PR:		264267
Reviewed by:	kevans, whu
Tested by:	whu
Obtained from:	Rubicon Communications, LLC (Netgate)
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2023-03-18 07:07:54 +00:00
Konstantin Belousov
ab3ff87a33 Belately bump __FreeBSD_version for introduction of __libc_start1()
and move of most of the initialization code from csu to libc.

Requested by:	jrtc27
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2023-03-18 08:31:41 +02:00
Fedor Uporov
3c2dc524c3 Do not panic in case of corrupted directory
The panic() will be called under ext2_dirbad()
function in case of rw mount. It cause user confusion,
like in BZ 265951.

PR:			265951
Reviewed by:		pfg, mckusick
MFC after:		2 week
Differential revision:  https://reviews.freebsd.org/D38503
2023-03-18 09:16:24 +03:00
Fedor Uporov
366da717de Add root directory entry check.
Add check that directory entry with ino=EXT2_ROOTINO
have correct namelength and name. It is possible to
create malicious image which will cause panic if root
directory entry have incorrect name.

PR:			259068
Reported by:		Robert Morris
Reviewed by:		pfg
MFC after:		2 weeks
Differential Revision:  https://reviews.freebsd.org/D38502
2023-03-18 09:16:22 +03:00
Zhenlei Huang
b754d7faaf uhci(4): Correct PCI device ID for Zhaoxin USB controller
And minor style fixes.

Tested by:	Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Fixes:		986c7be472 uhci(4): Add new USB IDs
Differential Revision:	https://reviews.freebsd.org/D38924
2023-03-18 01:30:19 +08:00
Zhenlei Huang
95b2d16b38 ehci(4): Correct PCI device ID for Zhaoxin USB 2.0 controller
And minor style fixes.

Tested by:	Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Fixes:		f9237e1937 ehci(4): Add new USB IDs
Differential Revision:	https://reviews.freebsd.org/D38923
2023-03-18 01:30:18 +08:00
Zhenlei Huang
f50f53931e xhci(4): Correct PCI device IDs for Zhaoxin USB 3.0 controllers
And minor style fixes.

Reviewed by:	hselasky
Tested by:	Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Fixes:		0d7064d58f xhci(4): Add new USB IDs
Differential Revision:	https://reviews.freebsd.org/D38921
2023-03-18 01:30:18 +08:00
Mateusz Guzik
62a573d953 vfs: retire KERN_VNODE
It got disabled in 2003:

commit acb18acfec
Author: Poul-Henning Kamp <phk@FreeBSD.org>
Date:   Sun Feb 23 18:09:05 2003 +0000

    Bracket the kern.vnode sysctl in #ifdef notyet because it results
    in massive locking issues on diskless systems.

    It is also not clear that this sysctl is non-dangerous in its
    requirements for locked down memory on large RAM systems.

There does not seem to be practical use for it and the disabled routine
does not work anyway.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39127
2023-03-17 16:21:45 +00:00
Konstantin Belousov
ff6d60946a amd64 acpi_wakeup.c: fix typo
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-03-17 15:10:34 +02:00
Vitaliy Gusev
94a3876d7e
vmm: fix missing ipi statistic
ipi counters are missing in bhyvectl's output because vm_maxcpu is 0
when initializing them. That's because vmm_stat_register is executed
before vmm_init.

Instead of directly fixing it, there's a better solution in illumos
which is cherry picked:
65a3bc8373

It replaces the matrix statistic by two counters per vcpu. One for
counting the ipis to the vcpu and one counting the ipis received by the
vcpu. This has several advantages:

- A matrix statistic becomes huge when using many vcpus.
- A matrix statistic easily reaches the MAX_VMM_STAT_ELEMS limit.
- Two counters are enough in most cases. DTrace can be used for more
  advanced debugging purposes.
- A matrix statistic wastes memory. The matrix size is determined by
  vm_maxcpu regardless of the number of vcpus assigned to the vm.

Reviewed by:		corvink, markj
Fixes:			ee98f99d7a ("vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.")
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D39038
2023-03-17 13:50:08 +01:00
Emmanuel Vadot
949efdaa1d arm: Remove SOCFPGA specific kernel configs
We had GENERIC for a while now so anyone still interested in those boards
should make sure that we can boot on them with it and with upstream DTS files.

Sponsored by:   Beckhoff Automation GmbH & Co. KG
Reviewed by:	br
Differential Revision:	https://reviews.freebsd.org/D39088
2023-03-17 14:49:01 +01:00
Emmanuel Vadot
00e84f52f0 arm: Rename hdmi_if.m to crtc_if.m
There is nothing hdmi related in this interface, it's just a generic interface
for crt controller so rename it.
This also remove the 'hdmi' device used in arm kernel config. 'vt' now controls
if we build this interface (sc(4) isn't supported on arm).

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39120
2023-03-17 13:34:57 +01:00
Emmanuel Vadot
3bcb469c61 arm: ti: Rename video related devices
device 'hdmi' is too generic (and will be used later in a new device) so rename
the arm TI devices to some proper name.

Sponsored by:   Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39119
2023-03-17 13:34:52 +01:00
Emmanuel Vadot
8574d32f22 arm: imx: Rename video related devices
device 'hdmi' is too generic (and will be used later in a new device) so rename
the arm IMX devices to some proper name.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39118
2023-03-17 13:34:46 +01:00
Emmanuel Vadot
37531e78df arm: allwinner: Garbage collect a10_hdmi driver
It was disconnected 5 years ago in 4573cd3914
("arm: allwinner: Disconnect A10/A20 HDMI driver") as it wasn't working.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39117
2023-03-17 13:34:40 +01:00
Emmanuel Vadot
1c4ff02a74 arm: Remove IMX6 kernel config
All devices are in GENERIC and GENERIC is known to boot on those SoCs.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39090
2023-03-17 13:34:03 +01:00
Emmanuel Vadot
cdb0c2a73d arm: Remove IMX5 specific kernel configs
We had GENERIC for a while now so anyone still interested in those boards
should make sure that we can boot on them with it and with upstream DTS files.

Sponsored by:   Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39089
2023-03-17 13:33:57 +01:00
Emmanuel Vadot
ba9f8eeb47 arm: Remove VYBRID specific kernel config
We had GENERIC for a while now so anyone still interested in those boards
should make sure that we can boot on them with it and with upstream DTS files.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39087
2023-03-17 13:33:51 +01:00
Emmanuel Vadot
031461b166 arm: Remove kernel config APALIS-IMX6
It reference to a non-existant dts file apalis-imx6.dts so unlikekly to compile.
Aldo IMX6 support is in GENERIC so anyone interested in this board should
make it work with GENERIC kernel (if that's not already the case).

Sponsored by:	Beckhoff Automation GmbH & Co. KG
Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D39086
2023-03-17 13:33:46 +01:00
Kyle Evans
f7a884cb01 x86: gate smbios hypervisor identification behind vm_guest
cpuid detection may have picked up a more specific guest type already,
and a follow-up check of smbios vendor/product may erroneously blow
away the previously detected type.

This reportedly fixes the boot under Hyper-V, which advertises an
smbios.system.product of "Virtual Machine."

PR:		270239
Reviewed by:	imp, kib (both earlier version, same concept)
Fixes:	2fee875629 ("abstract out the vm detection via smbios..")
Differential Revision:	https://reviews.freebsd.org/D39140
2023-03-17 00:54:32 -05:00
Rick Macklem
896516e54a nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts
Without this patch, a Kerberized NFSv4.1/4.2 mount must provide
a Kerberos credential for the client at mount time.  This credential
is typically referred to as a "machine credential".  It can be
created one of two ways:
- The user (usually root) has a valid TGT at the time the mount
  is done and this becomes the machine credential.
  There are two problems with this.
  1 - The user doing the mount must have a valid TGT for a user
      principal at mount time.  As such, the mount cannot be put
      in fstab(5) or similar.
  2 - When the TGT expires, the mount breaks.
- The client machine has a service principal in its default keytab
  file and this service principal (typically called a host-based
  initiator credential) is used as the machine credential.
  There are problems with this approach as well:
  1 - There is a certain amount of administrative overhead creating
      the service principal for the NFS client, creating a keytab
      entry for this principal and then copying the keytab entry
      into the client's default keytab file via some secure means.
  2 - The NFS client must have a fixed, well known, DNS name, since
      that FQDN is in the service principal name as the instance.

This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which
allows the state maintenance operations to be performed by any
authentication mechanism, to do these operations via AUTH_SYS
instead of RPCSEC_GSS (Kerberos).  As such, neither of the above
mechanisms is needed.

It is hoped that this option will encourage adoption of Kerberized
NFS mounts using TLS, to provide a more secure NFS mount.

This new NFSv4.1/4.2 mount option, called "syskrb5" must be used
with "sec=krb5[ip]" to avoid the need for either of the above
Kerberos setups to be done by the client.

Note that all file access/modification operations still require
users on the NFS client to have a valid TGT recognized by the
NFSv4.1/4.2 server.  As such, this option allows, at most, a
malicious client to do some sort of DOS attack.

Although not required, use of "tls" with this new option is
encouraged, since it provides on-the-wire encryption plus,
optionally, client identity verification via a X.509
certificate provided to the server during TLS handshake.
Alternately, "sec=krb5p" does provide on-the-wire
encryption of file data.

A mount_nfs(8) man page update will be done in a separate commit.

Discussed on:	freebsd-current@
MFC after:	3 months
2023-03-16 15:55:36 -07:00
Andrew Turner
a671f96d93 Mark arm64 mair_el1 fields as unsigned long
The register is 64-bit so the upper bits could be shifted past the
signed 32-bit size of an int the values were before.

Sponsored by:	Arm Ltd
2023-03-16 16:45:42 +00:00
Andrew Turner
3473f28322 Switch the arm64 VM_MEMATTR_DEVICE to nGnRE
Move device memory to a weaker type. The new device memory type allows
the system to acknowledge a write to a device before the write has
completed. This is inline with VM_MEMATTR_DEVICE on armv6/armv7.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38945
2023-03-16 16:45:42 +00:00
Andrew Turner
f7acb7ed41 Allow forcing non-posted memory on arm64
To allow for debugging after changing the arm64 VM_MEMATTR_DEVICE
memory type add a new set of tunables to tell the kernel to use
non-posted memory.

This adds the following tunables:
 - kern.force_nonposted: When set to non-zero the kernel will use
   non-posted memory for all device allocations.
 - hint.<dev>.<unit>.force_nonposted: As above, however only forces
   non-posted memory on the named device.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38944
2023-03-16 16:45:42 +00:00
Andrew Turner
bc10894757 Remove an unneeded CTASSERT in the smmu driver
We don't map the DMAP here

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38951
2023-03-16 16:45:42 +00:00
Andrew Turner
9a5dddc94f Remove unneeded arm64 smmu macros
These aren't used by the driver so can be removed.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38950
2023-03-16 16:45:42 +00:00
Andrew Turner
5f2070adb9 Only support a 4 level smmu page table
We only ever build a 4 level page table for the Arm SMMU. Remove the
support for a 3 level table.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38949
2023-03-16 16:45:42 +00:00
Andrew Turner
83fb1bdbfe Rename smmu pmap functions
These are SMMU (and MALI GPU) specific. Give them a SMMU specific name.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38948
2023-03-16 16:45:42 +00:00
Michael Tuexen
8ed1e2c880 sctp: enforce Kahn's rule during the handshake
Don't take RTT measurements on packets containing INIT or COOKIE-ECHO
chunks, when they were retransmitted.

MFC after:	1 week
2023-03-16 17:40:40 +01:00
Randall Stewart
69c7c81190 Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.

Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D38831
2023-03-16 11:43:16 -04:00
Andrew Turner
7d0b915919 Add PSCI affinity info return values
These can be returned from the PSCI AFFINITY_INFO call. This is not
marked as optional so bhyve will need to implement it & can use these
macros.

Sponsored by:	Arm Ltd
2023-03-16 13:08:00 +00:00
Andrew Turner
e89be21854 Add a psci macro to build a version value
Add PSCI_VER that takes a major and minor version and builds the value
returned by the firmware. This will be used by bhyve.

Sponsored by:	Arm Ltd
2023-03-16 13:08:00 +00:00
Andrew Turner
473ab212dc Allow psci.h to be used by userspace
Wrap parts of psci.h that aren't usable by userspace in _KERNEL checks.
This allows it to be used to implement PSCI and SMCCC by bhyve in
userspace.

Sponsored by:	Arm Ltd
Sponsored by:	Innovate UK
Sponsored by:	The FreeBSD Foundation
2023-03-16 13:08:00 +00:00
Dag-Erling Smørgrav
ef184e989b tarfs: Fix backtracking during node creation.
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39082
2023-03-16 11:31:22 +00:00
Dag-Erling Smørgrav
e81d55b439 tarfs: Support tar files which include file modes with permissions.
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39020
2023-03-16 11:31:22 +00:00
Dag-Erling Smørgrav
fd8c98a52f tarfs: Correctly track link count.
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39019
2023-03-16 11:31:22 +00:00
Zhenlei Huang
49cad3daf2 carp: carp_master_down_locked() requires net epoch
Reviewed by:	kp
Fixes:		1d126e9b94 carp: Widen epoch coverage
MFC after:	1 day
Differential Revision:	https://reviews.freebsd.org/D39113
2023-03-16 18:07:03 +08:00
Kristof Provost
80e76c61cc pf: set scope in pf_refragment6()
Link-local traffic needs to have a scope embedded before it's passed on
to ip6_output(). Do so in pf_refragment6(), because when we end up here
in the output path we may have passed through ip6_output() already
(before being reassembled), where the scope would have been removed.

Re-embed the scope so that link-local traffic is sent correctly.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39062
2023-03-16 10:59:04 +01:00
Kristof Provost
b52b61c0b6 pf: distinguish forwarding and output cases for pf_refragment6()
Re-introduce PFIL_FWD, because pf's pf_refragment6() needs to know if
we're ip6_forward()-ing or ip6_output()-ing.

ip6_forward() relies on m->m_pkthdr.rcvif, at least for link-local
traffic (for in6_get_unicast_scopeid()). rcvif is not set for locally
generated traffic (e.g. from icmp6_reflect()), so we need to call the
correct output function.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revisi:	https://reviews.freebsd.org/D39061
2023-03-16 10:59:04 +01:00
Michael Tuexen
c91ae48a25 sctp: don't do RTT measurements with cookies
When receiving a cookie, the receiver does not know whether the
peer retransmitted the COOKIE-ECHO chunk or not. Therefore, don't
do an RTT measurement. It might be much too long.
To overcome this limitation, one could do at least two things:
1. Bundle the INIT-ACK chunk with a HEARTBEAT chunk for doing the
   RTT measurement. But this is not allowed.
2. Add a flag to the COOKIE-ECHO chunk, which indicates that it
   is the initial transmission, and not a retransmission. But
   this requires an RFC.

MFC after:	1 week
2023-03-16 10:45:13 +01:00
Michael Tuexen
cee09bda03 sctp: allow disabling of SCTP_ACCEPT_ZERO_CHECKSUM socket option 2023-03-15 22:55:23 +01:00
Michael Tuexen
6026b45aab sctp: improve negotiation of zero checksum feature
Enforce consistency between announcing 0-cksum support and actually
using it in the association. The value from the inp when the
INIT ACK is sent must be used, not the one from the inp when the
cookie is received.
2023-03-15 22:29:52 +01:00
Alexander V. Chernikov
73ae25c174 netlink: improve snl(3)
Summary:
* add snl_send_message() as a convenient send wrapper
* add signed integer parsers
* add snl_read_reply_code() to simplify operation result checks
* add snl_read_reply_multi() to simplify reading multipart messages
* add snl_create_genl_msg_request()
* add snl_get_genl_family() to simplify family name->id resolution
* add tests for some of the functionality

Reviewed by:	kp
Differential Revision: https://reviews.freebsd.org/D39092
MFC after:	2 weeks
2023-03-15 20:53:20 +00:00
Andrew Turner
cc67cd58fc arm64: Support stage 2 mappings in pmap_remove_all
This has been hit when testing bhyve.

Sponsored by:	Arm Ltd
2023-03-15 18:24:50 +00:00
Andrew Turner
5c4bd8756f Stop using the rid as an index in the arm timer
The order of the interrupt array doesn't matter. Store the described
interrupts at the start of the array to simplify iterating over them.

Reviewed by:	imp, kevans
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39094
2023-03-15 15:49:00 +00:00
Andrew Turner
02f3d094ae Use the arm physical timer when able
To allow bhyve manage the virtual timer while in a guest have FreeBSD
use the virtual timer only when bhyve will be unavailable due to not
starting at EL2 where the hypervisor switcher will run.

Reviewed by:	imp, kevans
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39093
2023-03-15 15:49:00 +00:00
Andrew Turner
51308cd29b Support the arm64 pmap_remove_write for stage 2
The fields we need to adjust are different in stage 1 and stage 2
tables. Handle this by adding variables to hold the bits to check,
set, and clear.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37399
2023-03-15 15:49:00 +00:00