Commit Graph

146832 Commits

Author SHA1 Message Date
Alexander V. Chernikov
19e43c163c netlink: add netlink KPI to the kernel by default
This change does the following:

Base Netlink KPIs (ability to register the family, parse and/or
 write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
  some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
  unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
  compiled in unconditionally.
* Glue functions (netlink<>rtsock) and malloc/core sysctl definitions
  (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in netlink_glue.c, but
  each one calls through a pointer to either a stub or the actual
  implementation, depending on whether the module is loaded.
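
The pointer-based dispatch in the last bullet can be illustrated with a small userland C model. It is a sketch under assumptions, not the netlink_glue.c code: every `sketch_nl_*` name is hypothetical, the "real" implementation is placeholder logic, and loading or unloading the module is simulated by swapping one pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical KPI function table; the kernel uses its own names. */
struct sketch_nl_ops {
	bool (*has_listeners)(int family, unsigned int groups_mask);
};

/* Stub used while the (simulated) Netlink module is not loaded. */
static bool
sketch_nl_stub_has_listeners(int family, unsigned int groups_mask)
{
	(void)family;
	(void)groups_mask;
	return (false);	/* Nobody can be listening without the module. */
}

/* Stand-in for the full implementation provided by the module. */
static bool
sketch_nl_real_has_listeners(int family, unsigned int groups_mask)
{
	(void)family;
	return (groups_mask != 0);	/* Placeholder logic for the sketch. */
}

static const struct sketch_nl_ops sketch_nl_stub_ops = {
	.has_listeners = sketch_nl_stub_has_listeners,
};
static const struct sketch_nl_ops sketch_nl_real_ops = {
	.has_listeners = sketch_nl_real_has_listeners,
};

/* Always-compiled-in pointer; it starts out pointing at the stubs. */
static const struct sketch_nl_ops *sketch_nl_ops = &sketch_nl_stub_ops;

/* The always-present KPI entry point only forwards through the pointer. */
bool
sketch_nl_has_listeners(int family, unsigned int groups_mask)
{
	return (sketch_nl_ops->has_listeners(family, groups_mask));
}

/* Simulated module load/unload: swap the pointer target. */
void
sketch_nl_module_load(void)
{
	sketch_nl_ops = &sketch_nl_real_ops;
}

void
sketch_nl_module_unload(void)
{
	sketch_nl_ops = &sketch_nl_stub_ops;
}
```

Callers always link against the compiled-in entry point; only the pointer target changes, which is what lets consumers load before the module does.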

This approach keeps only ~1k LoC out of ~3.7k LoC (the current
 sys/netlink implementation) in the kernel, and that part will not grow
 further.
It also allows generic netlink kernel consumers to load successfully
 without requiring the Netlink module, and to operate correctly once the
 Netlink module is loaded.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39269
2023-03-27 13:55:44 +00:00
Mark Johnston
ad2f2ee015 arm64: Remove duplicated function prototypes for PAC
No functional change intended.

Sponsored by:	The FreeBSD Foundation
2023-03-27 08:56:22 -04:00
Yuri Pankov
6aa5b10d0c nvme: fix resv commands with nda device
- passing I/O commands through nda requires the nsid field to be set (it
  was unused when going through nvme_ns_ioctl())
- the ccb's status can be OR'ed with flags, so use CAM_STATUS_MASK
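
To see why the mask is needed: flags are OR'ed into the same word as the status code, so comparing the whole word against a completion code can fail even when the request completed. The constants below are illustrative assumptions named `SKETCH_*`, not the definitions from sys/cam/cam.h, and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; consult sys/cam/cam.h for real ones. */
#define SKETCH_CAM_REQ_CMP	0x01u	/* status code: request completed */
#define SKETCH_CAM_STATUS_MASK	0x3Fu	/* low bits hold the status code */
#define SKETCH_CAM_DEV_QFRZN	0x40u	/* flag: device queue frozen */

/* Buggy check: compares the whole word, so any OR'ed-in flag breaks it. */
bool
ccb_completed_naive(uint32_t status)
{
	return (status == SKETCH_CAM_REQ_CMP);
}

/* Fixed check: strip the flags before comparing the status code. */
bool
ccb_completed_ok(uint32_t status)
{
	return ((status & SKETCH_CAM_STATUS_MASK) == SKETCH_CAM_REQ_CMP);
}
```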

Reviewed by:	imp (cam)
Differential Revision:	https://reviews.freebsd.org/D37696
2023-03-27 14:53:24 +02:00
Yuri Pankov
1249680609 kern.post.mk: fix PORTSDIR handling
Using the subshell's PORTSDIR variable (via $${PORTSDIR}) seems to
work only if PORTSDIR is specified directly on the make command
line.

Use ${PORTSDIR} here instead so that setting the variable in
/etc/{make,src,src-env}.conf works (this also works when the variable
is set on the command line or in the environment).

PR:		268299
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D37868
2023-03-27 13:57:57 +02:00
Alexander V. Chernikov
eccccd657f netlink: make nlattr_add_in[6]_addr inline
MFC after:	2 weeks
2023-03-27 11:53:34 +00:00
Alexander V. Chernikov
6dc858d84c netlink: remove forgotten debug message in handle_rtm_getroute().
MFC after:	2 weeks
2023-03-27 10:49:40 +00:00
Alexander V. Chernikov
544f1492c0 netlink: ensure genetlink control family always registers under the same ID.
MFC after:	2 weeks
2023-03-27 10:48:24 +00:00
Corvin Köhne
e8988d60d2 pci: expose intel_graphics_stolen as sysctl
The Intel graphics stolen memory is used by the Intel GOP driver on
boot. When using bhyve with GPU passthrough, it's also used by the guest
GOP driver at guest boot. For that reason, bhyve needs to know the
address and size of this region to inform the guest about this region.
Exposing the variables as sysctl allows bhyve to easily read them.
2023-03-27 11:40:49 +02:00
Corvin Köhne
48d70503bc pci: add tunable hw.pci.enable_mps_tune
If the tunable is set to 0, tuning of the MPS (maximum payload size)
is disabled and the default MPS values set by the BIOS are used. In this
case the system may run at a lower speed or in a less optimized state,
but this can resolve stability and compatibility issues. With some
devices, MPS tuning can lead to a complete system freeze.
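
A rough model of what the tunable switches between is sketched below. The PCIe encoding of MPS as 128 << code bytes is standard, but the tuning policy shown (use the smallest maximum supported by the devices on a path) is an assumption for illustration, not a description of FreeBSD's pci code; `choose_mps_bytes()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* PCIe encodes MPS as 128 << code bytes (code 0..5 -> 128..4096 bytes). */
static unsigned int
mps_code_to_bytes(unsigned int code)
{
	return (128u << code);
}

/*
 * Hypothetical model: when enable_mps_tune is 0, keep the firmware
 * value; otherwise pick the largest MPS every device on the path
 * supports, i.e. the minimum of their advertised capabilities.
 */
unsigned int
choose_mps_bytes(int enable_mps_tune, unsigned int bios_code,
    const unsigned int *supported_codes, size_t ndev)
{
	unsigned int code;
	size_t i;

	if (!enable_mps_tune || ndev == 0)
		return (mps_code_to_bytes(bios_code));
	code = supported_codes[0];
	for (i = 1; i < ndev; i++)
		if (supported_codes[i] < code)
			code = supported_codes[i];
	return (mps_code_to_bytes(code));
}
```

Since it is a boot-time tunable, on a real system the knob would presumably be set from loader.conf(5), e.g. `hw.pci.enable_mps_tune=0`.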

Reviewed by:		manu
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D38397
2023-03-27 11:28:27 +02:00
Kyle Evans
ec1bc53002 arm64: cpu_switch: don't zero out pcb_onfault
Previously this zeroed out x18 in the pcb; now it clobbers the
innocent pcb_onfault instead. Drop it entirely.

This technically fixes
e605b87a9e ("Save only callee-saved registers in pcb"), but it's
harmless until the below commit trims down pcb_x.

Reported by:	mmel
Reviewed by:	andrew, mmel
Fixes:	1c1f31a5e5 ("Remove unused registes from the arm pcb")
Differential Revision:	https://reviews.freebsd.org/D39277
2023-03-26 13:48:22 -05:00
Alexander V. Chernikov
9a11f3dff9 netlink: add standard ifaddr/neigh parsers to snl(3).
MFC after:	2 weeks
2023-03-26 09:04:41 +00:00
Alexander V. Chernikov
04f75b9802 netlink: allow netlink sockets in non-vnet jails.
This change allows opening Netlink sockets in non-VNET jails, even by
 unprivileged processes.
The security model largely follows the existing one. To be more specific:
* by default, every `NETLINK_ROUTE` command is **NOT** allowed in a non-VNET
 jail UNLESS the `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command
 handler.
* All notifications are **disabled** for non-vnet jails (requests to
 subscribe to notifications are ignored). This will change to a more
 fine-grained model once the first netlink provider that requires it
 gets committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including**
 interfaces w/o any addresses attached to the jail). The value of this is
 questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the
 current approach - currently we list static ARP/ND entries belonging to the
 addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered
 to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only
 host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in a non-VNET jail
 (as sub-families may be unrelated to networking at all).
 It is the responsibility of the family author to implement restrictions
 if necessary.
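
The default policy in the list above boils down to a small decision function. In this sketch the flag name `RTNL_F_ALLOW_NONVNET_JAIL` comes from the commit text, but its value, the handler structure and `nl_cmd_allowed()` are hypothetical; the per-command filtering of addresses and routes is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; only the flag name comes from the commit. */
#define RTNL_F_ALLOW_NONVNET_JAIL	0x01u

enum sketch_nl_proto { SK_NETLINK_ROUTE, SK_NETLINK_GENERIC };

struct nl_cmd_handler {
	enum sketch_nl_proto	proto;
	unsigned int		flags;
};

/*
 * Default policy: NETLINK_ROUTE commands need an explicit opt-in flag in
 * a non-VNET jail; NETLINK_GENERIC commands are allowed, and each family
 * is expected to add its own restrictions.
 */
bool
nl_cmd_allowed(const struct nl_cmd_handler *h, bool in_nonvnet_jail)
{
	if (!in_nonvnet_jail)
		return (true);
	if (h->proto == SK_NETLINK_GENERIC)
		return (true);
	return ((h->flags & RTNL_F_ALLOW_NONVNET_JAIL) != 0);
}
```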

Differential Revision: https://reviews.freebsd.org/D39206
MFC after:	1 month
2023-03-26 08:44:09 +00:00
Alexander V. Chernikov
2cda6a2fb0 routing: add public rt_is_exportable() version to check if
the route can be exported to userland when jailed.

Differential Revision: https://reviews.freebsd.org/D39204
MFC after:	2 weeks
2023-03-26 08:24:27 +00:00
Konstantin Belousov
28f957b8b3 vnode_pager_input: return runningbufspace back
Both vnode_pager_input_smlfs() and vnode_pager_generic_getpages()
increment runningbufspace, but both also delegate I/O completion handling
for the pbuf to either plain bdone() or a filesystem-specific strategy
routine. As it happens, for e.g. UFS it is g_vfs_strategy()/g_vfs_done().
The latter calls bufdone(), which handles runningbufspace reclamation.

With the plain bdone() I/O completion handler, nothing returned the
accounted b_runningbufspace. Do it in the new helper
vnode_pager_input_bdone(), as well as explicitly in
vnode_pager_generic_getpages_done().

Note that potential multiple calls to runningbufwakeup() for the same
pbuf or buf completion are safe: runningbufwakeup() clears the accounting
for the buffer, so second and later calls are no-ops.
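
The idempotence argument above can be modeled in a few lines of C: clearing the per-buffer record on the first call makes every later call subtract nothing. All `sketch_*` names are hypothetical, and the sketch omits the concurrency and wakeup handling a kernel implementation needs.

```c
#include <assert.h>

/* Hypothetical global counter standing in for runningbufspace. */
static long sketch_runningbufspace;

struct sketch_buf {
	long	b_runningbufspace;	/* bytes this buffer accounted */
};

/* Charge the buffer's I/O size when the read is started. */
void
sketch_runningbuf_charge(struct sketch_buf *bp, long size)
{
	bp->b_runningbufspace = size;
	sketch_runningbufspace += size;
}

/*
 * Return the accounted space and clear the per-buffer record, so a
 * second call for the same buffer subtracts nothing (a no-op).
 */
void
sketch_runningbufwakeup(struct sketch_buf *bp)
{
	long space = bp->b_runningbufspace;

	if (space == 0)
		return;
	sketch_runningbufspace -= space;
	bp->b_runningbufspace = 0;
}

long
sketch_runningbufspace_get(void)
{
	return (sketch_runningbufspace);
}
```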

The problem was found because tarfs uses the small-filesystem vnode pager
input path (vnode_pager_input_smlfs()) but not g_vfs_strategy().

Reported by:	des
Reviewed by:	markj, sjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39263
2023-03-26 00:55:29 +02:00
Mateusz Guzik
0e71f4f77c vm: add unlocked page lookup before trying vm_fault_soft_fast
Shaves a read lock + tryupgrade trip most of the time.

Stats from doing a kernel build (counters not present in the tree):
vm.fault_soft_fast_ok: 262653
vm.fault_soft_fast_failed_other: 41
vm.fault_soft_fast_failed_no_page: 39595772
vm.fault_soft_fast_failed_page_busy: 1929
vm.fault_soft_fast_failed_page_invalid: 22183
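
Assuming the five counters are disjoint outcomes of the same attempts, summing them shows why an unlocked lookup first pays off: roughly 99.3% of attempts failed only because no page was present, which is the case the new lookup can reject without the read lock and tryupgrade trip. The arithmetic, as a tiny C helper with hypothetical names:

```c
#include <assert.h>

/* Counter values copied from the kernel-build statistics above. */
static const unsigned long fault_soft_fast_ok = 262653;
static const unsigned long failed_other = 41;
static const unsigned long failed_no_page = 39595772;
static const unsigned long failed_page_busy = 1929;
static const unsigned long failed_page_invalid = 22183;

/* Total attempts, assuming each attempt lands in exactly one counter. */
unsigned long
fault_soft_fast_attempts(void)
{
	return (fault_soft_fast_ok + failed_other + failed_no_page +
	    failed_page_busy + failed_page_invalid);
}

/* Share of attempts that failed only because no page was resident. */
double
no_page_share(void)
{
	return ((double)failed_no_page / (double)fault_soft_fast_attempts());
}
```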

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39268
2023-03-25 22:14:59 +00:00
Mateusz Guzik
22eb66d961 vfs cache: always assert on ndp->ni_resflags
2023-03-25 21:57:55 +00:00
Dmitry Chagin
f0fe68a965 i386: ansify
Reported by:	clang 15
2023-03-25 23:21:03 +03:00
Andrew Gallatin
abba58766f LRO: Add missing checks for invalid IP addresses
LRO bypasses the normal ip_input()/tcp_input() path and lacks several
checks that are present there.  Without these checks, it is possible
to trigger the assertions added in b0ccf53f24.

Reviewed by: glebius, rrs
Sponsored by: Netflix
2023-03-25 11:56:02 -04:00
Mateusz Guzik
138a5dafba vfs: trylock vnode requeue
The quasi-LRU still gets in the way, for example when doing an
incremental bzImage build, with the vnode_list lock at the
top of the profile. Further contain the damage by trylocking.

Note the entire mechanism desperately wants to be ripped out in favor
of something(tm) which both scales in a multicore setting and provides
a sensible replacement policy.
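
The trylock idea can be sketched with POSIX threads: if the list lock is contended, skip the requeue instead of waiting, accepting a less accurate LRU order. The mutex and counters are hypothetical stand-ins for the kernel's vnode_list lock; the kernel uses its own mutexes, not pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the global vnode list lock and stats. */
static pthread_mutex_t sketch_vnode_list_mtx = PTHREAD_MUTEX_INITIALIZER;
static atomic_ulong sketch_requeues, sketch_skipped;

/*
 * Requeue only if the list lock is free right now; under contention the
 * LRU ordering update is skipped instead of waiting for the lock.
 */
bool
sketch_vnode_requeue(void)
{
	if (pthread_mutex_trylock(&sketch_vnode_list_mtx) != 0) {
		sketch_skipped++;	/* atomic increment */
		return (false);
	}
	/* ... move the vnode to the tail of the list here ... */
	sketch_requeues++;
	pthread_mutex_unlock(&sketch_vnode_list_mtx);
	return (true);
}
```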

With this change everything vfs almost disappears from the on-CPU
flamegraph; what is left is tons of contention in the VM.
2023-03-25 13:42:27 +00:00
Mateusz Guzik
245767c278 vfs: flip deferred_inact to atomic
Turns out it is very rarely triggered, making a per-cpu
counter a waste.

Examples from real life boxes:
uptime		counter
135 days	847
138 days	2190
141 days	1
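
The trade-off is that a per-CPU counter(9) avoids cache-line bouncing on hot paths but costs per-CPU memory and a summation on read; for an event this rare, one shared atomic is enough. A userland C11 sketch with a hypothetical name (the kernel would presumably use its own atomic primitives rather than <stdatomic.h>):

```c
#include <assert.h>
#include <stdatomic.h>

/* One shared counter; static storage starts it at zero. */
static atomic_ulong sketch_deferred_inact;

/* Rarely called, so contention on a single cache line is a non-issue. */
void
sketch_count_deferred_inact(void)
{
	atomic_fetch_add_explicit(&sketch_deferred_inact, 1,
	    memory_order_relaxed);
}

/* Reading needs no per-CPU summation, just one load. */
unsigned long
sketch_deferred_inact_read(void)
{
	return (atomic_load_explicit(&sketch_deferred_inact,
	    memory_order_relaxed));
}
```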
2023-03-25 13:42:27 +00:00
Mateusz Guzik
e5eb1d298f vfs: replace some spelled out VNASSERTs with VNPASS
nfc
2023-03-25 13:42:27 +00:00
Vico Chen
1f0c8bfd65 linsysfs(4): Keep Linux compatible sysfs the same as Ubuntu
Ubuntu has no `/sys/subsystem' in its sysfs. For compatibility with
Ubuntu, remove the 'subsystem' creation from the Linux compatibility
module.

In addition, the sysfs `/sys/subsystem' causes failures in some
Linux udev use cases. The Linux udev source code has a function named
`scan_devices_all', which scans `/sys/subsystem' if it exists, but
since /sys/subsystem is currently empty, it returns nothing and some
use cases fail.

Reviewed by:		dchagin
Differential Revision:	https://reviews.freebsd.org/D38885
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:41:04 +03:00
Dmitry Chagin
0b56641cfc linsysfs(4): Reimplement listnics() using ifAPI
Handle interface arrival/departure events.

Reviewed by:		melifaro (early version)
Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:40:41 +03:00
John Baldwin
0f735657aa bhyve: Remove vmctx member from struct vm_snapshot_meta.
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel.  For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument.  As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38125
2023-03-24 11:49:06 -07:00
John Baldwin
7d9ef309bd libvmmapi: Add a struct vcpu and use it in most APIs.
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi.  For now it stores
a pointer to the VM context and an integer id.
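
A minimal model of the handle described above, assuming it holds just the context pointer and id. In a real library only a forward declaration would be public so the type stays opaque; a single file cannot show that split. All `sketch_*` names are hypothetical and are not the libvmmapi functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical VM context; libvmmapi's real struct vmctx differs. */
struct sketch_vmctx {
	const char	*name;
};

/* Bundles the former (ctx, vcpuid) argument pair into one handle. */
struct sketch_vcpu {
	struct sketch_vmctx	*ctx;		/* owning VM context */
	int			 vcpuid;	/* integer vCPU id */
};

struct sketch_vcpu *
sketch_vcpu_open(struct sketch_vmctx *ctx, int vcpuid)
{
	struct sketch_vcpu *vcpu;

	vcpu = malloc(sizeof(*vcpu));
	if (vcpu == NULL)
		return (NULL);
	vcpu->ctx = ctx;
	vcpu->vcpuid = vcpuid;
	return (vcpu);
}

int
sketch_vcpu_id(const struct sketch_vcpu *vcpu)
{
	return (vcpu->vcpuid);
}

void
sketch_vcpu_close(struct sketch_vcpu *vcpu)
{
	free(vcpu);
}
```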

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server.  The underlying ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38124
2023-03-24 11:49:06 -07:00
Konstantin Belousov
13262b07a0 fdesc_lookup(): drop fdropped
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:22 +02:00
Konstantin Belousov
7dca8fd1cb fdesc_lookup(): the condition to use vn_vget_ino() is always true
The ix number for the fdescfs root is 1, while any fd vnode has an ix
value of at least 3.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:16 +02:00
Konstantin Belousov
8d97282a39 fdesc_lookup(): no need to vhold the dvp vnode
It is already referenced by the VOP_LOOKUP() caller, otherwise vdrop()
after vn_lock() is invalid anyway.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:07 +02:00
Konstantin Belousov
51b8ffb95c fdesc_allocvp(): fix potential use after free
Just owning the interlock is not enough for vget() to operate on the
vnode race-free with vgone(), the vnode should be held.  Use
vget_prep()/vget_finish() to avoid vholding the vnode explicitly, and
drop LK_INTERLOCK.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:53 +02:00
Konstantin Belousov
fa3ea81b77 fdescfs: remove useless XXX comment, unwrap line
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:47 +02:00
Justin Hibbits
79aa96f9ca infiniband: Bring back M_ASSERTVALID() check in infiniband_bpf_mtap()
Reported by:	rpokala
Fixes:		adf62e8363
2023-03-24 11:04:42 -04:00
Justin Hibbits
bb55bb1740 inet6: Include if_private.h in one more netstack file
ip6_input() and ip6_destroy() both directly reference ifnet members.
This file was missed in 3d0d5b21.

Fixes:		3d0d5b21 ("IfAPI: Explicitly include <net/if_private.h>...")
Sponsored by:	Juniper Networks, Inc.
2023-03-24 10:25:35 -04:00
Justin Hibbits
727bfe3894 Mechanically convert qlnx(4) to IfAPI
Reviewed By:	zlei
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37856
2023-03-24 10:09:53 -04:00
Justin Hibbits
3e142e0767 ofed: Mechanically convert to IfAPI
Summary:
Because of the intricacies of this code it wasn't purely scripted, but
instead hand-mechanical.

Reviewed by:	hselasky
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38560
2023-03-24 10:04:33 -04:00
Zhenlei Huang
dcd7f0bd02 lagg: Various style fixes
MFC after:	1 week
2023-03-24 17:55:15 +08:00
Kristof Provost
ad729f8d50 pf: ignore ip6_output() return value in pf_refragment6()
We can't do anything if ip6_output() fails, other than discard the
packet, which ip6_output() already does for us.
Mark the return value as ignored.

Reported by:	emaste, Coverity
Sponsored by:	Rubicon Communications, LLC (Netgate)
2023-03-24 08:08:19 +01:00
Eric Joyner
949d971f0b ice(4): Restore old conditional overwritten by last update
Commit 8923de5905 ("ice(4): Update to 1.37.7-k", 2023-02-13)
unintentionally overwrote the change made in commit 52f45d8ace ("net:
iflib: let the drivers use isc_capenable", 2021-12-28).

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reported by:	jhibbits@
MFC after:	3 days
Sponsored by:	Intel Corporation
2023-03-24 00:05:26 -07:00
Kyle Evans
89c52f9d59 arm64: add KASAN support
This entails:
- Marking some obvious candidates for __nosanitizeaddress
- Similar trap frame markings as amd64, for similar reasons
- Shadow map implementation

The shadow map implementation is roughly similar to what was done on
amd64, with some exceptions.  Attempting to use the available space at
preinit_map_va + PMAP_PREINIT_MAPPING_SIZE (up to the end of that range,
as depicted in the physmap) results in odd failures, so we instead
search the physmap for free regions that we can carve out, fragmenting
the shadow map as necessary to try to fit as much as we need for the
initial kernel map.  pmap_bootstrap_san() is thus called after
pmap_bootstrap(), which still included some technically reserved areas
of the memory map that needed to be included in the DMAP.
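
The physmap search described above can be approximated as a greedy carve over free ranges. This is a hedged sketch, not the arm64 pmap code: it assumes page-aligned regions, a 4 KiB page size and first-come order, and `sketch_carve_shadow()` is hypothetical. It shows how one shadow allocation can end up fragmented across several regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u

/* A free physical range [start, end) found while scanning the physmap. */
struct sketch_region {
	uint64_t	start;
	uint64_t	end;
};

/*
 * Carve 'need' bytes (rounded up to pages) out of the free regions,
 * taking whole pages from each region in turn.  Regions shrink as they
 * are used.  Returns the number of fragments used, or -1 if the free
 * regions are too small.
 */
int
sketch_carve_shadow(struct sketch_region *regs, size_t nregs, uint64_t need)
{
	const uint64_t pgmask = ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
	int fragments = 0;
	size_t i;

	need = (need + SKETCH_PAGE_SIZE - 1) & pgmask;
	for (i = 0; i < nregs && need > 0; i++) {
		uint64_t avail = (regs[i].end - regs[i].start) & pgmask;
		uint64_t take = avail < need ? avail : need;

		if (take == 0)
			continue;
		regs[i].start += take;	/* low part goes to the shadow map */
		need -= take;
		fragments++;
	}
	return (need == 0 ? fragments : -1);
}
```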

The odd failure noted above may be a bug, but I haven't investigated it
all that much.

Initial work by mhorne with additional fixes from kevans and markj.

Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36701
2023-03-23 16:34:33 -05:00
Mitchell Horne
698dbd66fe arm64: add a GENERIC-KASAN config
Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36702
2023-03-23 16:34:33 -05:00
John Baldwin
d2dab20c2a ktls: Drop all the INET and INET6 compile-time guards.
Consistent with 9fd0d9b16e, KERN_TLS is
not supported on kernels without any INET support.

Reviewed by:	gallatin, hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39232
2023-03-23 14:29:07 -07:00
Gordon Bergling
f63aaffebc acpi(4): Fix a typo in a kernel message
- s/enitialization/initialization/

MFC after:	5 days
2023-03-23 22:03:31 +01:00
Kirk McKusick
42c82aad32 Improve chance of finding an alternate superblock in sbsearch(3).
When requesting a superblock read for the sole purpose of getting
the parameters needed to find if backup parameters have been stored,
specify UFS_NOCSUM as only the base superblock is needed. This
change reduces the number of checks that the superblock must pass.

MFC after:    1 week
2023-03-23 13:04:52 -07:00
Mateusz Guzik
c16c4ea6d3 vfs cache: return ENOTDIR for not_a_dir/{.,..} lookups
Reported by:	Oliver Kiddle
PR:	270419
MFC:	3 days
2023-03-23 19:31:18 +00:00
Andrew Turner
6a4f5fdd19 Mark the arm64 PSR register fields with UL
These are for a 64-bit register. Make them 64-bit values on arm64.
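
Why the suffix matters: complementing a 32-bit constant and applying it to a 64-bit value silently clears the upper half. The flags below are illustrative, not the real PSR_* definitions, and the example assumes an LP64 target such as arm64, where unsigned long is 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit-31 flag; not an actual PSR_* definition. */
#define FLAG_U32	0x80000000	/* unsigned int on ILP32/LP64 */
#define FLAG_UL		0x80000000UL	/* unsigned long: 64-bit on LP64 */

/* ~ is computed in 32 bits, so the zero-extended mask drops bits 32-63. */
uint64_t
clear_flag_u32(uint64_t reg)
{
	return (reg & ~FLAG_U32);
}

/* ~ is computed in 64 bits, so only bit 31 is cleared. */
uint64_t
clear_flag_ul(uint64_t reg)
{
	return (reg & ~FLAG_UL);
}
```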

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Andrew Turner
ea3061526e Bump __FreeBSD_version for changing spsr on arm64
This changed a few structures, bump __FreeBSD_version for kgdb and
userspace consumers.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Andrew Turner
1c1f31a5e5 Remove unused registes from the arm pcb
These were kept for ABI reasons. Remove them and bump __FreeBSD_version
so debuggers can be updated to use the new layout.

Reviewed by:	jhb
Sponsored by:	Arm Ltd
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35378
2023-03-23 18:56:26 +00:00
Andrew Turner
4a06b28a15 Add compat support for struct reg on arm64
The size of the spsr field in struct reg has changed. Mask out the bits
that userspace doesn't know about, as they may be invalid.

While here, add a comment explaining why we don't need compat support
in set_regs.

Sponsored by:	Arm Ltd
2023-03-23 18:56:26 +00:00
Zachary Leaf
f4036a9234 arm64: add fault address to trapframe
It was previously possible for the fault address register to get
clobbered before it was saved. This small window occurred when an
additional exception was encountered inside the exception handler,
overwriting the previous value.

Commit f29942229d ("Read the arm64 far early in el0 exceptions")
patched this issue, but avoided changing the trapframe since this could
be considered a KBI change in FreeBSD 13.

Revert the above fix and save the fault address in the trapframe
instead. This saves the fault address even earlier in the exception
handling process, and is a more robust and simple fix.

Reviewed by:	andrew, jhb, jrtc27
Sponsored by: Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38984
2023-03-23 18:56:26 +00:00
Zachary Leaf
2ecbbcc7ca arm64: extend ESR/SPSR registers to 64b
For the Exception Syndrome Register, ESR_ELx, the upper 32b were
previously unused, but now may contain additional exception info as of
Armv8.7 (FEAT_LS64).

Extend ESR from u32->u64 in exception handling code to support this. In
addition, also extend Saved Program Status Register SPSR_ELx in the same
way to allow for future extensions.

Reviewed by:	andrew
Sponsored by: Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D38983
2023-03-23 18:56:26 +00:00
Justin Hibbits
e2427c6917 IfAPI: Add iterator to complement if_foreach()
Summary:
Sometimes an if_foreach() callback can be trivial, or need a lot of
outer context.  In this case a regular `for` loop makes more sense.  To
keep things hidden in the new API, use an opaque `if_iter` structure
that can still be instantiated on the stack.  The current implementation
uses just a single pointer out of the 4 allotted to the opaque context,
and the cleanup does nothing, but may be used in the future.
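
The iterator shape described above can be modeled in userland C. All `sketch_*` names are hypothetical (this is not the IfAPI header), the interface list is a static two-entry list, and any locking or epoch protection a real implementation needs is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface record standing in for the opaque if_t. */
struct sketch_ifnet {
	const char		*if_name;
	struct sketch_ifnet	*if_next;
};

/* Stack-allocatable opaque iterator: four slots, only the first used. */
struct sketch_if_iter {
	void	*context[4];
};

static struct sketch_ifnet lo0 = { "lo0", NULL };
static struct sketch_ifnet em0 = { "em0", &lo0 };
static struct sketch_ifnet *sketch_ifnet_head = &em0;

struct sketch_ifnet *
sketch_if_iter_next(struct sketch_if_iter *it)
{
	struct sketch_ifnet *ifp = it->context[0];

	if (ifp != NULL)
		it->context[0] = ifp->if_next;
	return (ifp);
}

struct sketch_ifnet *
sketch_if_iter_start(struct sketch_if_iter *it)
{
	it->context[0] = sketch_ifnet_head;
	return (sketch_if_iter_next(it));
}

/* Nothing to release yet; kept so callers are ready for future state. */
void
sketch_if_iter_finish(struct sketch_if_iter *it)
{
	(void)it;
}

/* Example caller: a plain for loop instead of an if_foreach() callback. */
int
sketch_count_interfaces(void)
{
	struct sketch_if_iter it;
	struct sketch_ifnet *ifp;
	int n = 0;

	for (ifp = sketch_if_iter_start(&it); ifp != NULL;
	    ifp = sketch_if_iter_next(&it))
		n++;
	sketch_if_iter_finish(&it);
	return (n);
}
```

sketch_count_interfaces() shows the intended payoff: the loop body has direct access to the caller's locals, with no callback argument plumbing.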

Reviewed by:	melifaro
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39138
2023-03-23 09:39:26 -04:00