Commit Graph

282342 Commits

Author SHA1 Message Date
Ruslan Bukin
46e6e29097 Import OpenCSD v.1.4.0.
Sponsored by:	UKRI
2023-03-27 17:03:16 +01:00
Ruslan Bukin
974000f192 Update OpenCSD to v1.4.0.
Sponsored by:	UKRI
2023-03-27 16:23:28 +01:00
Alexander V. Chernikov
19e43c163c netlink: add netlink KPI to the kernel by default
This change does the following:

Base Netlink KPIs (ability to register the family, parse and/or
 write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
  some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
  unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
  compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions
 (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in the netlink_glue.c,
 but their implementation calls a pointer to either the stub function
 or the actual function, depending on whether the module is loaded or not.

This approach allows to have only 1k LoC out of ~3.7k LoC (current
 sys/netlink implementation) in the kernel, which will not grow further.
It also allows for the generic netlink kernel customers to load
 successfully without requiring Netlink module and operate correctly
 once Netlink module is loaded.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39269
2023-03-27 13:55:44 +00:00
Yuri Pankov
21af4e09f4 nvmecontrol(8): fix resv register -i synopsis
-i is "ignore existing key" and does not take argument

Reviewed by:	pauamma (manpages)
Differential Revision:	https://reviews.freebsd.org/D37709
2023-03-27 15:00:33 +02:00
Mark Johnston
68ca8363c7 libc: Use secure_getenv(3) where appropriate
No functional change intended.

Reviewed by:	mjg, imp, kib
Differential Revision:	https://reviews.freebsd.org/D39278
2023-03-27 08:56:22 -04:00
Mark Johnston
ad2f2ee015 arm64: Remove duplicated function prototypes for PAC
No functional change intended.

Sponsored by:	The FreeBSD Foundation
2023-03-27 08:56:22 -04:00
Yuri Pankov
6aa5b10d0c nvme: fix resv commands with nda device
- passing I/O commands through nda requires nsid field to be set (it was
  unused when going through nvme_ns_ioctl())
- ccb's status can be OR'ed with the flags, use CAM_STATUS_MASK

Reviewed by:	imp (cam)
Differential Revision:	https://reviews.freebsd.org/D37696
2023-03-27 14:53:24 +02:00
Yuri Pankov
1249680609 kern.post.mk: fix PORTSDIR handling
Using subshell's PORTSDIR variable (via $${PORTSDIR}}) seems to be
only working if PORTSDIR is specified directly on the make command
line.

Use ${PORTDIR} here instead so that setting the variable in
/etc/{make,src,src-env}.conf would work (also works for variable
being set on command line or in the environment).

PR:		268299
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D37868
2023-03-27 13:57:57 +02:00
Alexander V. Chernikov
eccccd657f netlink: make nlattr_add_in[6]_addr inline
MFC after:	2 weeks
2023-03-27 11:53:34 +00:00
Alexander V. Chernikov
6dc858d84c netlink: remove forgotten debug message in handle_rtm_getroute().
MFC after:	2 weeks
2023-03-27 10:49:40 +00:00
Alexander V. Chernikov
544f1492c0 netlink: ensure genetlink control family always registers under the same ID.
MFC after:	2 weeks
2023-03-27 10:48:24 +00:00
Corvin Köhne
e8988d60d2
pci: expose intel_graphics_stolen as sysctl
The Intel graphics stolen memory is used by the Intel GOP driver on
boot. When using bhyve with GPU passthrough, it's also used by the guest
GOP driver at guest boot. For that reason, bhyve needs to know the
address and size of this region to inform the guest about this region.
Exposing the variables as sysctl allows bhyve to easily read them.
2023-03-27 11:40:49 +02:00
Corvin Köhne
48d70503bc
pci: add tunable hw.pci.enable_mps_tune
If the tunable is set to 0, the tuning of the MPS (maximum payload size)
is disabled and the default MPS values set by the BIOS are used. In this
case the system may use a lower speed or operate in a less optimized
state, but it can resolve issues with stability and compatibility. With
specific devices the tuning of the mps, can lead to a complete freeze of
the system.

Reviewed by:		manu
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D38397
2023-03-27 11:28:27 +02:00
Corvin Köhne
f4ceaff56d
bhyve: add config option to modify LPC IDs
The Intel GOP driver checks the LPC IDs to detect the platform it's
running on. The GOP driver only works on the platforms it's written for.
Maybe other Intel driver have the same behaviour. For that reason, we
should use the LPC IDs of the FreeBSD host for GPU passthrough to work
properly.

We don't know if setting different LPC IDs have any side effect.
Therefore, don't use the host LPC IDs by default on Intel system. Give
the user the opportunity to modify the LPC IDs.

Reviewed by:		jhb
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D28280
2023-03-27 10:10:24 +02:00
Corvin Köhne
b72e06b13e
bhyve: make use of helper to read PCI IDs from bhyve config
For compatibilty reasons, the old config values are still supported.

Reviewed by:		jhb
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D38403
2023-03-27 10:10:24 +02:00
Corvin Köhne
ffaed739a8
bhyve: add helper to read PCI IDs from bhyve config
Changing the PCI IDs is valuable in some situations. The Intel GOP
driver requires that some PCI IDs of the LPC bridge are aligned with the
physical values of the host LPC bridge. Another use case are oracles
virtio driver. They require different subvendor ID than the default one.
For that reason, create a helper which makes it easy to read PCI IDs
from bhyve config. Additionally, this helper ensures that all emulation
devices are using the same config keys.

Reviewed by:		jhb
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D38402
2023-03-27 10:10:24 +02:00
Gordon Bergling
107597fbd4 rarpd(8): Fix a typo in a source code comment
- s/combinataion/combination/

Obtained from:	NetBSD
MFC after:	3 days
2023-03-27 08:45:30 +02:00
Jose Luis Duran
9fc2d858b4 ping tests: Add a regression test
Test regression fixed in 4630a3252a. Add two tests that do not
use the verbose flag, so the code path in question can be reached:

1. Respond with a proper ICMP destination host unreachable packet.
2. Respond with a doctored ICMP destination host unreachable packet,
   that has the ICMP Identifier field modified (+1 bit).

Reviewed by:	cy
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39244
2023-03-26 19:54:29 -07:00
Kyle Evans
ec1bc53002 arm64: cpu_switch: don't zero out pcb_onfault
Previously this would zero out x18 in the pcb, now it's attacking the
innocent pcb_onfault -- drop it entirely.

This technically fixes
e605b87a9e ("Save only callee-saved registers in pcb"), but it's
harmless until the below commit trims down pcb_x.

Reported by:	mmel
Reviewed by:	andrew, mmel
Fixes:	1c1f31a5e5 ("Remove unused registes from the arm pcb")
Differential Revision:	https://reviews.freebsd.org/D39277
2023-03-26 13:48:22 -05:00
Alexander V. Chernikov
3a151e31ac route: fix RTF_HOST & non-empty mask handling in netlink translation. 2023-03-26 18:07:23 +00:00
Mark Johnston
ae2f0b2611 tftp tests: Fix a typo in the makefile
Fixes:	64c2a712d6 ("tftp: Add tests.")
2023-03-26 12:54:59 -04:00
Alexander V. Chernikov
c597432e22 route(8): convert to netlink
This change converts all kernel rtsock interactions in route(8)
 to Netlink.

Based on the WITHOUT_NETLINK_SUPPORT src.conf(5) variable, route(8)
 now fully operates either via Netlink or via rtsock/sysctl.
The default (compile-time) is Netlink.

The output for route delete/add/get/flush is targeted to be exactly
 the same (apart from some error handling cases).
The output for the route monitor has been changed to improve
 readability and support netlink models.

Other behaviour changes:
* exact prefix lookup (route -n get a.b.c.d/e) is not yet supported.
* route monitor does not show the change originator yet.

Differential Revision:	https://reviews.freebsd.org/D39007
2023-03-26 11:06:56 +00:00
Alexander V. Chernikov
a85dcd4ac4 netlink: restrict default userland switch to netlink to i386/amd64. 2023-03-26 11:06:53 +00:00
Alexander V. Chernikov
9a11f3dff9 netlink: add standrard ifaddr/neigh parsers to snl(3).
MFC after:	2 weeks
2023-03-26 09:04:41 +00:00
Alexander V. Chernikov
64dfea8651 netlink: add NETLINK/NETLINK_SUPPORT userland options.
Make userland tools such as netstat, route, arp and ndp use
 either netlink or rtsock interfaces based on the NETLINK_SUPPORT
 options.
Both NETLINK and NETLINK_SUPPORT options are turned on by default.

Reviewed By: eugen
Differential Revision: https://reviews.freebsd.org/D39148
2023-03-26 08:59:58 +00:00
Alexander V. Chernikov
04f75b9802 netlink: allow netlink sockets in non-vnet jails.
This change allow to open Netlink sockets in the non-vnet jails, even for
 unpriviledged processes.
The security model largely follows the existing one. To be more specific:
* by default, every `NETLINK_ROUTE` command is **NOT** allowed in non-VNET
 jail UNLESS `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command
 handler.
* All notifications are **disabled** for non-vnet jails (requests to
 subscribe for the notifications are ignored). This will change to be more
 fine-grained model once the first netlink provider requiring this gets
 committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including**
 interfaces w/o any addresses attached to the jail). The value of this is
 questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the
 current approach - currently we list static ARP/ND entries belonging to the
 addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered
 to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only
 host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in non-VNET jail
 (as sub-families may be unrelated to network at all).
 It is the goal of the family author to implement the restriction if
 necessary.

Differential Revision: https://reviews.freebsd.org/D39206
MFC after:	1 month
2023-03-26 08:44:09 +00:00
Alexander V. Chernikov
2cda6a2fb0 routing: add public rt_is_exportable() version to check if
the route can be exported to userland when jailed.

Differential Revision: https://reviews.freebsd.org/D39204
MFC after:	2 weeks
2023-03-26 08:24:27 +00:00
Gordon Bergling
328ebd4680 devd.conf.5: Fix a typo in the manual page
- s/deteted/detected/

MFC after:	5 days
2023-03-26 09:43:58 +02:00
Cy Schubert
1f5c16c4e1 share/misc/committers-*: Document my mentors
Document my ports and src mentors.
2023-03-25 20:29:37 -07:00
Konstantin Belousov
28f957b8b3 vnode_pager_input: return runningbufspace back
Both vnode_pager_input_smlfs() and vnode_pager_generic_getpages()
increment runningbufspace, but also both delegate io completion handling
on the pbuf to either plain bdone() or filesystem-specific strategy
routine. Accidentally, for e.g. UFS it is g_vfs_strategy()/g_vfs_done().
The later calls bufdone() which handles runningbufspace reclamation.

For plain bdone() io done handler, nothing would return
accounted b_runningbufspace back. Do it in the new
helper vnode_pager_input_bdone(), as well as in
vnode_pager_generic_getpages_done() explicitly.

Note that potential multiple calls to runningbufwakeup() for the same
pbuf or buf completion are safe. runningbufwakeup() clears accounting
for the buffer, so second and later calls are nop.

The problem was found due to tarfs using small vnode pager input but not
g_vfs_strategy().

Reported by:	des
Reviewed by:	markj, sjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39263
2023-03-26 00:55:29 +02:00
Mateusz Guzik
0e71f4f77c vm: add unlocked page lookup before trying vm_fault_soft_fast
Shaves a read lock + tryupgrade trip most of the time.

Stats from doing a kernel build (counters not present in the tree):
vm.fault_soft_fast_ok: 262653
vm.fault_soft_fast_failed_other: 41
vm.fault_soft_fast_failed_no_page: 39595772
vm.fault_soft_fast_failed_page_busy: 1929
vm.fault_soft_fast_failed_page_invalid: 22183

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D39268
2023-03-25 22:14:59 +00:00
Mateusz Guzik
22eb66d961 vfs cache: always assert on ndp->ni_resflags 2023-03-25 21:57:55 +00:00
Dmitry Chagin
f0fe68a965 i386: ansify
Reported by:	clang 15
2023-03-25 23:21:03 +03:00
Warner Losh
9c43d5ca20 CONTRIBUTING.md: Suggest using checkstyle9.pl 2023-03-21 02:08:12 -06:00
Warner Losh
3a3c924273 checkstyle9.pl: Perl script to check if a change is approximately style(9)
This code is adapted from the QEMU checkpatch.pl script. It can check
either a patch, a file or a git branch. It tries to warn about things
that I believe might be style(9) violations. It's experimental, since I
heavily hacked on the qemu version to get it to not complain (much)
about iconic code in the tree. At the moment, it's use should be
considered expermental. It will likely miss violations, and complain
about code that's perfectly fine.  It's offered as an experiment
and to make it easier for contributors to submit patches.
2023-03-25 11:06:13 -06:00
Warner Losh
d5df268584 secure_getenv: Improve documentation wording
Improve the documentation wording to be more consistent with FreeBSD
manual pages.

Suggested by:		mjg (though reworded)
Sponsored by:		Netflix
2023-03-25 11:06:13 -06:00
Warner Losh
72f501d07a secure_getenv: Add () around return values
Style only change, no functional change intended.

Sponsored by:		Netflix
2023-03-25 11:06:13 -06:00
Joseph Koshy
6093120776
procfs: Add manual page cross references.
Approved by:	gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D39264
2023-03-25 16:21:34 +00:00
Andrew Gallatin
abba58766f LRO: Add missing checks for invalid IP addresses
LRO bypasses normal ip_input()/tcp_input() and lacks several checks
that are present in the normal path.  Without these checks, it
is possible to trigger assertions added in b0ccf53f24

Reviewed by: glebius, rrs
Sponsored by: Netflix
2023-03-25 11:56:02 -04:00
Dag-Erling Smørgrav
508aee9681 pwd_mkdb: Sort options and update usage message.
Sponsored by:	Klara, Inc.
Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D39266
2023-03-25 14:15:40 +00:00
Mateusz Guzik
138a5dafba vfs: trylock vnode requeue
The quasi-LRU still gets in the way for example when doing an
incremental bzImage build, with vnode_list lock being at the
top of the profile. Further damage control the problem by trylocking.

Note the entire mechanism desperately wants to be reaped out in favor
of something(tm) which both scales in a multicore setting and provides
sensible replacement policy.

With this change everything vfs almost disappears from the on CPU
flamegraph, what is left is tons of contention in the VM.
2023-03-25 13:42:27 +00:00
Mateusz Guzik
245767c278 vfs: flip deferred_inact to atomic
Turns out it is very rarely triggered, making a per-cpu
counter a waste.

Examples from real life boxes:
uptime		counter
135 days	847
138 days	2190
141 days	1
2023-03-25 13:42:27 +00:00
Mateusz Guzik
e5eb1d298f vfs: replace some spelled out VNASSERTs with VNPASS
nfc
2023-03-25 13:42:27 +00:00
Ed Maste
978013a094 makefs: emit NM records for all directory entries
We previously attempted to emit Rock Ridge NM records only when the name
represented by the Rock Ridge extensions would actually differ. We would
omit the record for an all-upper-case directory name, however Linux (and
perhaps other operating systems) map names with no NM record to
lowercase.

This affected only directories, as file names have an implicit ";1"
version number appended and thus always differ.  To solve, just emit NM
records for all entries other than DOT and DOTDOT .

We could continue to omit the NM record for directories that would avoid
mapping (for example, one named 1234.567) but this does not seem worth
the complexity.

PR:		203531
Reported by:	Thomas Schmitt <scdbackup@gmx.net
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39258
2023-03-25 09:25:18 -04:00
Vico Chen
1f0c8bfd65 linsysfs(4): Keep Linux compatible sysfs the same as Ubuntu
By checking Ubuntu, there is no `/sys/subsystem' in sysfs. To compatible
with Ubuntu, delete the 'subsystem' creation in Linux compatible module.

On the other hand, the sysfs `/sys/subsystem' cause failure for some
Linux udev cases. In Linux udev source code, there is a function named
`scan_devices_all', and it will scan `/sys/subsystem' if it is existed,
but now there are nothing in /sys/subsystem `, and it returns empty
to cause some use cases failed.

Reviewed by:		dchagin
Differential Revision:	https://reviews.freebsd.org/D38885
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:41:04 +03:00
Dmitry Chagin
0b56641cfc linsysfs(4): Reimplement listnics() using ifAPI
Handle if arrival/departure events.

Reviewed by:		melifaro (early version)
Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI
2023-03-25 13:40:41 +03:00
Lorenzo Salvadore
a756181cea
share/misc/committers-doc.dot: Add myself
Add myself as a doc committer, mentored by carlavilla and dbaio.

Approved by:	carlavilla (mentor)
Differential Revision: https://reviews.freebsd.org/D39254
2023-03-25 10:47:40 +01:00
Dag-Erling Smørgrav
9b20ab1e1e local-unbound-setup: Disable the libc subscriber.
This is the correct way to prevent resolvconf from updating resolv.conf.

Reviewed by:	cy
Differential Revision:	https://reviews.freebsd.org/D39262
2023-03-24 21:46:09 +01:00
John Baldwin
b22fccad8c Remove libvmmapi.so.5 after the shlib version was bumped to 6. 2023-03-24 13:42:18 -07:00
John Baldwin
0f735657aa bhyve: Remove vmctx member from struct vm_snapshot_meta.
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel.  For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument.  As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38125
2023-03-24 11:49:06 -07:00