This change does the following:
Base Netlink KPIs (ability to register the family, parse and/or
write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions
(netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in the netlink_glue.c,
but their implementation calls a pointer to either the stub function
or the actual function, depending on whether the module is loaded or not.
This approach allows to have only 1k LoC out of ~3.7k LoC (current
sys/netlink implementation) in the kernel, which will not grow further.
It also allows for the generic netlink kernel customers to load
successfully without requiring Netlink module and operate correctly
once Netlink module is loaded.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39269
- passing I/O commands through nda requires nsid field to be set (it was
unused when going through nvme_ns_ioctl())
- ccb's status can be OR'ed with the flags, use CAM_STATUS_MASK
Reviewed by: imp (cam)
Differential Revision: https://reviews.freebsd.org/D37696
Using subshell's PORTSDIR variable (via $${PORTSDIR}}) seems to be
only working if PORTSDIR is specified directly on the make command
line.
Use ${PORTDIR} here instead so that setting the variable in
/etc/{make,src,src-env}.conf would work (also works for variable
being set on command line or in the environment).
PR: 268299
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D37868
The Intel graphics stolen memory is used by the Intel GOP driver on
boot. When using bhyve with GPU passthrough, it's also used by the guest
GOP driver at guest boot. For that reason, bhyve needs to know the
address and size of this region to inform the guest about this region.
Exposing the variables as sysctl allows bhyve to easily read them.
If the tunable is set to 0, the tuning of the MPS (maximum payload size)
is disabled and the default MPS values set by the BIOS are used. In this
case the system may use a lower speed or operate in a less optimized
state, but it can resolve issues with stability and compatibility. With
specific devices the tuning of the mps, can lead to a complete freeze of
the system.
Reviewed by: manu
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38397
The Intel GOP driver checks the LPC IDs to detect the platform it's
running on. The GOP driver only works on the platforms it's written for.
Maybe other Intel driver have the same behaviour. For that reason, we
should use the LPC IDs of the FreeBSD host for GPU passthrough to work
properly.
We don't know if setting different LPC IDs have any side effect.
Therefore, don't use the host LPC IDs by default on Intel system. Give
the user the opportunity to modify the LPC IDs.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D28280
For compatibilty reasons, the old config values are still supported.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38403
Changing the PCI IDs is valuable in some situations. The Intel GOP
driver requires that some PCI IDs of the LPC bridge are aligned with the
physical values of the host LPC bridge. Another use case are oracles
virtio driver. They require different subvendor ID than the default one.
For that reason, create a helper which makes it easy to read PCI IDs
from bhyve config. Additionally, this helper ensures that all emulation
devices are using the same config keys.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38402
Test regression fixed in 4630a3252a. Add two tests that do not
use the verbose flag, so the code path in question can be reached:
1. Respond with a proper ICMP destination host unreachable packet.
2. Respond with a doctored ICMP destination host unreachable packet,
that has the ICMP Identifier field modified (+1 bit).
Reviewed by: cy
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39244
Previously this would zero out x18 in the pcb, now it's attacking the
innocent pcb_onfault -- drop it entirely.
This technically fixes
e605b87a9e ("Save only callee-saved registers in pcb"), but it's
harmless until the below commit trims down pcb_x.
Reported by: mmel
Reviewed by: andrew, mmel
Fixes: 1c1f31a5e5 ("Remove unused registes from the arm pcb")
Differential Revision: https://reviews.freebsd.org/D39277
This change converts all kernel rtsock interactions in route(8)
to Netlink.
Based on the WITHOUT_NETLINK_SUPPORT src.conf(5) variable, route(8)
now fully operates either via Netlink or via rtsock/sysctl.
The default (compile-time) is Netlink.
The output for route delete/add/get/flush is targeted to be exactly
the same (apart from some error handling cases).
The output for the route monitor has been changed to improve
readability and support netlink models.
Other behaviour changes:
* exact prefix lookup (route -n get a.b.c.d/e) is not yet supported.
* route monitor does not show the change originator yet.
Differential Revision: https://reviews.freebsd.org/D39007
Make userland tools such as netstat, route, arp and ndp use
either netlink or rtsock interfaces based on the NETLINK_SUPPORT
options.
Both NETLINK and NETLINK_SUPPORT options are turned on by default.
Reviewed By: eugen
Differential Revision: https://reviews.freebsd.org/D39148
This change allow to open Netlink sockets in the non-vnet jails, even for
unpriviledged processes.
The security model largely follows the existing one. To be more specific:
* by default, every `NETLINK_ROUTE` command is **NOT** allowed in non-VNET
jail UNLESS `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command
handler.
* All notifications are **disabled** for non-vnet jails (requests to
subscribe for the notifications are ignored). This will change to be more
fine-grained model once the first netlink provider requiring this gets
committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including**
interfaces w/o any addresses attached to the jail). The value of this is
questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the
current approach - currently we list static ARP/ND entries belonging to the
addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered
to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only
host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in non-VNET jail
(as sub-families may be unrelated to network at all).
It is the goal of the family author to implement the restriction if
necessary.
Differential Revision: https://reviews.freebsd.org/D39206
MFC after: 1 month
Both vnode_pager_input_smlfs() and vnode_pager_generic_getpages()
increment runningbufspace, but also both delegate io completion handling
on the pbuf to either plain bdone() or filesystem-specific strategy
routine. Accidentally, for e.g. UFS it is g_vfs_strategy()/g_vfs_done().
The later calls bufdone() which handles runningbufspace reclamation.
For plain bdone() io done handler, nothing would return
accounted b_runningbufspace back. Do it in the new
helper vnode_pager_input_bdone(), as well as in
vnode_pager_generic_getpages_done() explicitly.
Note that potential multiple calls to runningbufwakeup() for the same
pbuf or buf completion are safe. runningbufwakeup() clears accounting
for the buffer, so second and later calls are nop.
The problem was found due to tarfs using small vnode pager input but not
g_vfs_strategy().
Reported by: des
Reviewed by: markj, sjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39263
Shaves a read lock + tryupgrade trip most of the time.
Stats from doing a kernel build (counters not present in the tree):
vm.fault_soft_fast_ok: 262653
vm.fault_soft_fast_failed_other: 41
vm.fault_soft_fast_failed_no_page: 39595772
vm.fault_soft_fast_failed_page_busy: 1929
vm.fault_soft_fast_failed_page_invalid: 22183
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D39268
This code is adapted from the QEMU checkpatch.pl script. It can check
either a patch, a file or a git branch. It tries to warn about things
that I believe might be style(9) violations. It's experimental, since I
heavily hacked on the qemu version to get it to not complain (much)
about iconic code in the tree. At the moment, it's use should be
considered expermental. It will likely miss violations, and complain
about code that's perfectly fine. It's offered as an experiment
and to make it easier for contributors to submit patches.
LRO bypasses normal ip_input()/tcp_input() and lacks several checks
that are present in the normal path. Without these checks, it
is possible to trigger assertions added in b0ccf53f24
Reviewed by: glebius, rrs
Sponsored by: Netflix
The quasi-LRU still gets in the way for example when doing an
incremental bzImage build, with vnode_list lock being at the
top of the profile. Further damage control the problem by trylocking.
Note the entire mechanism desperately wants to be reaped out in favor
of something(tm) which both scales in a multicore setting and provides
sensible replacement policy.
With this change everything vfs almost disappears from the on CPU
flamegraph, what is left is tons of contention in the VM.
Turns out it is very rarely triggered, making a per-cpu
counter a waste.
Examples from real life boxes:
uptime counter
135 days 847
138 days 2190
141 days 1
We previously attempted to emit Rock Ridge NM records only when the name
represented by the Rock Ridge extensions would actually differ. We would
omit the record for an all-upper-case directory name, however Linux (and
perhaps other operating systems) map names with no NM record to
lowercase.
This affected only directories, as file names have an implicit ";1"
version number appended and thus always differ. To solve, just emit NM
records for all entries other than DOT and DOTDOT .
We could continue to omit the NM record for directories that would avoid
mapping (for example, one named 1234.567) but this does not seem worth
the complexity.
PR: 203531
Reported by: Thomas Schmitt <scdbackup@gmx.net
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39258
By checking Ubuntu, there is no `/sys/subsystem' in sysfs. To compatible
with Ubuntu, delete the 'subsystem' creation in Linux compatible module.
On the other hand, the sysfs `/sys/subsystem' cause failure for some
Linux udev cases. In Linux udev source code, there is a function named
`scan_devices_all', and it will scan `/sys/subsystem' if it is existed,
but now there are nothing in /sys/subsystem `, and it returns empty
to cause some use cases failed.
Reviewed by: dchagin
Differential Revision: https://reviews.freebsd.org/D38885
MFC after: 1 month
XMFC with: ifAPI
Add myself as a doc committer, mentored by carlavilla and dbaio.
Approved by: carlavilla (mentor)
Differential Revision: https://reviews.freebsd.org/D39254
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel. For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.
This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.
It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument. As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38125