eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues
are not passable between processes. And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow. This came up when debugging strange HW video decode
failures in Firefox. A native implementation would avoid these problems
and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.
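For readers unfamiliar with the interface, the following is a minimal sketch of the counter semantics that eventfd provides, assuming the Linux-compatible <sys/eventfd.h> API. The helper names eventfd_roundtrip() and eventfd_example() are hypothetical and are not part of this patch.

```c
#include <sys/eventfd.h>
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Post two notifications to an eventfd and drain them with one read.
 * Returns the drained counter value, or 0 on any failure.
 */
static uint64_t
eventfd_roundtrip(uint64_t first, uint64_t second)
{
	uint64_t total = 0;
	int fd;

	fd = eventfd(0, EFD_CLOEXEC);	/* counter starts at zero */
	if (fd == -1)
		return (0);

	/* Each 8-byte write adds its value to the kernel-side counter. */
	if (write(fd, &first, sizeof(first)) != sizeof(first) ||
	    write(fd, &second, sizeof(second)) != sizeof(second)) {
		close(fd);
		return (0);
	}

	/* Without EFD_SEMAPHORE, one read returns the sum and resets it. */
	if (read(fd, &total, sizeof(total)) != sizeof(total))
		total = 0;
	close(fd);
	return (total);
}

/* Usage: two posts of 3 and 4 are drained as a single value of 7. */
void
eventfd_example(void)
{
	assert(eventfd_roundtrip(3, 4) == 7);
}
```

Because the descriptor is an ordinary file descriptor backed by kernel state, it can be inherited or passed to another process, which is the property the kqueue-based emulation lacks.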
Submitted by: greg@unrelenting.technology
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26668
allprison_lock should be at least held shared when jail OSD methods
are called. Add a shared lock around one such call where that wasn't
the case.
In another such call, change an exclusive lock grab to be shared in
what is likely the more common case.
Return a boolean (i.e. 0 or 1) from prison_allow(), which is what sysctl
expects, instead of the flag value itself.
Add prison_set_allow(), which can set or clear a permission bit, and
propagates cleared bits down to child jails.
Use prison_allow() and prison_set_allow() in the various jail.allow.*
sysctls, and others that depend on those permissions.
Add locking around checking both pr_allow and pr_enforce_statfs in
prison_priv_check().
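The set/clear behaviour described above can be sketched roughly as follows. This is a userland model only: struct jail_node and the node_*() functions are hypothetical stand-ins, locking is omitted, and the kernel's actual traversal of child jails is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a jail: permission bits plus a child list. */
struct jail_node {
	unsigned int	 allow;		/* permission bits */
	struct jail_node *child;	/* first child, or NULL */
	struct jail_node *sibling;	/* next sibling, or NULL */
};

/* Report a permission as a boolean rather than the raw flag value. */
static bool
node_allow(const struct jail_node *n, unsigned int flag)
{
	return ((n->allow & flag) != 0);
}

/* Clear a bit in this node and in every descendant. */
static void
node_clear_allow(struct jail_node *n, unsigned int flag)
{
	n->allow &= ~flag;
	for (struct jail_node *c = n->child; c != NULL; c = c->sibling)
		node_clear_allow(c, flag);
}

static void
node_set_allow(struct jail_node *n, unsigned int flag, bool enable)
{
	assert(flag != 0);
	if (enable)
		n->allow |= flag;		/* affects this node only */
	else
		node_clear_allow(n, flag);	/* propagates downwards */
}
```

The asymmetry is the point of the sketch: a child can never hold a permission its parent lacks, so clearing must cascade while setting does not.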
We initialize sfio->npages only when some I/O is required to satisfy the
request. However, sendfile_iodone() contains an INVARIANTS-only check
that references sfio->npages, and this check is executed even if no I/O
is performed, so the check may use an uninitialized value.
Fix the problem by initializing sfio->npages earlier. Note that
sendfile_swapin() always initializes the page array. In some rare cases
we need to trim the page array, so ensure that sfio->npages gets updated
accordingly.
Reported by: syzkaller (with KASAN)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27726
Use atomic access and a memory barrier to ensure that the flag parameter
in pr_flag_allow is indeed set after the rest of the structure is valid.
Simplify adding flag bits with pr_allow_all, a dynamic version of
PR_ALLOW_ALL_STATIC.
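The publication pattern can be illustrated with portable C11 atomics. This is a sketch under assumptions: struct allow_entry and both functions are hypothetical, and the kernel uses its own atomic(9) primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical table entry; a zero flag means "slot not yet valid". */
struct allow_entry {
	const char		*name;
	_Atomic unsigned int	 flag;
};

/* Writer: fill in the entry, then publish it by storing the flag last. */
static void
entry_publish(struct allow_entry *e, const char *name, unsigned int flag)
{
	assert(flag != 0);
	e->name = name;
	/* Release: the name store above becomes visible before the flag. */
	atomic_store_explicit(&e->flag, flag, memory_order_release);
}

/* Reader: trust the other fields only after seeing a non-zero flag. */
static const char *
entry_name_if_valid(struct allow_entry *e)
{
	/* Acquire pairs with the release store in entry_publish(). */
	if (atomic_load_explicit(&e->flag, memory_order_acquire) == 0)
		return (NULL);
	return (e->name);
}
```

A reader that observes the flag is therefore guaranteed to observe the name written before it, without taking a lock.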
Use the kernel physical base rather than the ttbr0 base when building
the kernel identity map. The latter is correct under current assumptions,
but that may not always hold.
Sponsored by: Innovate UK
These drivers should have been removed along with tl(4) as part of
7c897ca91fe1cdb785531d2f5aa0d441c1d73142 and r347918 respectively
as the former made sure to only ever attach to the latter, e.g.:
<...>
static int
tlphy_probe(device_t dev)
{
	if (!mii_dev_mac_match(dev, "tl"))
		return (ENXIO);
<...>
When a jail is added using the default (system-chosen) JID, and
non-default-JID jails already exist, a loop through the allprison
list could restart and result in unnecessary O(n^2) behaviour.
There should never be more than two list passes required.
Also clean up inefficient (though still O(n)) allprison list traversal
when finding jails by ID, or when adding jails in the common case of
all default JIDs.
Since we have stopped using SVN, the notes containing the old SVN
revisions are no longer populated, so fall back to purely counting the
number of commits (currently at about 255337).
Also turn the format more into what git-describe produces, with a name
first, then the number of commits and the hash last. Note that as we
don't tag anything on `main`, git describe will never produce something
useful there and finds the newest vendor tag that was merged in instead.
Sample output:
FreeBSD 13.0-CURRENT #6 main-c255126-gb81783dc98e6-dirty
FreeBSD 12.2-STABLE #0 stable/12-c243035-gd16dac42b641-dirty
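The layout of the new version suffix can be spelled out as a format string. The sketch below is illustrative only: format_git_version() is a hypothetical helper written in C for clarity, and it does not reproduce how the build actually derives the branch, count, or hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format "<branch>-c<count>-g<hash>[-dirty]" into buf.
 * Returns the value of snprintf(): the length that was (or would be) written.
 */
static int
format_git_version(char *buf, size_t len, const char *branch,
    unsigned long count, const char *hash, bool dirty)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "%s-c%lu-g%s%s", branch, count, hash,
	    dirty ? "-dirty" : ""));
}
```

Called with "main", 255126, "b81783dc98e6" and dirty set, this yields the suffix shown in the first sample line.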
MFC after: 3 weeks
Reviewed by: imp, glebius
Differential Revision: https://reviews.freebsd.org/D27751
Use the recently added combination of `fib[46]_lookup_rt()`, which
returns the rtentry and raw nexthop, with `rt_get_inet[6]_plen()`, which
returns the address/prefix length of the prefix inside `rt`.
Add `nhop_select_func()` wrapper around inlined `nhop_select()` to
allow callers external to the routing subsystem to select the proper
nexthop from the multipath group without including internal headers.
The new calls do not require reference counting objects and reduce
the amount of copied/processed rtentry data.
Differential Revision: https://reviews.freebsd.org/D27675
These functions get/set tty winsize respectively, and are trivial wrappers
around corresponding termio ioctls.
The functions are expected to be a part of POSIX.1 issue 8:
https://www.austingroupbugs.net/view.php?id=1151#c3856.
They are currently available in NetBSD and in musl libc.
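As a rough illustration of "trivial wrappers", the two functions could look like the sketch below. The sketch_ prefix is an assumption used to avoid clashing with a libc that already provides the real names; the committed code may differ in detail.

```c
#include <sys/ioctl.h>
#include <assert.h>
#include <stddef.h>

/* Read the window size of the terminal referred to by fd. */
static int
sketch_tcgetwinsize(int fd, struct winsize *w)
{
	assert(w != NULL);
	return (ioctl(fd, TIOCGWINSZ, w));
}

/* Set the window size of the terminal referred to by fd. */
static int
sketch_tcsetwinsize(int fd, const struct winsize *w)
{
	assert(w != NULL);
	return (ioctl(fd, TIOCSWINSZ, w));
}
```

Both return 0 on success and -1 with errno set on failure, exactly as the underlying ioctl(2) does.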
PR: 251868
Submitted by: Soumendra Ganguly <soumendraganguly@gmail.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27650
This change introduces a framework that allows longest prefix match
(LPM) lookup algorithms to be dynamically attached or detached
to speed up datapath route table lookups.
The framework takes care of handling initial synchronisation,
route subscription, nhop/nhop group referencing and indexing,
dataplane attachments and fib instance algorithm setup/teardown.
The framework features automatic algorithm selection, picking
the best matching algorithm on the fly based on the number of
routes in the routing table.
Currently the framework code is guarded by the FIB_ALGO config option.
The plan is to enable it by default in the next couple of weeks.
The following algorithms are provided by default:
IPv4:
* bsearch4 (lockless binary search in a special IP array), tailored for
small-fib (<16 routes)
* radix4_lockless (lockless immutable radix, re-created on every rtable change),
tailored for small-fib (<1000 routes)
* radix4 (base system radix backend)
* dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless data structure, optimized
for large-fib (D27412)
IPv6:
* radix6_lockless (lockless immutable radix, re-created on every rtable change),
tailored for small-fib (<1000 routes)
* radix6 (base system radix backend)
* dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless data structure, optimized
for large-fib (D27412)
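To give a feel for why a binary search over a small array can beat a radix walk, here is one common way to do longest prefix match with a sorted array. This is a sketch under assumptions, not the bsearch4 code: it presumes the prefixes were already flattened into non-overlapping ranges sorted by start address, and struct range4 and range4_lookup() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: addresses from 'start' up to the next entry use 'nh'. */
struct range4 {
	uint32_t start;	/* first address of the range, host byte order */
	uint16_t nh;	/* next-hop index */
};

/*
 * Return the next hop of the last range whose start is <= addr.
 * 'tbl' must be sorted by 'start' and tbl[0].start must be 0.
 */
static uint16_t
range4_lookup(const struct range4 *tbl, size_t n, uint32_t addr)
{
	size_t lo = 0, hi = n;

	assert(n > 0 && tbl[0].start == 0);
	/* Invariant: tbl[lo].start <= addr; tbl[hi].start > addr if hi < n. */
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].start <= addr)
			lo = mid;
		else
			hi = mid;
	}
	return (tbl[lo].nh);
}
```

For example, the routes 0.0.0.0/0, 10.0.0.0/8 and 10.1.0.0/16 flatten into five ranges, and a lookup touches at most three of them; the whole table fits in a cache line or two, which is plausibly where the small-fib advantage comes from.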
Performance changes:
Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604):
IPv4:
8 routes:
radix4: ~20mpps
radix4_lockless: ~24.8mpps
bsearch4: ~69mpps
dpdk_lpm4: ~67mpps
700k routes:
radix4_lockless: 3.3mpps
dpdk_lpm4: 46mpps
IPv6:
8 routes:
radix6_lockless: ~20mpps
dpdk_lpm6: ~70mpps
100k routes:
radix6_lockless: 13.9mpps
dpdk_lpm6: 57mpps
Forwarding benchmarks:
+ 10-15% IPv4 forwarding performance (small-fib, bsearch4)
+ 25% IPv4 forwarding performance (full-view, dpdk_lpm4)
+ 20% IPv6 forwarding performance (full-view, dpdk_lpm6)
Control:
The framework adds the following runtime sysctls:
List algos
* net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4
* net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6
Debug level (7=LOG_DEBUG, per-route)
net.route.algo.debug_level: 5
Algo selection (currently only for fib 0):
net.route.algo.inet.algo: bsearch4
net.route.algo.inet6.algo: radix6_lockless
Support for manually changing algos in non-default fib will be added
soon. Some sysctl names will be changed in the near future.
Differential Revision: https://reviews.freebsd.org/D27401
If the LED state is set through the evdev interface, then the asynchronous
nature of the USB transfer callback can change the order of events echoed
back to userland, as it causes LED events to be echoed with some lag.
Fix that by echoing LED events synchronously in the ioctl handler.
Reviewed by: hselasky
Obtained from: sysutils/iichid
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D27750
Unlike AT keyboards, HID devices are able to send all pc105 key
states within a single report. Let evdev transmit all key state
changes within a single report too.
Reviewed by: hselasky
Obtained from: sysutils/iichid
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D27749
Add support for LAN found on Thinkpad USB-C and Thunderbolt Gen 2
docking stations.
Submitted by: ali.abdallah@suse.com
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Hardware timestamp reporting is disabled by default as it produces many
extra events which are not handled by consumers like libinput.
Add the hw.usb.wmt.timestamps=1 tunable to loader.conf to enable it.
Obtained from: sysutils/iichid
In Hybrid mode, the number of contacts that can be reported in one
report is less than the maximum number of contacts that the device
supports. For example, a device that supports a maximum of four
concurrent physical contacts can set up its top-level collection to
deliver a maximum of two contacts in one report. If four contact
points are present, the device can break these up into two serial
reports that deliver two contacts each.
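The arithmetic in the example above is a ceiling division. A minimal sketch, with hybrid_report_count() being a hypothetical helper rather than a function from the driver:

```c
#include <assert.h>

/* Number of serial reports needed to deliver 'contacts' contact points. */
static unsigned int
hybrid_report_count(unsigned int contacts, unsigned int per_report)
{
	assert(per_report > 0);
	/* Round up so that a partially filled last report is counted. */
	return ((contacts + per_report - 1) / per_report);
}
```

With four contacts and two contacts per report this gives two reports; three contacts also need two, the second of which is only half full.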
Obtained from: sysutils/iichid
When using NFS-over-TLS, an NFS client can optionally provide an X.509
certificate to the server during the TLS handshake. For some situations,
such as different NFS servers or different certificates being mapped
to different user credentials on the NFS server, there may be a need
for different mounts to provide different certificates.
This new mount option, called "tlscertname", may be used to specify that
a non-default certificate be provided. This alternate certificate will
be stored in /etc/rpc.tlsclntd in a file with a name based on what is
provided by this mount option.
The original if_dwc driver used m_defrag as an implementation shortcut, but
on 1000Mb networks it hurts performance. Implement multi-descriptor support
for the TX path.
Tested on RK3399-Firefly; the patch improves network throughput by ~15%.
Reviewed By: manu
Differential Revision: https://reviews.freebsd.org/D27520
The existing values correspond to x86 exception vector numbers, but the
trap numbers used in the kernel do not match these 1-to-1. Prefer the
definitions from x86/trap.h, as they are what actually get passed to
kdb_trap(). This is of little consequence, as gdb_cpu_signal() only
reports the trap reason (signal number) to the gdb client.
This is limited to the subset of trap values for which kdb_trap() is
reachable.
Reviewed by: kib
Discussed with: jhb
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27645
Add support for the remote 'G' packet. This is not widely used by gdb
when 'P' is supported, but is technically required by any remote gdb
stub implementation [1].
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Overview.html
Reviewed by: cem
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 44
Differential Revision: https://reviews.freebsd.org/D27644
We support bulk reads of the register set, but not reading specific
registers via the 'p' packet. Such single-register reads are useful at
least for the 'call' command in gdb.
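For context, in the gdb remote protocol a 'p' request carries the register number in hex, and the reply is the register contents hex-encoded in target byte order. The sketch below models just that wire format; parse_p_packet() and hex_encode() are hypothetical helpers and not the stub's actual functions.

```c
#include <assert.h>
#include <stddef.h>

/* Parse "p<hex regnum>"; returns the register number, or -1 if malformed. */
static long
parse_p_packet(const char *pkt)
{
	long regnum = 0;

	if (pkt[0] != 'p' || pkt[1] == '\0')
		return (-1);
	for (const char *c = pkt + 1; *c != '\0'; c++) {
		int d;

		if (*c >= '0' && *c <= '9')
			d = *c - '0';
		else if (*c >= 'a' && *c <= 'f')
			d = *c - 'a' + 10;
		else if (*c >= 'A' && *c <= 'F')
			d = *c - 'A' + 10;
		else
			return (-1);
		if (regnum > 0xffff)	/* reject absurdly large numbers */
			return (-1);
		regnum = regnum * 16 + d;
	}
	return (regnum);
}

/* Hex-encode 'len' bytes in memory order; dst needs 2 * len + 1 bytes. */
static void
hex_encode(const unsigned char *src, size_t len, char *dst)
{
	static const char digits[] = "0123456789abcdef";

	assert(dst != NULL);
	for (size_t i = 0; i < len; i++) {
		dst[2 * i] = digits[src[i] >> 4];
		dst[2 * i + 1] = digits[src[i] & 0xf];
	}
	dst[2 * len] = '\0';
}
```

A request of "p1a" thus asks for register 26, and a little-endian 32-bit register holding 0x12345678 would be answered with "78563412".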
Reviewed by: cem
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 44
Differential Revision: https://reviews.freebsd.org/D27644