A security feature from c06f087ccb appeared to be a huge bottleneck
under SYN flood. To mitigate that add a sysctl that would make
syncache(4) globally visible, ignoring UID/GID, jail(2) and mac(4)
checks. When turned on, we won't need to call crhold() on the listening
socket credential for every incoming SYN packet.
Reviewed by: bz
Make it possible to reclaim items from a specific NUMA domain.
- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations. Use a counter instead of a flag to
synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
domain.
Currently the new KPIs are unused, so there should be no functional
change.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29685
Add global definitions for first-touch and interleave policies. The
former may be useful for UMA, which implements a similar policy without
using domainset iterators.
No functional change intended.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29104
In 22bd0c9731 ossl(4) was ported to arm64. The manual page was
adapted, but never installed since the ossl(4) manual page was
i386 / amd64 only.
Reviewed by: mhorne
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29762
KASAN enables the use of LLVM's AddressSanitizer in the kernel. This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses. It is particularly
effective when combined with test suites or syzkaller. KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.
The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.
The runtime implements a number of functions defined by the compiler
ABI. These are prefixed by __asan. The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.
kasan_mark() is called by various kernel allocators to update state in
the shadow map. Updates to those allocators will come in subsequent
commits.
The runtime also defines various interceptors. Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation. To handle this, the runtime implements these routines
on behalf of the rest of the kernel. The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.
The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.
Obtained from: NetBSD
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29416
MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.
PR: 254577
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29468
There haven't been any non-obscure drivers that supported this
functionality and it has been impossible to test to ensure that it
still works. The only known consumer of this interface was the engine
in OpenSSL < 1.1. Modern OpenSSL versions do not include support for
this interface as it was not well-documented.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29736
Allows for duplicate locks to be acquired without witness complaining.
Similar flags exists already for rwlock(9) and sx(9).
Reviewed by: markj
MFC after: 3 days
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 52
Differential Revision: https://reviews.freebsd.org/D29683n
types.h defines device_t as a typedef of struct device *. struct device
is defined in subr_bus.c and almost all of the kernel uses device_t.
The LinuxKPI also defines a struct device, so type confusion can occur.
This causes bugs and ambiguity for debugging tools. Rename the FreeBSD
struct device to struct _device.
Reviewed by: gbe (man pages)
Reviewed by: rpokala, imp, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29676
A number of changes:
- Clarifies the locking rules when calling the routine.
- Correct the description regarding the content range to be purged.
- Document the effects on page fault handler.
MFC after: 3 days
MFC with: 86a52e262a
Sponsored by: The FreeBSD Foundation
Reviewed by: bcr, kib
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29637
The possibility of using a sysctl.conf.local on a machine that has a shared
sysctl.conf(5) isn't documented. So mention the sysctl.conf.local in the
manual page.
PR: 254901
Submitted by: Jose Luis Duran <jlduran at gmail dot com>
Reported by: Jose Luis Duran <jlduran at gmail dot com>
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29673
As other parts of the base tcp stack (eg.
tcp fastopen) already use jenkins_hash32,
and the properties appear reasonably good,
switching to use that.
Reviewed By: tuexen, #transport, ae
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29515
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Reviewed by: bcr
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29408
Follow-up to the removal of the mcov from kernel.
Noted by: mckusick
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D29563
Otherwise it breaks when offloading like checksum or TSO are used,
because second (encapsulated) ip_output() processing passes fragments of
the encapsulated packet down to the hardware interface.
Diagnosed by: hselasky
Reviewed by: np
Sponsored by: Nvidia Networking / Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29501
documention.
Commit SVN r364219 / Git 8a0edc914f changed random(9) to be a shim around
prng32(9) and inadvertently caused random(9) to begin returning numbers in the
range [0,2^32-1] instead of [0,2^31-1], where the latter has been the documented
range for decades.
The increased output range has been identified as the source of numerous bugs in
code written against the historical output range e.g. ipfw "prob" rules and
stats(3) are known to be affected, and a non-exhaustive audit of the tree
identified other random(9) consumers which are also likely affected.
As random(9) is deprecated and slated for eventual removal in 14.0, consumers
should gradually be audited and migrated to prng(9).
Submitted by: Loic Prylli <lprylli@netflix.com>
Obtained from: Netflix
Reviewed by: cem, delphij, imp
MFC after: 1 day
MFC to: stable/13, releng/13.0
Differential Revision: https://reviews.freebsd.org/D29385
ipv6_ipfilter_rules was obsoleted because of ipfilter was updated, and
rc_parallel_start was reverted to undergo further refinement.
PR: 254398
Fixes: e2ad10e847, f61831d2e8
Document the workstation ACL ruleset, which uses stateful rules.
While here, add a note about where some of the undocumented variables
can be found. This is not a perfect solution for bug 127359, but it at
at least gives a place to go look, and can be used as a reference for
when bug 127359 gets fixed properly.
PR: 254358, 127359
After length decisions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree. This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.
Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
These ioctl commands aim to provide easier ways for user space
applications to enumerate existing audio devices and the node they can
potentially use.
The exchange of device lists between user space and kernel is done on
nv(9). Some ioctl commands are added to /dev/sndstat node:
- SNDSTAT_REFRESH_DEVS
- SNDSTAT_GET_DEVS
- SNDSTAT_ADD_USER_DEVS
- SNDSTAT_FLUSH_USER_DEVS
Bump __FreeBSD_version to reflect the addition of the ioctls.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D26884
Man pages can be big in total, add an options to split man pages
in -man packages so we produce smaller packages.
This is useful for small jails or mfsroot produced of pkgbase.
The option is off by default.
Reviewed by: bapt, Mina Galić <me@igalic.co>
Differential Revision: https://reviews.freebsd.org/D29169
MFC after: 2 weeks
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues. This patch consists of
work done by the following folks:
- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>
Notable changes include:
- Packets are now correctly staged for processing once the handshake has
completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
the interface's home vnet so that it can act as the sole network
connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
complete. It is additionally supported by the upstream
wireguard-tools (which we plan to merge in to base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
aligned with security auditing guidelines
Note that the driver has been rebased away from using iflib. iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.
The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations. This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.
There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.
Also note that this is still a work in progress; work going further will
be much smaller in nature.
MFC after: 1 month (maybe)
This lets one interrupt DDB's output, which is useful if paging is
disabled and the output device is slow.
This follows a previous implementation in svn r311952 / git
5fddef7999 which was reverted because it
broke DDB type-ahead.
Now, try this again, but with a 512-byte type-ahead buffer. While there
is buffer space, control input is handled and non-control input is
buffered. When the buffer is exhausted, the default is to print a
warning and drop further non-control input in order to continue handling
control input. sysctl debug.ddb.prioritize_control_input can be set to
0 to instead preserve all input but lose immediate handling of control
input. This could for example effect pasting of a large script into the
ddb console.
Suggested by: Anton Rang <rang@acm.org>
Reviewed by: markj
Discussed with: imp
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28676
While here also document that for counter_u64_free().
Reviewed by: rpokala@
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29215
The NIC no longer provides a host database, and hasn't for quite some
time. Remove that paragraph, it's not been relevant for many years. Also, hosts
appeared in 4.1c, not 4.2, so correct that too.
Noticed by: Henry Bent
config_intrhook_drain will remove the hook from the list as
config_intrhook_disestablish does if the hook hasn't been called. If it has,
config_intrhook_drain will wait for the hook to be disestablished in the normal
course (or expedited, it's up to the driver to decide how and when
to call config_intrhook_disestablish).
This is intended for removable devices that use config_intrhook and might be
attached early in boot, but that may be removed before the kernel can call the
config_intrhook or before it ends. To prevent all races, the detach routine will
need to call config_intrhook_train.
Sponsored by: Netflix, Inc
Reviewed by: jhb, mav, gde (in D29006 for man page)
Differential Revision: https://reviews.freebsd.org/D29005
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.
Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after: 3 days
The structure was renamed while refactoring Netflix's KTLS changes for
upstreaming, but the original name remained in tcp.4 and was
subsequently copied to ktls.4.
PR: 254141
Reported by: asomers
MFC after: 3 days
The example in the manual page of wg(4) for connecting to a
peer was missing the 'public-key' ifconfig(8) keyword and for the
addressed peer the port must be specified.
PR: 253866
Reported by: Sergey Akhmatov <sergey at akhmatov dot ru>
Reviewed by: debdrup
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29115