A security feature from c06f087ccb appeared to be a huge bottleneck
under SYN flood. To mitigate that add a sysctl that would make
syncache(4) globally visible, ignoring UID/GID, jail(2) and mac(4)
checks. When turned on, we won't need to call crhold() on the listening
socket credential for every incoming SYN packet.
Reviewed by: bz
Make it possible to reclaim items from a specific NUMA domain.
- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations. Use a counter instead of a flag to
synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
domain.
Currently the new KPIs are unused, so there should be no functional
change.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29685
Add global definitions for first-touch and interleave policies. The
former may be useful for UMA, which implements a similar policy without
using domainset iterators.
No functional change intended.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29104
In 22bd0c9731 ossl(4) was ported to arm64. The manual page was
adapted, but never installed since the ossl(4) manual page was
i386 / amd64 only.
Reviewed by: mhorne
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29762
KASAN enables the use of LLVM's AddressSanitizer in the kernel. This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses. It is particularly
effective when combined with test suites or syzkaller. KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.
The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.
The runtime implements a number of functions defined by the compiler
ABI. These are prefixed by __asan. The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.
kasan_mark() is called by various kernel allocators to update state in
the shadow map. Updates to those allocators will come in subsequent
commits.
The runtime also defines various interceptors. Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation. To handle this, the runtime implements these routines
on behalf of the rest of the kernel. The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.
The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.
Obtained from: NetBSD
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29416
MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.
PR: 254577
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29468
There haven't been any non-obscure drivers that supported this
functionality and it has been impossible to test to ensure that it
still works. The only known consumer of this interface was the engine
in OpenSSL < 1.1. Modern OpenSSL versions do not include support for
this interface as it was not well-documented.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29736
Allows for duplicate locks to be acquired without witness complaining.
Similar flags exists already for rwlock(9) and sx(9).
Reviewed by: markj
MFC after: 3 days
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 52
Differential Revision: https://reviews.freebsd.org/D29683n
types.h defines device_t as a typedef of struct device *. struct device
is defined in subr_bus.c and almost all of the kernel uses device_t.
The LinuxKPI also defines a struct device, so type confusion can occur.
This causes bugs and ambiguity for debugging tools. Rename the FreeBSD
struct device to struct _device.
Reviewed by: gbe (man pages)
Reviewed by: rpokala, imp, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29676
A number of changes:
- Clarifies the locking rules when calling the routine.
- Correct the description regarding the content range to be purged.
- Document the effects on page fault handler.
MFC after: 3 days
MFC with: 86a52e262a
Sponsored by: The FreeBSD Foundation
Reviewed by: bcr, kib
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29637
The possibility of using a sysctl.conf.local on a machine that has a shared
sysctl.conf(5) isn't documented. So mention the sysctl.conf.local in the
manual page.
PR: 254901
Submitted by: Jose Luis Duran <jlduran at gmail dot com>
Reported by: Jose Luis Duran <jlduran at gmail dot com>
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29673
As other parts of the base tcp stack (eg.
tcp fastopen) already use jenkins_hash32,
and the properties appear reasonably good,
switching to use that.
Reviewed By: tuexen, #transport, ae
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29515
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Reviewed by: bcr
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29408
Follow-up to the removal of the mcov from kernel.
Noted by: mckusick
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D29563
Otherwise it breaks when offloading like checksum or TSO are used,
because second (encapsulated) ip_output() processing passes fragments of
the encapsulated packet down to the hardware interface.
Diagnosed by: hselasky
Reviewed by: np
Sponsored by: Nvidia Networking / Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29501
documention.
Commit SVN r364219 / Git 8a0edc914f changed random(9) to be a shim around
prng32(9) and inadvertently caused random(9) to begin returning numbers in the
range [0,2^32-1] instead of [0,2^31-1], where the latter has been the documented
range for decades.
The increased output range has been identified as the source of numerous bugs in
code written against the historical output range e.g. ipfw "prob" rules and
stats(3) are known to be affected, and a non-exhaustive audit of the tree
identified other random(9) consumers which are also likely affected.
As random(9) is deprecated and slated for eventual removal in 14.0, consumers
should gradually be audited and migrated to prng(9).
Submitted by: Loic Prylli <lprylli@netflix.com>
Obtained from: Netflix
Reviewed by: cem, delphij, imp
MFC after: 1 day
MFC to: stable/13, releng/13.0
Differential Revision: https://reviews.freebsd.org/D29385
ipv6_ipfilter_rules was obsoleted because of ipfilter was updated, and
rc_parallel_start was reverted to undergo further refinement.
PR: 254398
Fixes: e2ad10e847, f61831d2e8
Document the workstation ACL ruleset, which uses stateful rules.
While here, add a note about where some of the undocumented variables
can be found. This is not a perfect solution for bug 127359, but it at
at least gives a place to go look, and can be used as a reference for
when bug 127359 gets fixed properly.
PR: 254358, 127359
After length decisions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree. This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.
Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
These ioctl commands aim to provide easier ways for user space
applications to enumerate existing audio devices and the node they can
potentially use.
The exchange of device lists between user space and kernel is done on
nv(9). Some ioctl commands are added to /dev/sndstat node:
- SNDSTAT_REFRESH_DEVS
- SNDSTAT_GET_DEVS
- SNDSTAT_ADD_USER_DEVS
- SNDSTAT_FLUSH_USER_DEVS
Bump __FreeBSD_version to reflect the addition of the ioctls.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D26884
Man pages can be big in total, add an options to split man pages
in -man packages so we produce smaller packages.
This is useful for small jails or mfsroot produced of pkgbase.
The option is off by default.
Reviewed by: bapt, Mina Galić <me@igalic.co>
Differential Revision: https://reviews.freebsd.org/D29169
MFC after: 2 weeks
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues. This patch consists of
work done by the following folks:
- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>
Notable changes include:
- Packets are now correctly staged for processing once the handshake has
completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
the interface's home vnet so that it can act as the sole network
connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
complete. It is additionally supported by the upstream
wireguard-tools (which we plan to merge in to base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
aligned with security auditing guidelines
Note that the driver has been rebased away from using iflib. iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.
The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations. This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.
There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.
Also note that this is still a work in progress; work going further will
be much smaller in nature.
MFC after: 1 month (maybe)
This lets one interrupt DDB's output, which is useful if paging is
disabled and the output device is slow.
This follows a previous implementation in svn r311952 / git
5fddef7999 which was reverted because it
broke DDB type-ahead.
Now, try this again, but with a 512-byte type-ahead buffer. While there
is buffer space, control input is handled and non-control input is
buffered. When the buffer is exhausted, the default is to print a
warning and drop further non-control input in order to continue handling
control input. sysctl debug.ddb.prioritize_control_input can be set to
0 to instead preserve all input but lose immediate handling of control
input. This could for example effect pasting of a large script into the
ddb console.
Suggested by: Anton Rang <rang@acm.org>
Reviewed by: markj
Discussed with: imp
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28676
While here also document that for counter_u64_free().
Reviewed by: rpokala@
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29215
The NIC no longer provides a host database, and hasn't for quite some
time. Remove that paragraph, it's not been relevant for many years. Also, hosts
appeared in 4.1c, not 4.2, so correct that too.
Noticed by: Henry Bent
config_intrhook_drain will remove the hook from the list as
config_intrhook_disestablish does if the hook hasn't been called. If it has,
config_intrhook_drain will wait for the hook to be disestablished in the normal
course (or expedited, it's up to the driver to decide how and when
to call config_intrhook_disestablish).
This is intended for removable devices that use config_intrhook and might be
attached early in boot, but that may be removed before the kernel can call the
config_intrhook or before it ends. To prevent all races, the detach routine will
need to call config_intrhook_train.
Sponsored by: Netflix, Inc
Reviewed by: jhb, mav, gde (in D29006 for man page)
Differential Revision: https://reviews.freebsd.org/D29005
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.
Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after: 3 days
The structure was renamed while refactoring Netflix's KTLS changes for
upstreaming, but the original name remained in tcp.4 and was
subsequently copied to ktls.4.
PR: 254141
Reported by: asomers
MFC after: 3 days
The example in the manual page of wg(4) for connecting to a
peer was missing the 'public-key' ifconfig(8) keyword and for the
addressed peer the port must be specified.
PR: 253866
Reported by: Sergey Akhmatov <sergey at akhmatov dot ru>
Reviewed by: debdrup
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29115
truncate(1) is not case-sensitive with regard to setting the size
of a file. makefs(8), however, does not honor upper-case values.
Update release-specific files and the release(7) manual page to
reflect this.
MFC with: 1ca8842f3a
Submitted by: ehem_freebsd_m5p.com (original)
Differential Review: https://reviews.freebsd.org/D28979
Sponsored by: Rubicon Communications, LLC ("Netgate")
The zero_region() kernel interface was previously undocumented.
Add a new zero_region(9) manual page to document it.
Submitted by: Ka Ho Ng <khng@freebsdfoundation.org>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28914
with the semantic following C11 signal_fence, that is, it establishes
ordering between its place and any interrupt handler executing on the
same CPU.
Reviewed by: markj, mjg, rlibby
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28909
The commit below added parallel service startup, and it needs to be
documented, so people know about it.
PR: 249192
MFC with: 77e1ccbee3
Reviewed by: yuripv
Differential Revision: https://reviews.freebsd.org/D28898
We now live in the world of git, and release(7) should reflect that.
As of the commit referenced below, release images also no longer
include (stale) documentation, as the documentation has moved to
AsciiDoctor. This means that a few environment variables no longer
make sense, so remove them from their sections and mention them in
the compatibility section instead.
While here, also pet mandoc.
PR: 253615
MFC after: 3 days
MFC with: f61e92ca5a release: permanently remove the 'reldoc'
target and associates
Reviewed by: gjb, lwhsu, yuripv
Differential Revision: https://reviews.freebsd.org/D28881
Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when
creating the unix domain socket for a given bhyve vm.
The path to the unix domain socket for a bhyve vm will now be
/var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname
Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared
to bhyvectl(8).
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28783
Our uefi support has included environment variable support for several years
now. Remove the bogus blanket statement saying we don't support them.
MFC After: 3 days
Packages default to ending up in a different location compared to the
documentation, so catch up to the implementation by referring to the
location where packages can usually be found if no environment variables
have been set.
While here, also update the mention of the file extension to match the
txz format that packages use.
PR: 253179, 224370
Reported by: rwatson, jeromer at fastmail dotnet
Note that this algorithm implements the mode defined in RFC 8439.
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27836
This is a prerequisite to allowing the use of hardware watchpoints for
userspace debuggers.
This is also a slight departure from the x86 behaviour, since `si_addr`
returns the data address that triggered the watchpoint, not the
address of the instruction that was executed. Otherwise, there is no
straightforward way for the application to determine which watchpoint
was triggered. Make a note of this in the siginfo(3) man page.
Reviewed by: jhb, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28561
- improved pipe calculation which does not degrade under heavy loss
- engaging in Loss Recovery earlier under adverse conditions
- Rescue Retransmission in case some of the trailing packets of a request got lost
All above changes are toggled with the sysctl "rfc6675_pipe" (disabled by default).
Reviewers: #transport, tuexen, lstewart, slavash, jtl, hselasky, kib, rgrimes, chengc_netapp.com, thj, #manpages, kbowling, #netapp, rscheff
Reviewed By: #transport
Subscribers: imp, melifaro
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D18985
This adds a new sysctl to Wellspring Touchpad driver for controlling
Z-Axis (2-finger vertical scroll) direction "hw.usb.wsp.z_invert".
Submitted by: James Wright <james.wright_AT_digital-chaos_DOT_com>
Reviewed by: wulf
PR: 253321
Differential revision: https://reviews.freebsd.org/D28521
Since we ship a ktls(4) enabled OpenSSL version, mention
the src.conf(5) option WITH_OPENSSL_KTLS in the manual page.
Reviewed by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28435
Currently, OpenCrypto consumers can request asynchronous dispatch by
setting a flag in the cryptop. (Currently only IPSec may do this.) I
think this is a bit confusing: we (conditionally) set cryptop flags to
request async dispatch, and then crypto_dispatch() immediately examines
those flags to see if the consumer wants async dispatch. The flag names
are also confusing since they don't specify what "async" applies to:
dispatch or completion.
Add a new KPI, crypto_dispatch_async(), rather than encoding the
requested dispatch type in each cryptop. crypto_dispatch_async() falls
back to crypto_dispatch() if the session's driver provides asynchronous
dispatch. Get rid of CRYPTOP_ASYNC() and CRYPTOP_ASYNC_KEEPORDER().
Similarly, add crypto_dispatch_batch() to request processing of a tailq
of cryptops, rather than encoding the scheduling policy using cryptop
flags. Convert GELI, the only user of this interface (disabled by
default) to use the new interface.
Add CRYPTO_SESS_SYNC(), which can be used by consumers to determine
whether crypto requests will be dispatched synchronously. This is just
a helper macro. Use it instead of looking at cap flags directly.
Fix style in crypto_done(). Also get rid of CRYPTO_RETW_EMPTY() and
just check the relevant queues directly. This could result in some
unnecessary wakeups but I think it's very uncommon to be using more than
one queue per worker in a given workload, so checking all three queues
is a waste of cycles.
Reviewed by: jhb
Sponsored by: Ampere Computing
Submitted by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28194
Handling of unknown MACs on an bridge with incomplete learning
capabilites (aka uplink ports) can be defined in different ways.
The classical approach is to broadcast unicast frames send to an
unknown MAC, because the unknown devices can be everywhere. This mode
is default for ng_bridge(4).
In the case of dedicated uplink ports, which prohibit learning of MAC
addresses in order to save memory and CPU cycles, the broadcast
approach is dangerous. All traffic to the uplink port is broadcasted
to every downlink port, too. In this case, it's better to restrict the
distribution of frames to unknown MAC to the uplink ports only.
In order to keep the chance small and the handling as natural as
possible, the first attached link is used to determine the behaviour
of the bridge: If it is an "uplink" port, then the bridge switch from
classical mode to restricted mode.
Reviewed By: kp
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28487
The ng_bridge(4) node is designed to work in moderately small
environments. Connecting such a node to a larger network rapidly fills
the MAC table for no reason. It even become complicated to obtain data
from the gettable message, because the result is too large to
transmit.
This patch introduces, two new functionality bits on the hooks:
- Allow or disallow MAC address learning for incoming patckets.
- Allow or disallow sending unknown MACs through this hook.
Uplinks are characterized by denied learing while sending out
unknowns. Normal links are charaterized by allowed learning and
sending out unknowns.
Reviewed by: kp
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23963
update mrsas(4) since big-endian is supported since
e34a057ca6
Reviewed by: bdragon, gbe
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28475
Glen (@gjb) noticed that I am haven't mentioned the authors of the
WireGuard device driver in the manual page.
This is commit addressed this commit.
Reviewed by: gjb, brueffer
Differential Revision: https://reviews.freebsd.org/D28464
X-MFC-with: e59d9cb412
This option has been equivalent to any form of C++ support since libstdc++
was removed. Therefore, replace all MK_LIBCPLUSPLUS uses with MK_CXX.
Reviewed By: emaste
Differential Revision: https://reviews.freebsd.org/D27974
The argument passed to g_provider_by_name(9) can be a geom name or a
fullpath.
- g_provider_by_name() gained this functionality in
769afdc71e.
Reviewed by: imp, kevans
Approved by: kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D27566
There's a third party dependency on this option; currently,
net/openldap24-{,sasl-}client. At least mention that an openldap from ports
is needed for this option.
PR: 252866
Reported-by: Build Option Survey via Michael Dexter
MFC-after: 3 days