Right now rack will fail to load due to its hack in accessing symbol names
in cc_newreno. This was fine when newreno was always compiled into the
kernel but now ... not so much. Instead lets fix up rack to use the socket
option queries to get the information it wants and set the parameters. We
also fix the CC parameter so they are always settable.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D37622
* Separate interface creation from interface modification code
* Support setting some interface attributes (ifdescr, mtu, up/down, promisc)
* Improve interaction with the cloners requiring to parse/write custom
interface attributes
* Add bitmask-based way of checking if the attribute is present in the
message
* Don't use multipart RTM_GETLINK replies when searching for the
specific interface names
* Use ENODEV instead of ENOENT in case of failed RTM_GETLINK search
* Add python netlink test helpers
* Add some netlink interface tests
Differential Revision: https://reviews.freebsd.org/D37668
Rather than waiting until the batchqueue is full to acquire the lock &
process the queue, we now start trying to acquire the lock using trylocks
when the batchqueue is 1/2 full. This removes almost all contention on the
vm pagequeue mutex for for our busy sendfile() based web workload.
It also greadly reduces the amount of time a network driver ithread
remains blocked on a mutex, and eliminates some packet drops under
heavy load.
So that the system does not loose the benefit of processing large
batchqueues, I've doubled the size of the batchqueues. This way, when
there is no contention, we process the same batch size as before.
This has been run for several months on a busy Netflix server, as well
as on my personal desktop.
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37305
The inpcb struct is one of the most heavily utilized in the kernel
on a busy network server. By aligning it to a cacheline
boundary, we can ensure that closely related fields in the inpcb
and tcbcb can be predictably located on the same cacheline. rrs
has already done a lot of this work to put related fields on the
same line for the tcbcb.
In combination with a forthcoming patch to align the start of the tcpcb,
we see a roughly 3% reduction in CPU use on a busy web server serving
traffic over roughly 50,000 TCP connections.
Reviewed by: glebius, markj, tuexen
Differential Revision: https://reviews.freebsd.org/D37687
Sponsored by: Netflix
Sockets have special handling for EPIPE on a write, that was spread out
into several places. Treating transient errors is also special - if
protocol is atomic, than we should ignore any changes to uio_resid, a
transient error means the write had completely failed (see d2b3a0ed31).
- Provide sousrsend() that expects a valid uio, and leave sosend() for
kernel consumers only. Do all special error handling right here.
- In dofilewrite() don't do special handling of error for DTYPE_SOCKET.
- For send(2), write(2) and aio_write(2) call into sousrsend() and remove
error handling for kern_sendit(), soo_write() and soaio_process_job().
PR: 265087
Reported by: rz-rpi03 at h-ka.de
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35863
This subsystem is superseded by modern debugging facilities,
e.g. DTrace probes and TCP black box logging.
We intentionally leave SO_DEBUG in place, as many utilities may
set it on a socket. Also the tcp::debug DTrace probes look at
this flag on a socket.
Reviewed by: gnn, tuexen
Discussed with: rscheff, rrs, jtl
Differential revision: https://reviews.freebsd.org/D37694
After commit fac6dee9eb ("Remove tests for obsolete compilers in the
build system"), we always set -fdebug-prefix-map, so there's no point in
defining and testing _MAP_DEBUG_PREFIX. No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This use of "volatile" in the vnet definitions doesn't have any effect.
VNET_DEFINE_STATE(volatile int, ...) should work, but let's avoid using
"volatile" altogether and convert to atomic_load/atomic_store. Also
convert to bool while here.
Reviewed by: kp, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37684
The previous commit fixed a memory leak, where we'd fail to clean up
removed groups (and interfaces).
Check that we now clean those up as expected.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37570
The detach of the interface and group were leaving pfi_ifnet memory
behind. Check if the kif still has references, and clean it up if it
doesn't
On interface detach, the group deletion was notified first and then a
change notification was sent. This would recreate the group in the kif
layer. Reorder the change to before the delete.
PR: 257218
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37569
Move the use of the `offsetof(struct ovpn_counters, fieldname) /
sizeof(uint64_t)` construct into a macro.
This removes a fair bit of code duplication and should make things a
little easier to read.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37607
When we remove a peer userspace can no longer retrieve its counters. To
ensure that userspace can get a full count of the entire session we now
include the counters in the deletion message.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37606
Introduce two more RB_TREEs so that we can look up peers by their peer
id (already present) or vpn4 or vpn6 address.
This removes the last linear scan of the peer list.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37605
OpenVPN will introduce a mechanism to retrieve per-peer statistics.
Start tracking those so we can return them to userspace when queried.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37603
OpenVPN userspace no longer uses the ioctl interface to send control
packets. It instead uses the socket directly.
The use of OVPN_SEND_PKT was never released, so we can remove this
without worrying about compatibility.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37602
The SYN_RCVD state count is tricky here due to default code path and TFO
being so different. In the default case the count is incremented when a
syncache entry is added to the the database in syncache_insert(). Later
when connection transitions from syncache entry to a socket in
syncache_expand(), this counter is inherited by the tcpcb. If socket or
tcpcb allocation failed in syncache_socket() failed the syncache_expand()
is responsible for decrement. In the TFO case the syncache entry is not
inserted into database and count of SYN_RCVD is first incremented in the
syncache_tfo_expand() after successful socket allocation. Thus, inside
syncache_socket() we can't tell whether we need to decrement in a case of
a failure or not. The caller is responsible for this book keeping.
Fixes: 07285bb4c2
Differential revision: https://reviews.freebsd.org/D37610
This is a bit more readable, and this loop is probably unlikely to gain
any `continue` or `break`s.
Suggested by: pstef
Differential Revision: https://reviews.freebsd.org/D37676
The previous logic conflated some things... in this block:
- j: input characters rendered so far
- nc: number of characters in the line
- col: columns rendered so far
- hw: column width ((h)ard (w)idth?)
Comparing j to hw or col to nc are naturally wrong, as col and hw are
limits on their respective counters and nc is already brought down to hw
if the input line should be truncated to start with.
Right now, we end up easily truncating lines with tabs in them as we
count each tab for $tabwidth lines in the input line, but we really
should only be accounting for them in the column count. The problem is
most easily demonstrated by the two input files added for the tests,
the two tabbed lines lose at least a word or two even though there's
plenty of space left in the row for each side.
Reviewed by: bapt, pstef
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37676
This unbreak drm-kmod build.
the const is part of Linux API
Unfortunately drm-kmod uses hand-rolled refcount* calls on a kref
object. For now go the easy route of keeping it operational by casting
stuff internally.
The general goal here is to make FreeBSD refcount API use an opaque
type, hence the ongoing removal of hand-rolled accesses.
Reported by: emaste
Sync serial (e.g. T1/T1/G.703) interfaces are obsolete, this driver
includes obfuscated source, and has reported potential security issues.
Differential Revision: https://reviews.freebsd.org/D33468
Sync serial (e.g. T1/T1/G.703) interfaces are obsolete, this driver
includes obfuscated source, and has reported potential security issues.
Differential Revision: https://reviews.freebsd.org/D33467
And the related sconfig utility. Sync serial (e.g. E1/T1) interfaces
are obsolete, and nobody responded to several inquires on the mailing
lists about use of these drivers.
Relnotes: Yes
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23928
Libarchive 3.6.2
Important bug fixes:
rar5 reader: fix possible garbled output with bsdtar -O (#1745)
mtree reader: support reading mtree files with tabs (#1783)
various small fixes for issues found by CodeQL
MFC after: 2 weeks
PR: 286306 (exp-run)
It typically had two args with an optional third from the userland
declaration in sys/ioccom.h. However, the funciton definition used a
non-optional char * argument. This mismatch is UB behavior (but worked
due to the calling convetions of all our machines).
Instead, add a declaration for ioctl to stand.h, make the third arg
'void *' which is a better match to the ... declaration before. This
prevents the convert int * -> char * errors as well. Make the ioctl
user-space declaration truly user-space specific (omit it in the
stand-alone build).
No functional change intended.
Sponsored by: Netflix
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D37680
We documented "[Use]PrivilegeSeparation defaults to sandbox" as one of
our modifications to ssh's server-side defaults, but this is not (any
longer) a difference from upstream.
Sponsored by: The FreeBSD Foundation
And note that it is deprecated.
PR: 236569
Reported by: bcran
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37678
On amd64, don't abort promotion due to a missing accessed bit in a
mapping before possibly write protecting that mapping. Previously,
in some cases, we might not repromote after madvise(MADV_FREE) because
there was no write fault to trigger the repromotion. Conversely, on
arm64, don't pointlessly, yet harmlessly, write protect physical pages
that aren't part of the physical superpage.
Don't count aborted promotions due to explicit promotion prohibition
(arm64) or hardware errata (amd64) as ordinary promotion failures.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36916