The method was called for two different conditions: 1) the VM layer is
low on pages or 2) one of the UMA zones of the mbuf allocator is
exhausted. This change moves 2) into a new event handler, but all
affected network subsystems are modified to subscribe to both, so this
change shall not bring functional changes under different low-memory
situations.
There were three subsystems still using pr_drain: TCP, SCTP and frag6.
The latter had a protosw entry for the sole purpose of registering its
pr_drain method.
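
A minimal userspace sketch of the subscription pattern described above,
not the kernel code itself. The registry, the event names
(EV_VM_LOWMEM, EV_MBUF_LOWMEM) and the frag_cache subsystem are all
hypothetical stand-ins; the point is only that one drain routine
subscribed to both events keeps the behavior the single pr_drain method
used to provide.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two low-memory events. */
enum lowmem_event { EV_VM_LOWMEM, EV_MBUF_LOWMEM, EV_COUNT };

typedef void (*lowmem_handler_t)(void *arg, int flags);

#define MAX_HANDLERS 8

struct handler_slot {
	lowmem_handler_t func;
	void *arg;
};

static struct handler_slot handlers[EV_COUNT][MAX_HANDLERS];
static size_t nhandlers[EV_COUNT];

/* Subscribe func to one event; returns 0, or -1 if the list is full. */
int
lowmem_register(enum lowmem_event ev, lowmem_handler_t func, void *arg)
{
	if (nhandlers[ev] == MAX_HANDLERS)
		return (-1);
	handlers[ev][nhandlers[ev]].func = func;
	handlers[ev][nhandlers[ev]].arg = arg;
	nhandlers[ev]++;
	return (0);
}

/* Call every handler subscribed to ev, in registration order. */
void
lowmem_invoke(enum lowmem_event ev, int flags)
{
	for (size_t i = 0; i < nhandlers[ev]; i++)
		handlers[ev][i].func(handlers[ev][i].arg, flags);
}

/* A made-up subsystem that caches entries it can free under pressure. */
struct frag_cache {
	int cached;	/* entries currently held */
	int drains;	/* number of times the cache was drained */
};

/* The drain routine: drop everything and count the call. */
void
frag_drain(void *arg, int flags)
{
	struct frag_cache *fc = arg;

	(void)flags;
	fc->cached = 0;
	fc->drains++;
}

/* Subscribing to both events preserves the old single-method behavior. */
int
frag_cache_init(struct frag_cache *fc)
{
	if (lowmem_register(EV_VM_LOWMEM, frag_drain, fc) != 0)
		return (-1);
	return (lowmem_register(EV_MBUF_LOWMEM, frag_drain, fc));
}
```

After frag_cache_init(), invoking either event through lowmem_invoke()
reaches frag_drain(), so the subsystem reacts the same way to both
low-memory conditions.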
Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36164
They were useful many years ago, when the callwheel was not efficient,
and the kernel tried to have as few callout entries scheduled as
possible.
Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36163
While here, remove recursive network epoch entry in mld_fasttimo_vnet(),
as this function is already in epoch.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36161
Modern TCP stacks use multiple callouts per tcpcb, and a global
callout is an ancient artifact. However, it is still used to garbage
collect compressed timewait entries.
Reviewed by: melifaro, tuexen
Differential revision: https://reviews.freebsd.org/D36159
The protosw KPI has historically implemented two quite orthogonal
things: protocols that implement a certain kind of socket, and
protocols that are IPv4/IPv6 protocols. These two things do not
have a one-to-one correspondence. The pr_input and pr_ctlinput methods
were utilized only in IP protocols. This strange duality required
IP protocols that don't have a socket to declare a protosw, e.g.
carp(4). On the other hand, developers of socket protocols thought
that they always needed to define pr_input/pr_ctlinput, which led to
strange dead code, e.g. div_input() or sdp_ctlinput().
With this change pr_input and pr_ctlinput as part of protosw disappear
and IPv4/IPv6 get their private single-level protocol switch tables
ip_protox[] and ip6_protox[] respectively, each an array of
ipproto_input_t function pointers. The pr_ctlinput that was used for
control input coming from the network (ICMP, ICMPv6) is now represented
by ip_ctlprotox[] and ip6_ctlprotox[].
ipproto_register() becomes the only official way to register in the
table. Protocols that were always static, and that nobody is likely to
be interested in making loadable, are now registered by ip_init() and
ip6_init(). An IP protocol that considers itself unloadable shall
register itself within its own private SYSINIT().
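
A minimal userspace sketch of a single-level protocol switch table as
described above. It is an illustration under simplifying assumptions,
not the kernel implementation: proto_table[], proto_register(),
proto_unregister(), proto_dispatch() and the handler signature are
hypothetical and much simpler than the real ip_protox[] and
ipproto_register() interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_MAX 256	/* the IP protocol number is 8 bits wide */

/* Simplified, hypothetical input signature; returns a verdict code. */
typedef int (*proto_input_t)(const uint8_t *payload, size_t len);

/* One slot per protocol number; NULL means "not registered". */
static proto_input_t proto_table[PROTO_MAX];

/* Fallback for protocol numbers that nobody has claimed. */
int
proto_unclaimed(const uint8_t *payload, size_t len)
{
	(void)payload;
	(void)len;
	return (-1);
}

/* Claim a slot; fails if the handler is NULL or the slot is taken. */
int
proto_register(uint8_t proto, proto_input_t input)
{
	if (input == NULL || proto_table[proto] != NULL)
		return (-1);
	proto_table[proto] = input;
	return (0);
}

/* Release a slot; fails if nothing was registered there. */
int
proto_unregister(uint8_t proto)
{
	if (proto_table[proto] == NULL)
		return (-1);
	proto_table[proto] = NULL;
	return (0);
}

/* Single-level dispatch: index by protocol number and call. */
int
proto_dispatch(uint8_t proto, const uint8_t *payload, size_t len)
{
	proto_input_t input = proto_table[proto];

	if (input == NULL)
		input = proto_unclaimed;
	return (input(payload, len));
}

/* Example handler for a socket-less protocol; reports the length seen. */
int
carp_like_input(const uint8_t *payload, size_t len)
{
	(void)payload;
	return ((int)len);
}
```

A protocol without a socket, such as the carp-like handler here, only
needs a proto_register(112, carp_like_input) call to receive input; it
does not have to declare anything socket related.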
Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36157
Move the MBR non-geli zfs cases to no-priv creation with makefs / mkimg.
Add comments about the weird thing we do for MBR + ZFS + Legacy. Add
comments about other architectures. Still need to think through how to
leverage a completed universe to do all the architectures...
Sponsored by: Netflix
Certain operations such as checksum insertion and VLAN insertion
require the device model to rewrite the packet header. The first step
in rewriting the packet header is to copy the existing packet header
from the source packet. This copy is done by copying data from an
iovec array that corresponds to the S/G entries described by transmit
descriptors. However, if the total packet length is smaller than the
headers that need to be copied as the initial template, this copy can
overflow the iovec array and use garbage values as the source pointer
to memcpy. The PR used a single descriptor with a length of 0 in its
PoC.
To fix, track the total packet length and drop requests to transmit
packets whose payload is smaller than the required header length.
While here, fix another issue where the final descriptor could have an
invalid length (too short) that could underflow 'len' when stripping
the checksum. Skip those requests instead, too.
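
A minimal sketch of the two checks described above, written as
standalone C rather than as the device model's actual code. The
function names copy_header() and strip_trailer() and their signatures
are hypothetical; only struct iovec and memcpy() are real interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy the first hdrlen bytes of a scatter/gather packet into hdr.
 * Returns 0 on success, or -1 if the whole packet is shorter than
 * hdrlen, in which case the caller should drop the transmit request.
 */
int
copy_header(const struct iovec *iov, int iovcnt, uint8_t *hdr, size_t hdrlen)
{
	size_t total = 0, done = 0;

	/* First pass: total length across all segments. */
	for (int i = 0; i < iovcnt; i++)
		total += iov[i].iov_len;
	if (total < hdrlen)
		return (-1);

	/* Second pass: total >= hdrlen, so i stays below iovcnt. */
	for (int i = 0; done < hdrlen; i++) {
		size_t n = iov[i].iov_len;

		if (n == 0)
			continue;	/* skip empty segments */
		if (n > hdrlen - done)
			n = hdrlen - done;
		memcpy(hdr + done, iov[i].iov_base, n);
		done += n;
	}
	return (0);
}

/*
 * Remove a trailer of trailerlen bytes from the final segment.
 * Returns -1 instead of letting the unsigned length underflow.
 */
int
strip_trailer(struct iovec *last, size_t trailerlen)
{
	if (last->iov_len < trailerlen)
		return (-1);
	last->iov_len -= trailerlen;
	return (0);
}
```

With a single zero-length segment, as in the PoC mentioned above,
copy_header() returns -1 before any byte is copied.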
PR: 264372
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: grehan, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36182
This avoids type confusion where a malicious guest could rewrite the
MaxPStreams field in an endpoint context after the endpoint was
initialized, causing the device model to interpret a guest-provided
address (stored in ep_ringaddr of the "software" endpoint state) as a
bhyve host process address (ep_sctx_trbs). It also prevents a malicious
guest from triggering overflows of ep_sctx_trbs[] by increasing the
number of streams after the endpoint has been initialized.
Rather than re-reading the MaxPStreams value out of the endpoint context
in guest memory on subsequent operations, cache the value in the software
endpoint state. Possibly the device model should raise errors if the
value of MaxPStreams changes while an endpoint is running. This approach
simply ignores any such changes by the guest.
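
A minimal sketch of the caching approach, assuming a much simplified
model. The structures guest_ep_ctx and soft_ep and the helper functions
are hypothetical, and the stream limit is treated as a plain upper
bound rather than the encoding the hardware specification defines.

```c
#include <stdint.h>

/* Hypothetical guest-visible context; the guest may rewrite it anytime. */
struct guest_ep_ctx {
	volatile uint32_t max_pstreams;
};

/* Hypothetical host-side ("software") endpoint state. */
struct soft_ep {
	uint32_t max_pstreams;	/* snapshot taken at initialization */
};

/* Read the guest-controlled value exactly once, at initialization. */
void
ep_init(struct soft_ep *ep, const struct guest_ep_ctx *ctx)
{
	ep->max_pstreams = ctx->max_pstreams;
}

/* Later decisions consult only the cached copy, never guest memory. */
int
ep_uses_streams(const struct soft_ep *ep)
{
	return (ep->max_pstreams != 0);
}

/* Bounds check against the cached limit (simplified to a plain bound). */
int
ep_stream_valid(const struct soft_ep *ep, uint32_t streamid)
{
	return (streamid != 0 && streamid <= ep->max_pstreams);
}
```

Because ep_uses_streams() and ep_stream_valid() never look at the guest
context again, a guest that rewrites the field after ep_init() changes
neither how the endpoint state is interpreted nor the bound applied to
stream indices.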
PR: 264294, 264347
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36181
This reverts commit f6ffed44a8.
fetchadd will fail the waiters flag, which can cause other
code to wait when it should not, with nothing to clear it.
Revert until I sort this out.
Reported by: markj
"Invalid TXQ id" and "Queue <n> is stuck <x> <y>" are two errors seen
more commonly by FreeBSD users. Try to gather some extra data the
"easy way" by adding more error logging for these situations, in the
hope of finding a clue or at least doing more targeted debugging in the
future. Note that for one of the errors the Linux Intel driver has a
TODO to print register data. If that shows up in future versions of the
driver, it may also help.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
If hardware vlan tagging is disabled (after a vlan has been added) we
receive double-tagged packets, even if the packet on the wire only has a
single VLAN tag. That looks like this:
17:29:30.370787 00:51:82:11:22:02 > 90:ec:77:1f:8a:5f, ethertype 802.1Q (0x8100), length 64: vlan 0, p 0, ethertype 802.1Q, vlan 1001, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.101.0.12 is-at 00:51:82:11:22:02, length 42
This happens because the ixgbe driver does not clear the vlan flags in
the hardware (such as IXGBE_RXDCTL_VME) if IFCAP_VLAN_HWTAGGING is
cleared.
Add code to do so, which fixes this issue.
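
A minimal sketch of the set-and-clear logic, not the driver change
itself. RXDCTL_VME and rxdctl_apply_hwtagging() are hypothetical names,
and the bit position is an assumption chosen for illustration; the real
driver reads and writes hardware registers.

```c
#include <stdint.h>

/* Assumed bit position for a "VLAN mode enable" flag; illustrative only. */
#define RXDCTL_VME	(UINT32_C(1) << 30)

/*
 * Compute the new receive-control value for the requested hardware
 * tagging state. The bit is set when tagging is enabled and, just as
 * importantly, cleared when tagging is disabled.
 */
uint32_t
rxdctl_apply_hwtagging(uint32_t rxdctl, int hwtagging)
{
	if (hwtagging)
		rxdctl |= RXDCTL_VME;
	else
		rxdctl &= ~RXDCTL_VME;
	return (rxdctl);
}
```

Omitting the else branch models the bug described above: once set, the
flag would stay set after the capability is turned off.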
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D36139
This also fixes a bug where not-last unbusy failed to post a release
fence.
Reviewed by: markj (previous version), kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36084
Start to use makefs for ZFS. This covers the gpt nogeli variants. ZFS
MBR booting is tricky and complicated, so it will need some additional
tweaks that makefs/mkimg aren't able to do at the moment. This means
that all gpt nogeli amd64 combinations can be built w/o root.
In addition, tweak the generated qemu.sh files to use stdio for the
console. We grep the output for SUCCESS and report each of the booting
types. Create an all.sh that will run these automatically. These can
all run w/o root as well.
In the future, I'll add support for a make universe followed by this
script to create other architectures' tests and/or generate stand tests
for /usr/tests...
Sponsored by: Netflix
Stack must be at least readable and writable.
PR: 242570
Reviewed by: kib, markj
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35867
Both UDP and UDP Lite use the same methods on sockets. Both UDP over
IPv4 and over IPv6 use the same methods. Don't pretend that methods can
switch, and remove this unneeded complexity.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36154
For architectures with a small-data area, the __read_mostly section must
be present at the object declaration.
(emaste note: This does not appear to have an effect within FreeBSD, but
may be needed by downstream projects that handle __read_mostly /
__section(".data.read_mostly") differently.)
Pull Request: https://github.com/freebsd/freebsd-src/pull/608
The pr_ctlinput method was a feature of IPv4/IPv6 with the exception of
pfctlinput(), which broadcast a call to pr_ctlinput on all protocols
ever registered statically or with pf_proto_register(). Now that
this broadcast call is gone, the only protocols that ever get their
pr_ctlinput called are those that have registered themselves with
ipproto_register() or ip6proto_register().
It is entirely possible that the code deleted now was dead code from
the very beginning, just a copy-paste from TCP.
Reviewed by: rstone
Differential revision: https://reviews.freebsd.org/D36208
Obtained from https://android.googlesource.com/platform/bionic
libc/arch-x86_64/string/ at commit
919fb7f2e0e0c877dd5e9bbaa71d4c4a73e50ad3
Requested by: mjg
Sponsored by: The FreeBSD Foundation
Controllers must support the Identify Controller list if they support
Namespace Management. But the UNH NVMe tests use this command regardless
of whether the device under test supports Namespace Management.
This implementation returns an empty Controller list (i.e., Number of
Identifiers is zero).
Fixes UNH Test 1.1.2
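
A minimal sketch of what an empty Controller List looks like in memory.
It assumes the list occupies a 4096-byte Identify buffer whose first
two bytes hold a little-endian "Number of Identifiers" field, with the
identifier slots following. The function names are hypothetical and
this is not the emulation's actual code.

```c
#include <stdint.h>
#include <string.h>

#define CTRLR_LIST_SIZE 4096	/* assumed Identify data buffer size */

/*
 * Fill buf with an empty Controller List: a zero "Number of
 * Identifiers" field followed by zeroed identifier slots.
 */
void
fill_empty_ctrlr_list(uint8_t buf[CTRLR_LIST_SIZE])
{
	memset(buf, 0, CTRLR_LIST_SIZE);
}

/* Decode the little-endian "Number of Identifiers" field. */
uint16_t
ctrlr_list_count(const uint8_t buf[CTRLR_LIST_SIZE])
{
	return ((uint16_t)(buf[0] | (buf[1] << 8)));
}
```

Zeroing the whole buffer, rather than only the count, avoids handing
stale bytes back to a host that reads past the first field.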
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36193