Instead of incrementing pretty random counters in the IP statistics,
create divert socket statistics structure. Export via netstat(1).
Differential revision: https://reviews.freebsd.org/D36381
Since 4.4BSD the protosw was used to implement socket types created
by socket(2) syscall and at the same to demultiplex incoming IPv4
datagrams (later copied to IPv6). This story ended with 78b1fc05b20.
These entries (e.g. IPPROTO_ICMP) in inetsw that were added to catch
packets in ip_input(), they would also be returned by pffindproto()
if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw
sockets to work correctly, all the entries were pointing at raw_usrreq
differentiating only in the value of pr_protocol.
With 78b1fc05b20 all these entries are no longer needed, as ip_protox
is independent of protosw. Any socket syscall requesting SOCK_RAW type
would end up with rip_protosw. And this protosw has its pr_protocol
set to 0, allowing to mark socket with any protocol.
For IPv6 raw socket the change required two small fixes:
o Validate user provided protocol value
o Always use protocol number stored in inp in rip6_attach, instead
of protosw value, which is now always 0.
Differential revision: https://reviews.freebsd.org/D36380
The divert(4) is not a protocol of IPv4. It is a socket to
intercept packets from ipfw(4) to userland and re-inject them
back. It can divert and re-inject IPv4 and IPv6 packets today,
but potentially it is not limited to these two protocols. The
IPPROTO_DIVERT does not belong to known IP protocols, it
doesn't even fit into u_char. I guess, the implementation of
divert(4) was done the way it is done basically because it was
easier to do it this way, back when protocols for sockets were
intertwined with IP protocols and domains were statically
compiled in.
Moving divert(4) out of inetsw accomplished two important things:
1) IPDIVERT is getting much closer to be not dependent on INET.
This will be finalized in following changes.
2) Now divert socket no longer aliases with raw IPv4 socket.
Domain/proto selection code won't need a hack for SOCK_RAW and
multiple entries in inetsw implementing different flavors of
raw socket can merge into one without requirement of raw IPv4
being the last member of dom_protosw.
Differential revision: https://reviews.freebsd.org/D36379
Recent problems related to NFSv4 mounts has been traced
to multiple NFSv4 clients using the same /etc/hostid
(or kern.hostuuid, if you prefer).
This patch adds a sentence to the man page noting that
clients must have unique /etc/hostid's.
This is a content change.
Reviewed by: gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36392
To avoid issues starting any USB transfers before the open
function is complete.
Differential Revision: https://reviews.freebsd.org/D36391
MFC after: 1 week
Sponsored by: NVIDIA Networking
Some controllers like the XHCI(4) loose track of the data toggle value when
USB receive transfers are cancelled at close. This in turn can lead to to
data loss after the next open.
To avoid data loss, make sure both the receive and transmit data toggles
get reset, before trying to read or write any data.
Differential Revision: https://reviews.freebsd.org/D36391
Submitted by: Dave Baukus <daveb@spectralogic.com>
MFC after: 1 week
Sponsored by: NVIDIA Networking
domain_init() called at SI_SUB_PROTO_DOMAIN/SI_ORDER_SECOND is always
called right after domain_add(), that had been called at SI_ORDER_FIRST.
Note that protocols aren't initialized yet at this point, since they are
usually scheduled to initialize at SI_ORDER_THIRD.
After this merge it becomes clear that DOMF_SUPPORTED / DOMF_INITED
can be garbage collected as they are set & checked in the same function.
For initialization of the domain system itself it is now clear that
domaininit() can be garbage collected and static initializer is enough.
o Statically initialize max_linkhdr to default value without relying
on domain(9) code doing that.
o Statically initialize max_protohdr to a sane value, without relying
on TCP being always compiled in.
o Retire max_datalen. Set, but not used.
o Don't make the domain(9) system responsible in validating these
values and updating max_hdr. Instead provide KPI max_linkhdr_grow()
and max_protohdr_grow().
o Call max_linkhdr_grow() from IEEE802.11 and max_protohdr_grow() from
TCP. Those are the only protocols today that may want to grow.
Reviewed by: tuexen
Differential revision: https://reviews.freebsd.org/D36376
- Ignore I/O requests with insufficiently sized input or output
buffers (those not containing compete request headers).
- Ignore control requests with improperly sized buffers.
- While here, explicitly zero the output header of an I/O request to
avoid leaking malloc garbage from the host if the header is not
fully populated.
PR: 264521
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: mav, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36271
Use a consistent prefix ("virtio-scsi: ") similar to the e1000 device
model.
Reviewed by: mav, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36270
When preparing to transmit pending packets, ensure that the head (TDH)
and tail (TDT) indices are in bounds. Note that validating values
when they are written is not sufficient along as the transmit length
(TDLEN) could be changed turning a value that was valid when written
into an out of bounds value.
While here, add further restrictions to the head register (TDH). The
manual states that writing to this value while transmit is enabled can
cause unexpected behavior and that it should only be written after a
reset. As such, ignore attempts to write while transmit is active,
and also ignore writes of non-zero values. Later e1000 chipsets have
this register as read-only.
Also ignore any attempts to transmit packets if the transmit ring's
size is zero.
PR: 264567
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36269
The io:::start and end probes trace individual I/O requests.
Also remove the unimplemented wait-start and wait-done probes.
PR: 266098
MFC after: 1 week
In RB_INSERT_COLOR and RB_REMOVE_COLOR, avoid reading a parent pointer
from memory, and then reading the left-color bit from memory, and then
reading the right-color bit from memory, since they're all in the same
field. The compiler can't infer that only the first read is really
necessary, so write the code in a way so that it doesn't have to.
Drop RB_RED_LEFT and RB_RED_RIGHT macros that reach into memory to get
those bits. Drop RB_COLOR, the only thing left using RB_RED_LEFT and
RB_RED_RIGHT after the other changes, and go straight to DIAGNOSTIC
code in subr_stats to implement RB_COLOR for its single, dubious use
there.
Reviewed by: alc
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D36353
Add IF_DEBUG_LEVEL() macro to ensure all debug output preparation
is run only if the current debug level is sufficient. Consistently
use it within routing subsystem.
MFC after: 2 weeks
* add nhop_get_unlinked() used to prepare referenced but not
linked nexthop, that can later be used as a clone source.
* add nhop_check_gateway() to check for allowed address family
combinations between the rib family and neighbor family (useful
for 4o6 or direct routes)
* add nhop_set_upper_family() to allow copying IPv6 nexthops to
IPv4 rib.
* add rt_get_rnd() wrapper, returning both nexthop/group and its
weight attached to the rtentry.
* Add CHT_SLIST_FOREACH_SAFE(), allowing to delete items during
iteration.
MFC after: 2 weeks
Multiple consumers in the kernel space want to install IPv4 or IPv6
default route. Provide convenient wrapper to simplify the code
inside the customers.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36167
Construct the desired hexthops directly instead of using the
"translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
rib_copy_route() to propagate the routes to the non-primary address
fibs.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36166
Further updates based on ways Peter Holm found to corrupt UFS
superblocks in ways that could cause kernel hangs or crashes.
No legitimate superblocks should fail as a result of these changes.
Reported by: Peter Holm
Tested by: Peter Holm
Sponsored by: The FreeBSD Foundation
- Record the physical address for doorbell page region
For supporting RDMA device with multiple user contexts with their
individual doorbell pages, record the start address of doorbell page
region for use by the RDMA driver to allocate user context doorbell IDs.
- Handle vport sharing between devices
For outgoing packets, the PF requires the VF to configure the vport with
corresponding protection domain and doorbell ID for the kernel or user
context. The vport can't be shared between different contexts.
Implement the logic to exclusively take over the vport by either the
Ethernet device or RDMA device.
- Add functions for allocating doorbell page from GDMA
The RDMA device needs to allocate doorbell pages for each user context.
Implement those functions and expose them for use by the RDMA driver.
- Export Work Queue functions for use by RDMA driver
RDMA device may need to create Ethernet device queues for use by Queue
Pair type RAW. This allows a user-mode context accesses Ethernet hardware
queues. Export the supporting functions for use by the RDMA driver.
- Define max values for SGL entries
The number of maximum SGl entries should be computed from the maximum
WQE size for the intended queue type and the corresponding OOB data
size. This guarantees the hardware queue can successfully queue requests
up to the queue depth exposed to the upper layer.
- Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
When doing memory registration, the PF may respond with
GDMA_STATUS_MORE_ENTRIES to indicate a follow request is needed. This is
not an error and should be processed as expected.
- Define data structures for protection domain and memory registration
The MANA hardware support protection domain and memory registration for use
in RDMA environment. Add those definitions and expose them for use by the
RDMA driver.
MFC after: 2 weeks
Sponsored by: Microsoft
Commit 40ada74ee1da modified the NFSv4.1/4.2 client so
that it would issue a DestroySession to the server when
all session slots are marked bad. Once this is done,
the Sequence operation should get a NFSERR_BADSESSION
reply from the server.
Without this patch, the code was setting ND_HASSLOTID
when, in fact, there was no slot marked in use by
nfsv4_sequencelookup(). This would result in the
code freeing a slot not in use. The effect of this
was minimal, since the session was already destroyed.
This patch fixes the code so that it does not set
ND_HASSLOTID for this case.
MFC after: 2 weeks
The NFSv4.1/4.2 client does recovery when it receives a
NFSERR_BADSESSION reply from the server. If the server has
not rebooted, this is often caused by multiple clients using
the same /etc/hostid and, as such, not being recognized as
different clients by the server.
This trivial patch adds a console message to suggest that
client's /etc/hostid's need to be checked for uniqueness.
MFC after: 2 weeks
None of tools working with login classes change umask(1)
and we had no ways to specify non-default umask for a service
not touching its startup script. This change makes in possible.
Some file-sharing services that create new files may benefit from it.
Differential: https://reviews.freebsd.org/D36309
MFC-after: 3 days
Improve RB_REMOVE by reading the fields of the removed node only once,
and by not writing to the removed node.
Reviewed by: kib
Discussed with: markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D36288
The NFSv4.1/4.2 server generates a console message that indicates
that there is no session. I was until recently perplexed w.r.t. how
this could occur. It turns out that the common cause is multiple NFS
clients with the same /etc/hostid.
The host uuid is used by the FreeBSD NFSv4.1/4.2 client as a unique
identifier for the client. If multiple clients use the same host uuid,
this indicates to the NFSv4.1/4.2 server that they are the same client
and confusion occurs.
This trivial patch modifies the console message to suggest that the
client's /etc/hostid needs to be checked for uniqueness.
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36377
When the NFSv4.1/4.2 client is handling a server error
of NFSERR_BADSESSION, it retries RPCs with a new session.
Without this patch, the nd_slotid was not being updated
for the new session.
This would result in a bogus console message like
"Wrong session srvslot=X slot=Y" and then it would
free the incorrect slot, often generating a
"freeing free slot!!" console message as well.
This patch fixes the problem.
Note that FreeBSD NFSv4.1/4.2 servers only
generate a NFSERR_BADSESSION error after a reboot
or after a client does a DestroySession operation.
PR: 260011
MFC after: 1 week
Make Ethernet rule addition behave just like L3 rules, in that we now
allow ongoing transaction to be interrupted, rather than rejecting a new
one.
The result of that is that we can no longer end up in a state where a
transaction failed, but was not rolled back, blocking us from setting
new rules.
It's safe to assume there's no pending epoch callback for cleanup here,
because we've explicitly called it before hitting pf_begin_eth().
Sponsored by: Rubicon Communications, LLC ("Netgate")
Enabling other driver code found that the bssid in
struct ieee80211_bss_conf is not an array but expected to be
a const pointer (const, != NULL checks).
Adjust accordingly in the header and in the LinuxKPI compat code.
There initialization now needs to be a static array always present
as we need a value before we will have a BSS (node in scan_to_auth)
as the mac80211 driver (*handlers) are expecting the pointer to be
not NULL (copying without checks).
This is a pre-req to enable d3 (CONFIG_PM[_SLEEP]) in the future.
Tested by: Tomoaki AOKI (junchoon dec.sakura.ne.jp)
Tested by: Berislav Purgar (bpurgar gmail.com)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Some non-compliant USB devices do not implement the
clear endpoint halt feature. Silently ignore such
failures, when they at least responded correctly
passing up a valid STALL PID packet.
Tested by: Doug Ambrisko <ambrisko@ambrisko.com>
MFC after: 1 week
Sponsored by: NVIDIA Networking