Commit Graph

277751 Commits

Author SHA1 Message Date
Richard Scheffenegger
4012ef7754 tcp: Functional implementation of Accurate ECN
The AccECN handshake and TCP header flags are supported,
no support yet for the AccECN option. This minimalistic
implementation is sufficient to support DCTCP while
dramatically cutting the number of ACKs, and provide ECN
response from the receiver to the CC modules.

Reviewed By:		#transport, #manpages, rrs, pauamma
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D21011
2022-08-31 15:05:53 +02:00
Richard Scheffenegger
c21b7b55be tcp: finish SACK loss recovery on sudden lack of SACK blocks
While a receiver should continue sending SACK blocks for the
duration of a SACK loss recovery, if for some reason the
TCP options no longer contain these SACK blocks, but we
already started maintaining the Scoreboard, keep on handling
incoming ACKs (without SACK) as belonging to the SACK recovery.

Reported by:		thj
Reviewed by:		tuexen, #transport
MFC after:		2 weeks
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D36046
2022-08-31 14:49:47 +02:00
Andrew Turner
544f047f89 Store mpidr as a 64-bit value on arm64
The mpidr register is 64 bit on arm64 and 32 bit on arm. Fix this by
extending the arm64 definition to include the top 32 bits.

To preserve KBI when MFCing split the value into two 32 bit values.
This will be cleaned up later only on main.

Reviewed by:	bz
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36346
2022-08-31 11:48:31 +01:00
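The split described in the commit above can be sketched as follows. This is a minimal illustration, not the kernel's actual layout: the struct and its field names (`cpu_ids`, `mpidr_low`, `mpidr_high`) are hypothetical, and the only point shown is how a 64-bit register value round-trips through two 32-bit fields.

```c
#include <stdint.h>

/*
 * Hypothetical container: the 32-bit slot that already existed keeps
 * the low half, and a second 32-bit field carries the upper half.
 */
struct cpu_ids {
	uint32_t mpidr_low;	/* bits 31:0 of the register */
	uint32_t mpidr_high;	/* bits 63:32 of the register */
};

/* Store a 64-bit value as two 32-bit halves. */
static void
cpu_ids_set_mpidr(struct cpu_ids *ids, uint64_t mpidr)
{
	ids->mpidr_low = (uint32_t)mpidr;
	ids->mpidr_high = (uint32_t)(mpidr >> 32);
}

/* Reassemble the full 64-bit value from the two halves. */
static uint64_t
cpu_ids_get_mpidr(const struct cpu_ids *ids)
{
	return (((uint64_t)ids->mpidr_high << 32) | ids->mpidr_low);
}
```

Consumers that only ever looked at the low 32 bits would keep working against such a layout, which is the kind of compatibility the commit message is concerned with.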
Peter Wemm
998b0a4ad8 OptionalObsoleteFiles.inc: Add missing sendmail feature macro files.
MFC after:	3 days
2022-08-31 02:38:57 -07:00
Emmanuel Vadot
0a9a4d2cd6 arm64: Fix hwpmc module when OPT_ACPI isn't selected
Fixes: 59191f3573 ("Add support of ARM CMN-600 controller ...")
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2022-08-31 09:25:39 +02:00
Gleb Smirnoff
24af7808fa protosw: repair protocol selection logic in socket(2)
Pointy hat to:	glebius
Fixes:		61f7427f02
2022-08-30 21:19:46 -07:00
Gleb Smirnoff
1f3d8c09be procstat: fix printing divert(4) sockets
2022-08-30 16:26:21 -07:00
Gleb Smirnoff
4627bc1e90 tests: use PF_DIVERT/SOCK_RAW instead of PF_INET/SOCK_RAW/IPPROTO_DIVERT
2022-08-30 16:24:37 -07:00
Gleb Smirnoff
f70a2e2948 ipfwpcap: use PF_DIVERT/SOCK_RAW instead of PF_INET/SOCK_RAW/IPPROTO_DIVERT
2022-08-30 16:24:37 -07:00
Gleb Smirnoff
1df08e905a natd: use PF_DIVERT/SOCK_RAW instead of PF_INET/SOCK_RAW/IPPROTO_DIVERT
2022-08-30 16:24:37 -07:00
Cy Schubert
2a63683b5d sqlite3: Vendor import of sqlite3 3.39.2
Changes at https://www.sqlite.org/releaselog/3_39_2.html.

Security:       CVE-2022-35737
Obtained from:  https://www.sqlite.org/2022/sqlite-autoconf-3390200.tar.gz
MFC after:      immediately

Merge commit '1545dd7d6cc54bdfca9bc9f74c42745b514b60c9' into sqlite3/main3
2022-08-30 15:54:32 -07:00
Cy Schubert
1545dd7d6c sqlite3: Vendor import of sqlite3 3.39.2
Changes at https://www.sqlite.org/releaselog/3_39_2.html.

Obtained from:	https://www.sqlite.org/2022/sqlite-autoconf-3390200.tar.gz
2022-08-30 15:29:34 -07:00
Gleb Smirnoff
e72c522858 divert(4): make it compilable and working without INET
Differential revision:	https://reviews.freebsd.org/D36383
2022-08-30 15:09:21 -07:00
Gleb Smirnoff
f1fb051716 divert(4): maintain own cb database and stop using inpcb KPI
Here go cons of using inpcb for divert:
- divert(4) uses only 16 bits (local port) out of struct inpcb,
  which is 424 bytes today.
- The inpcb KPI isn't able to provide hashing for divert(4),
  thus it uses global inpcb list for lookups.
- divert(4) uses INET-specific part of the KPI, making INET
  a requirement for IPDIVERT.

Maintain our own very simple hash lookup database instead.  It
has mutex protection for write and epoch protection for lookups.
Since now so->so_pcb no longer points to struct inpcb, don't
initialize protosw methods to methods that belong to PF_INET.
Also, drop support for setting options on a divert socket.  My
review of software in base and ports confirms that this is unused
and is unlikely to have worked before.

Differential revision:	https://reviews.freebsd.org/D36382
2022-08-30 15:09:21 -07:00
Gleb Smirnoff
2b1c72171e divert(4): provide statistics
Instead of incrementing pretty random counters in the IP statistics,
create divert socket statistics structure.  Export via netstat(1).

Differential revision:	https://reviews.freebsd.org/D36381
2022-08-30 15:09:21 -07:00
Gleb Smirnoff
61f7427f02 protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created
by socket(2) syscall and at the same time to demultiplex incoming IPv4
datagrams (later copied to IPv6).  This story ended with 78b1fc05b2.

These entries (e.g. IPPROTO_ICMP) in inetsw were added to catch
packets in ip_input(), but they would also be returned by pffindproto()
if the user called socket(AF_INET, SOCK_RAW, IPPROTO_ICMP).  Thus, for raw
sockets to work correctly, all the entries pointed at raw_usrreq,
differing only in the value of pr_protocol.

With 78b1fc05b2 all these entries are no longer needed, as ip_protox
is independent of protosw.  Any socket syscall requesting SOCK_RAW type
would end up with rip_protosw.  And this protosw has its pr_protocol
set to 0, allowing the socket to be marked with any protocol.

For IPv6 raw socket the change required two small fixes:
o Validate user provided protocol value
o Always use protocol number stored in inp in rip6_attach, instead
  of protosw value, which is now always 0.

Differential revision:	https://reviews.freebsd.org/D36380
2022-08-30 15:09:21 -07:00
Gleb Smirnoff
8624f4347e divert: declare PF_DIVERT domain and stop abusing PF_INET
The divert(4) is not a protocol of IPv4.  It is a socket to
intercept packets from ipfw(4) to userland and re-inject them
back.  It can divert and re-inject IPv4 and IPv6 packets today,
but potentially it is not limited to these two protocols.  The
IPPROTO_DIVERT does not belong to known IP protocols, it
doesn't even fit into u_char.  I guess, the implementation of
divert(4) was done the way it is done basically because it was
easier to do it this way, back when protocols for sockets were
intertwined with IP protocols and domains were statically
compiled in.

Moving divert(4) out of inetsw accomplished two important things:

1) IPDIVERT is getting much closer to not depending on INET.
   This will be finalized in following changes.
2) Now divert socket no longer aliases with raw IPv4 socket.
   Domain/proto selection code won't need a hack for SOCK_RAW and
   multiple entries in inetsw implementing different flavors of
   raw socket can merge into one without requirement of raw IPv4
   being the last member of dom_protosw.

Differential revision:	https://reviews.freebsd.org/D36379
2022-08-30 15:09:21 -07:00
Rick Macklem
603677334a mount_nfs.8: Note that NFSv4 requires unique /etc/hostid's
Recent problems related to NFSv4 mounts have been traced
to multiple NFSv4 clients using the same /etc/hostid
(or kern.hostuuid, if you prefer).

This patch adds a sentence to the man page noting that
clients must have unique /etc/hostid's.

This is a content change.

Reviewed by:	gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36392
2022-08-30 07:57:27 -07:00
Alexander Motin
35b7759c05 cp: Fix build without VM_AND_BUFFER_CACHE_SYNCHRONIZED.
It allows to not use mmap() for small files, which is not helpful
in case of ZFS.  Should be no functional change.

MFC after:	1 week
2022-08-30 10:51:21 -04:00
Dave Baukus
cbc5350359 ucom(4): Make sure the open routine is executed synchronously.
This avoids issues caused by starting any USB transfers before the open
function is complete.

Differential Revision:	https://reviews.freebsd.org/D36391
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-08-30 16:19:58 +02:00
Hans Petter Selasky
40e43b056d umodem(4): Clear stall at every open.
Some controllers like the XHCI(4) lose track of the data toggle value when
USB receive transfers are cancelled at close. This in turn can lead to
data loss after the next open.

To avoid data loss, make sure both the receive and transmit data toggles
get reset, before trying to read or write any data.

Differential Revision:	https://reviews.freebsd.org/D36391
Submitted by:		Dave Baukus <daveb@spectralogic.com>
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-08-30 16:02:47 +02:00
Gleb Smirnoff
c00605751e tcp: remove a dead code leftover from T/TCP,
that doesn't have any value today.
2022-08-29 19:30:12 -07:00
Gleb Smirnoff
8fc8063849 divert: merge div_output() into div_send()
No functional change intended.
2022-08-29 19:15:01 -07:00
Gleb Smirnoff
244e1aeaec domains: merge domain_init() into domain_add()
domain_init() called at SI_SUB_PROTO_DOMAIN/SI_ORDER_SECOND is always
called right after domain_add(), that had been called at SI_ORDER_FIRST.
Note that protocols aren't initialized yet at this point, since they are
usually scheduled to initialize at SI_ORDER_THIRD.

After this merge it becomes clear that DOMF_SUPPORTED / DOMF_INITED
can be garbage collected as they are set & checked in the same function.

For initialization of the domain system itself it is now clear that
domaininit() can be garbage collected and static initializer is enough.
2022-08-29 19:15:01 -07:00
Gleb Smirnoff
e18c5816ea domains: use queue(9) SLIST for linked list of domains
2022-08-29 19:15:01 -07:00
Gleb Smirnoff
d7574c7432 domains: init pr_domain in pr_init()
2022-08-29 19:15:01 -07:00
Gleb Smirnoff
c414347bc5 mbufs: isolate max_linkhdr and max_protohdr handling in the mbuf code
o Statically initialize max_linkhdr to default value without relying
  on domain(9) code doing that.
o Statically initialize max_protohdr to a sane value, without relying
  on TCP being always compiled in.
o Retire max_datalen. Set, but not used.
o Don't make the domain(9) system responsible in validating these
  values and updating max_hdr.  Instead provide KPI max_linkhdr_grow()
  and max_protohdr_grow().
o Call max_linkhdr_grow() from IEEE802.11 and max_protohdr_grow() from
  TCP.  Those are the only protocols today that may want to grow.

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D36376
2022-08-29 19:14:25 -07:00
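A grow-only KPI of the kind listed above might look like the sketch below. The variable and function names come from the commit message; the initial values and the function bodies are assumptions made for illustration and are not taken from the FreeBSD sources.

```c
/* Placeholder defaults, chosen for illustration only. */
static unsigned int max_linkhdr = 16;
static unsigned int max_protohdr = 40;
static unsigned int max_hdr = 16 + 40;	/* link + protocol headers */

/* Raise max_linkhdr if needed; never shrink it. */
static void
max_linkhdr_grow(unsigned int new_size)
{
	if (new_size > max_linkhdr) {
		max_linkhdr = new_size;
		max_hdr = max_linkhdr + max_protohdr;
	}
}

/* Raise max_protohdr if needed; never shrink it. */
static void
max_protohdr_grow(unsigned int new_size)
{
	if (new_size > max_protohdr) {
		max_protohdr = new_size;
		max_hdr = max_linkhdr + max_protohdr;
	}
}
```

Keeping the derived `max_hdr` update inside the two setters is what lets the mbuf code own these values without any help from the domain(9) code.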
John Baldwin
bb31aee26b bhyve virtio-scsi: Avoid out of bounds accesses to guest requests.
- Ignore I/O requests with insufficiently sized input or output
  buffers (those not containing complete request headers).

- Ignore control requests with improperly sized buffers.

- While here, explicitly zero the output header of an I/O request to
  avoid leaking malloc garbage from the host if the header is not
  fully populated.

PR:		264521
Reported by:	Robert Morris <rtm@lcs.mit.edu>
Reviewed by:	mav, emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36271
2022-08-29 15:37:27 -07:00
John Baldwin
62806a7f31 bhyve virtio-scsi: Tidy warning and debug prints.
Use a consistent prefix ("virtio-scsi: ") similar to the e1000 device
model.

Reviewed by:	mav, emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36270
2022-08-29 15:37:15 -07:00
John Baldwin
7afe342dcb bhyve e1000: Sanitize transmit ring indices.
When preparing to transmit pending packets, ensure that the head (TDH)
and tail (TDT) indices are in bounds.  Note that validating values
when they are written is not sufficient alone, as the transmit length
(TDLEN) could be changed, turning a value that was valid when written
into an out of bounds value.

While here, add further restrictions to the head register (TDH).  The
manual states that writing to this value while transmit is enabled can
cause unexpected behavior and that it should only be written after a
reset.  As such, ignore attempts to write while transmit is active,
and also ignore writes of non-zero values.  Later e1000 chipsets have
this register as read-only.

Also ignore any attempts to transmit packets if the transmit ring's
size is zero.

PR:		264567
Reported by:	Robert Morris <rtm@lcs.mit.edu>
Reviewed by:	emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36269
2022-08-29 15:36:57 -07:00
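The check described in the commit above can be illustrated with a small sketch. The struct and function names here are hypothetical and do not come from the bhyve sources; the point is only that indices are validated at the time of use against the current ring size, and that an empty ring is rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the transmit ring state. */
struct tx_ring {
	uint32_t tdh;	/* head index */
	uint32_t tdt;	/* tail index */
	uint32_t size;	/* descriptor count derived from TDLEN */
};

/*
 * Return true only when it is safe to walk the ring: the ring must be
 * non-empty and both indices must fall inside it.
 */
static bool
tx_ring_indices_ok(const struct tx_ring *r)
{
	if (r->size == 0)
		return (false);
	return (r->tdh < r->size && r->tdt < r->size);
}
```

Because the check runs against the size in effect at transmit time, an index that was valid when written is still caught if the ring is later shrunk.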
Alexander V. Chernikov
177f04d57f routing: constantify @rc in rib_decompose_notification().
Clarify the @rc immutability by explicitly marking @rc const.

MFC after:	2 weeks
2022-08-29 18:12:24 +00:00
Mark Johnston
32faf071bd devstat: Remove DTrace io probes lacking a BIO reference
The io:::start and end probes trace individual I/O requests.

Also remove the unimplemented wait-start and wait-done probes.

PR:		266098
MFC after:	1 week
2022-08-29 13:22:36 -04:00
Mark Johnston
09a2fce092 makefs tests: Do not run ZFS tests in parallel
makefs-created pools always have the same GUID and thus cannot be
imported simultaneously.

Reported by:	olivier
2022-08-29 12:54:25 -04:00
Mark Johnston
a3b6b3ac4d makefs tests: Do not install ZFS tests if WITHOUT_ZFS is defined
2022-08-29 12:50:51 -04:00
Mark Johnston
575ca2c265 makefs: Remove some redundant initializations
No functional change intended.
2022-08-29 12:50:51 -04:00
Alexander V. Chernikov
7b3440fc30 Revert "routing: install prefix and loopback routes using new nhop-based KPI."
Temporarily revert the commit to unblock testing.

This reverts commit a1b59379db.
2022-08-29 16:20:42 +00:00
Doug Moore
5d91386826 rb_tree: avoid extra reads in rebalancing
In RB_INSERT_COLOR and RB_REMOVE_COLOR, avoid reading a parent pointer
from memory, and then reading the left-color bit from memory, and then
reading the right-color bit from memory, since they're all in the same
field. The compiler can't infer that only the first read is really
necessary, so write the code in a way so that it doesn't have to.

Drop RB_RED_LEFT and RB_RED_RIGHT macros that reach into memory to get
those bits.  Drop RB_COLOR, the only thing left using RB_RED_LEFT and
RB_RED_RIGHT after the other changes, and go straight to DIAGNOSTIC
code in subr_stats to implement RB_COLOR for its single, dubious use
there.

Reviewed by:	alc
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D36353
2022-08-29 11:11:31 -05:00
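The idea of reading the combined field once can be sketched as below. This is not the code from tree.h: the type, macro, and function names are hypothetical, and the encoding (two color bits stored in the low bits of the parent pointer) is an assumption used to illustrate the technique.

```c
#include <stdint.h>

#define LINK_RED_L	((uintptr_t)1)	/* left-color bit */
#define LINK_RED_R	((uintptr_t)2)	/* right-color bit */
#define LINK_MASK	(LINK_RED_L | LINK_RED_R)

struct node {
	uintptr_t up;	/* parent pointer plus two color bits */
};

struct up_fields {
	struct node *parent;
	int red_left;
	int red_right;
};

/* Pack a parent pointer and both color bits into one word. */
static void
node_encode_up(struct node *n, struct node *parent, int red_left,
    int red_right)
{
	n->up = (uintptr_t)parent | (red_left ? LINK_RED_L : 0) |
	    (red_right ? LINK_RED_R : 0);
}

/* Load the word once, then derive all three values from the local copy. */
static struct up_fields
node_decode_up(const struct node *n)
{
	uintptr_t up = n->up;	/* the only read from memory */
	struct up_fields f;

	f.parent = (struct node *)(up & ~LINK_MASK);
	f.red_left = (up & LINK_RED_L) != 0;
	f.red_right = (up & LINK_RED_R) != 0;
	return (f);
}
```

The sketch relies on `struct node` being aligned to at least four bytes so that the two low bits of a valid pointer are always zero.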
Alexander V. Chernikov
578a99c939 routing: improve multiline debug
Add IF_DEBUG_LEVEL() macro to ensure all debug output preparation
 is run only if the current debug level is sufficient. Consistently
 use it within routing subsystem.

MFC after:	2 weeks
2022-08-29 15:14:49 +00:00
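A guard macro in the spirit of the commit above could be sketched like this. Only the macro name comes from the commit message; its definition, the `debug_level` knob, and the helper functions are assumptions for illustration and may differ from the real routing code.

```c
#include <stdio.h>

static int debug_level = 1;	/* hypothetical runtime debug knob */
static int prepare_calls;	/* counts how often preparation ran */

/* Run the following block only if the current level is high enough. */
#define IF_DEBUG_LEVEL(lvl)	if (debug_level >= (lvl))

/* Stand-in for costly formatting work done before logging. */
static const char *
describe_route(char *buf, size_t len, int id)
{
	prepare_calls++;
	snprintf(buf, len, "route #%d", id);
	return (buf);
}

static void
log_route(int id)
{
	IF_DEBUG_LEVEL(2) {
		char buf[32];

		printf("debug: %s\n", describe_route(buf, sizeof(buf), id));
	}
}
```

With the level below 2, `describe_route()` is never called, so the cost of preparing multiline output is paid only when the output will actually be printed.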
Alexander V. Chernikov
fe05d1dd0f routing: extend nhop(9) kpi
* add nhop_get_unlinked() used to prepare referenced but not
 linked nexthop, that can later be used as a clone source.
* add nhop_check_gateway() to check for allowed address family
  combinations between the rib family and neighbor family (useful
  for 4o6 or direct routes)
* add nhop_set_upper_family() to allow copying IPv6 nexthops to
 IPv4 rib.
* add rt_get_rnd() wrapper, returning both nexthop/group and its
 weight attached to the rtentry.
* Add CHT_SLIST_FOREACH_SAFE(), allowing to delete items during
  iteration.

MFC after:	2 weeks
2022-08-29 14:46:03 +00:00
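The "safe" iteration pattern mentioned in the last item can be sketched on an ordinary singly-linked list. The macro below is a hypothetical stand-in, not `CHT_SLIST_FOREACH_SAFE()` itself; it only shows why saving the next pointer before the loop body runs makes deletion during iteration possible.

```c
#include <stddef.h>

struct item {
	int value;
	struct item *next;
};

/* Save the successor in tmp before the body can modify or unlink var. */
#define ITEM_FOREACH_SAFE(var, head, tmp)				\
	for ((var) = (head);						\
	    (var) != NULL && ((tmp) = (var)->next, 1);			\
	    (var) = (tmp))

/* Unlink every item with a negative value; return how many were removed. */
static int
remove_negative(struct item **headp)
{
	struct item *it, *tmp, **prevp = headp;
	int removed = 0;

	ITEM_FOREACH_SAFE(it, *headp, tmp) {
		if (it->value < 0) {
			*prevp = tmp;		/* bypass the item */
			it->next = NULL;	/* safe: tmp holds the successor */
			removed++;
		} else
			prevp = &it->next;
	}
	return (removed);
}
```

A plain foreach loop would follow `it->next` after the body had already cleared it, which is exactly the hazard the saved pointer avoids.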
Alexander V. Chernikov
c24a8f19c5 routing: fix rib_add_route_px()
Fix panic in newly-added rib_add_route_px() by removing the unlocked
 prefix lookup.

MFC after:	2 weeks
2022-08-29 12:57:47 +00:00
Alexander V. Chernikov
db4ca19002 routing: add ability to store opaque identifiers in nhops/nhgs
This is a pre-requisite for the direct nexthop/nexthop group operations
 via netlink.

MFC after:	2 weeks
2022-08-29 12:20:28 +00:00
Alexander V. Chernikov
6d4f6e4c70 routing: make rib_add_redirect() use new nhop-based KPI
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36169
2022-08-29 10:23:26 +00:00
Alexander V. Chernikov
835a611e68 routing: make IPv6 defrouter code use new nhop-based KPI.
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36168
2022-08-29 10:08:47 +00:00
Alexander V. Chernikov
d8b2693414 routing: add rib_add_default_route() wrapper
Multiple consumers in the kernel space want to install IPv4 or IPv6
 default route. Provide convenient wrapper to simplify the code
 inside the consumers.

MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36167
2022-08-29 10:08:24 +00:00
Alexander V. Chernikov
a1b59379db routing: install prefix and loopback routes using new nhop-based KPI.
Construct the desired nexthops directly instead of using the
 "translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
 rib_copy_route() to propagate the routes to the non-primary address
 fibs.

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D36166
2022-08-29 10:07:58 +00:00
Kirk McKusick
827622937e Correct calculation of inode location in getnextino cache.
Fix for 345bfec.

Reported by:  Peter Holm
Sponsored by: The FreeBSD Foundation
2022-08-28 23:47:17 -07:00
Wei Hu
9e772f203f mana: Fix a couple i386 build errors
Fix a couple i386 build errors

Fixes:	b685df314f
Sponsored by:	Microsoft
2022-08-29 06:35:02 +00:00
Kirk McKusick
9dee5da745 Updates to UFS/FFS superblock integrity checks when reading a superblock.
Further updates based on ways Peter Holm found to corrupt UFS
superblocks in ways that could cause kernel hangs or crashes.

No legitimate superblocks should fail as a result of these changes.

Reported by:  Peter Holm
Tested by:    Peter Holm
Sponsored by: The FreeBSD Foundation
2022-08-28 23:14:35 -07:00
Kirk McKusick
2e4da012d5 Correct calculation of inode location in getnextino cache.
Fix for 345bfec.

Reported by:  Peter Holm
Sponsored by: The FreeBSD Foundation
2022-08-28 23:09:29 -07:00
Wei Hu
b685df314f mana: some code refactoring and export apis for future RDMA driver
- Record the physical address for doorbell page region
  For supporting RDMA device with multiple user contexts with their
  individual doorbell pages, record the start address of doorbell page
  region for use by the RDMA driver to allocate user context doorbell IDs.

- Handle vport sharing between devices
  For outgoing packets, the PF requires the VF to configure the vport with
  corresponding protection domain and doorbell ID for the kernel or user
  context. The vport can't be shared between different contexts.

  Implement the logic to exclusively take over the vport by either the
  Ethernet device or RDMA device.

- Add functions for allocating doorbell page from GDMA
  The RDMA device needs to allocate doorbell pages for each user context.
  Implement those functions and expose them for use by the RDMA driver.

- Export Work Queue functions for use by RDMA driver
  RDMA device may need to create Ethernet device queues for use by Queue
  Pair type RAW. This allows a user-mode context to access Ethernet hardware
  queues. Export the supporting functions for use by the RDMA driver.

- Define max values for SGL entries
  The maximum number of SGL entries should be computed from the maximum
  WQE size for the intended queue type and the corresponding OOB data
  size. This guarantees the hardware queue can successfully queue requests
  up to the queue depth exposed to the upper layer.

- Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
   When doing memory registration, the PF may respond with
   GDMA_STATUS_MORE_ENTRIES to indicate a follow-up request is needed. This is
   not an error and should be processed as expected.

- Define data structures for protection domain and memory registration
  The MANA hardware supports protection domain and memory registration for use
  in RDMA environment. Add those definitions and expose them for use by the
  RDMA driver.

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-08-29 05:24:21 +00:00