Commit Graph

2278 Commits

Author SHA1 Message Date
Mark Johnston
317fa5169d netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option
It has no effect, and an exp-run revealed that it is not in use.

PR:		261398 (exp-run)
Reviewed by:	mjg, glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38822
2023-02-28 15:57:21 -05:00
Mark Johnston
3aff4ccdd7 netinet: Remove IP(V6)_BINDMULTI
This option was added in commit 0a100a6f1e but was never completed.
In particular, there is no logic to map flowids to different listening
sockets, so it accomplishes basically the same thing as SO_REUSEPORT.
Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to
balance among listening sockets using a hash of the 4-tuple and some
optional NUMA policy.

The option was never documented or completed, and an exp-run revealed
nothing using it in the ports tree.  Moreover, it complicates the
already very complicated in_pcbbind_setup(), and the checking in
in_pcbbind_check_bindmulti() is insufficient.  So, let's remove it.

PR:		261398 (exp-run)
Reviewed by:	glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38574
2023-02-27 10:03:11 -05:00
Gleb Smirnoff
96871af013 inpcb: use family specific sockaddr argument for bind functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_bind method and from there on go down the call
stack with family specific argument.

Reviewed by:		zlei, melifaro, markj
Differential Revision:	https://reviews.freebsd.org/D38601
2023-02-15 10:30:16 -08:00
Mark Johnston
4130ea611f inpcb: Split in_pcblookup_hash_locked() and clean up a bit
Split the in_pcblookup_hash_locked() function into several independent
subroutine calls, each of which does some kind of hash table lookup.
This refactoring makes it easier to introduce variants of the lookup
algorithm that behave differently depending on whether they are
synchronized by SMR or the PCB database hash lock.

While here, do some related cleanup:
- Remove an unused ifnet parameter from internal functions.  Keep it in
  external functions so that it can be used in the future to derive a v6
  scopeid.
- Reorder the parameters to in_pcblookup_lbgroup() to be consistent with
  the other lookup functions.
- Remove an always-true check from in_pcblookup_lbgroup(): we can assume
  that we're performing a wildcard match.

No functional change intended.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D38364
2023-02-09 16:15:03 -05:00
Gleb Smirnoff
220d892129 inpcb: immediately return matching pcb on lookup
This saves a lot of CPU cycles if you got large connection table.

The code removed originates from 413628a7e3, a very large changeset.
Discussed that with Bjoern, Jamie we can't recover why would we ever
have identical 4-tuples in the hash, even in the presence of jails.
Bjoern did a test that confirms that it is impossible to allocate an
identical connection from a jail to a host. Code review also confirms
that system shouldn't allow for such connections to exist.

With a lack of proper test suite we decided to take a risk and go
forward with removing that code.

Reviewed by:		gallatin, bz, markj
Differential Revision:	https://reviews.freebsd.org/D38015
2023-02-07 09:21:52 -08:00
Gleb Smirnoff
a9d22cce10 inpcb: use family specific sockaddr argument for connect functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
stack with family specific argument.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38356
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
3d76be28ec netinet6: require network epoch for in6_pcbconnect()
This removes recursive epoch entry in the syncache case.  Fixes
unprotected access to V_in6_ifaddrhead in in6_pcbladdr(), as
well as access to prison IP address lists. It also matches what
IPv4 in_pcbconnect() does.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38355
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
221b9e3d06 inpcb: merge two versions of in6_pcbconnect() into one
No functional change.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38354
2023-02-03 11:33:35 -08:00
Mark Johnston
2589ec0f36 pcb: Move an assignment into in_pcbdisconnect()
All callers of in_pcbdisconnect() clear the local address, so let's just
do that in the function itself.

Note that the inp's local address is not a parameter to the inp hash
functions.  No functional change intended.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38362
2023-02-03 11:48:25 -05:00
Mark Johnston
b0ccf53f24 inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()
No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38361
2023-02-03 11:48:25 -05:00
Mark Johnston
675e2618ae inpcb: Deduplicate some assertions
It makes more sense to check lookupflags in the function which actually
uses SMR.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38359
2023-02-03 11:48:25 -05:00
Justin Hibbits
3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Justin Hibbits
361ac40b0f IfAPI: Hide the in6m_lookup_locked() implementation.
Summary:
in6m_lookup_locked() iterates over the ifnet's multiaddrs list.  Keep
this implementation detail private, by moving the implementation to the
netstack source from the header.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38201
2023-01-31 15:02:14 -05:00
Gleb Smirnoff
5c67f7c43e udp: don't forget to initialize udpcb for UDPv6
Reported by:	tuexen
Fixes:		483fe96511
2023-01-26 10:16:32 -08:00
Alexander V. Chernikov
30dd227cff netinet6: honor blackhole/unreach routes in the non-fastforwading code.
Currently, under the conditions specified below, IPv6 ingress packet
 processing can ignore blackhole/reject flag on the prefix. The packet
 will instead be looped locally till TTL expiration and a single ICMPv6
 unreachable message will be send to the source even in case of
 RTF_BLACKHOLE.
The following conditions needs hold to make the scenario happen:
* IPv6 forwarding is enabled
* Packet is not fast-forwarded
* Destination prefix has either RTF_BLACKHOLE or RTF_REJECT flag
Fix this behavior by checking for the blackhole/reject flags in
ip6_forward().

Reported by:	Dmitriy Smirnov <fox@sage.su>
Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D38164
MFC after:	3 days
2023-01-22 18:48:07 +00:00
Gordon Bergling
fa7de6dcb9 ip_gre: Fix a common typo in source code comments
- s/addres/address/

MFC after:	3 days
2023-01-19 14:13:02 +01:00
Alexander V. Chernikov
6468b6b23e nd6: fix panic in lltable_drop_entry_queue()
nd6_resolve_slow() can be called without mbuf. If the LLE entry
 is not reachable, nd6_resolve_slow() will add this NULL mbuf to
 the holdchain via lltable_append_entry_queue, which will "append"
 NULL to the end of the queue (effectively no-op) and bump la_numhold
 value. When this entry gets freed, the kernel will panic due to the
 inconsistency between the amount of mbufs in the queue and the value
 of la_numhold.

Fix the panic by checking of mbuf is not NULL prior to inserting it
 into the holdchain.

Reported by:	kib
MFC after:	3 days
2023-01-15 15:22:42 +00:00
Justin Hibbits
5674838159 inet6: Fix LINT build
mli_delete_locked() is the only function that takes a const ifnet.
Since it's a static function there's no advantage to keeping it const.
Since `if_t` is not a const struct (currently) the compiler throws an
error passing the ifp around to ifnet functions.

Fixes:		eb1da3e525
Sponsored by:	Juniper Networks, Inc.
2022-12-20 15:23:49 -05:00
Gleb Smirnoff
3f89900bf1 udp6: fix build with INET6 and without INVARIANTS
Reported by:	Michael Butler <imb protected-networks.net>
Fixes:		483fe96511
2022-12-07 12:27:15 -08:00
Gleb Smirnoff
1aed3b3430 udp: add protocol method declarations to udp_var.h
They are shared between UDP over IPv4 and over IPv6.  To prevent all
possible kernel build failures wrap them in #ifdef _SYS_PROTOSW_H_.
Prompted by feedback from jhb@ and jrtc27@ on c93db4abf4.
2022-12-07 11:51:49 -08:00
Gleb Smirnoff
5bfc014f23 udp6: inline udp6_output() into udp6_send() 2022-12-07 11:51:48 -08:00
Gleb Smirnoff
483fe96511 udp: embed inpcb into udpcb
See similar change to TCP e68b379244 for more context.  For UDP the
change is much simplier, though.
2022-12-07 11:51:42 -08:00
Gleb Smirnoff
e68b379244 tcp: embed inpcb into tcpcb
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most import one is the inpcb itself.  With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa.  Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that.  However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort.  The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone.  Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision:	https://reviews.freebsd.org/D37127
2022-12-07 09:00:48 -08:00
John Baldwin
d00c20882f udp[6]_multi_input: Don't unlock freed inp.
If udp[6]_append() returns non-zero, it is because the inp has gone
away (inpcbrele_rlocked returned 1 after running the tunnel function).

Reviewed by:	ae
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37511
2022-11-30 14:38:51 -08:00
Michael Tuexen
f83db6441a sctp: minor changes due to upstreaming of Glebs recent changes 2022-11-06 23:06:40 +01:00
Mark Johnston
d93ec8cb13 inpcb: Allow SO_REUSEPORT_LB to be used in jails
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process.  It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.

This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups

This is a straightforward extension of the semantics used for individual
listening sockets.  One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.

Discussed with:	glebius
MFC after:	1 month
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37029
2022-11-02 13:46:24 -04:00
Mark Johnston
0d5d356b36 in6: Consolidate IN6_ARE_ADDR_EQUAL definitions
It is ok to use memcmp() in the kernel.  No functional change intended.

Reviewed by:	glebius, melifaro
MFC after:	1 week
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37028
2022-11-02 13:46:24 -04:00
Mark Johnston
ac1750dd14 inpcb: Remove NULL checks of credential references
Some auditing of the code shows that "cred" is never non-NULL in these
functions, either because all callers pass a non-NULL reference or
because they unconditionally dereference "cred".  So, let's simplify the
code a bit and remove NULL checks.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37025
2022-11-02 13:46:24 -04:00
Elliott Mitchell
21cc0918c7 sys: Nuke double-semicolons
A distinct number of double-semicolons have ended up in FreeBSD.  Take a
pass at getting rid of many of these harmless typos.

Reviewed by: emaste, rrs
Pull Request: https://github.com/freebsd/freebsd-src/pull/609
Differential Revision: https://reviews.freebsd.org/D31716
2022-11-02 09:34:20 -06:00
John Baldwin
744bfb2131 Import the WireGuard driver from zx2c4.com.
This commit brings back the driver from FreeBSD commit
f187d6dfbf plus subsequent fixes from
upstream.

Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.

Reviewed by:	pauamma, gbe, kevans, emaste
Obtained from:	git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36909
2022-10-28 13:36:12 -07:00
Gleb Smirnoff
77fe40cf2f netinet*: add back necessary headers
The LINT successful build was provided by the includes that SCTP
pulled in.

Fixes:	92e190f11f
2022-10-26 08:16:44 -07:00
Gleb Smirnoff
92e190f11f netinet*: remove unneeded headers from files that just declare domains 2022-10-25 11:09:23 -07:00
Gleb Smirnoff
a3da8329c5 carp: fix regression panic from ccd69bd573
Reported & tested by:	Oleg Ginzburg <olevole olevole.ru>
Fixes:			ccd69bd573
2022-10-17 11:39:40 -07:00
Kristof Provost
a974702e27 pf: apply the network stack's ICMP rate limiting to ICMP errors sent by pf
PR:		266477
Event:		Aberdeen Hackathon 2022
Differential Revision:	https://reviews.freebsd.org/D36903
2022-10-14 10:36:16 +02:00
Gleb Smirnoff
2e0e273927 netinet6: trim overly long lines in GET_PKTOPT_VAR(), fit into 80 chars 2022-10-13 09:03:38 -07:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Gleb Smirnoff
0d7445193a tcp: remove tcptw, the compressed timewait state structure
The memory savings the tcptw brought back in 2003 (see 340c35de6a) no
longer justify the complexity required to maintain it.  For longer
explanation please check out the email [1].

Surpisingly through almost 20 years the TCP stack functionality of
handling the TIME_WAIT state with a normal tcpcb did not bitrot.  The
existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state,
which is confirmed by the packetdrill tcp-testsuite [2].

This change just removes tcptw and leaves INP_TIMEWAIT.  The flag will
be removed in a separate commit.  This makes it easier to review and
possibly debug the changes.

[1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html
[2] https://github.com/freebsd-net/tcp-testsuite

Differential revision:	https://reviews.freebsd.org/D36398
2022-10-06 19:22:23 -07:00
Andrey V. Elsukov
ccd69bd573 Ignore IPv6 NA and drop IPv6 NS when BACKUP CARP address is used
When system acts as CARP BACKUP ignore received IPv6 Neighbor Advertisements
to ensure that neighbor cache will not be changed.
Also do not send IPv6 Neighbor Solicitation from CARP BACKUP source address.
Such packets can confuse network switch and it detects MAC addresses
flapping.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	    https://reviews.freebsd.org/D36649
2022-10-06 20:01:16 +03:00
Gleb Smirnoff
fcb3f813f3 netinet*: remove PRC_ constants and streamline ICMP processing
In the original design of the network stack from the protocol control
input method pr_ctlinput was used notify the protocols about two very
different kinds of events: internal system events and receival of an
ICMP messages from outside.  These events were coded with PRC_ codes.
Today these methods are removed from the protosw(9) and are isolated
to IPv4 and IPv6 stacks and are called only from icmp*_input().  The
PRC_ codes now just create a shim layer between ICMP codes and errors
or actions taken by protocols.

- Change ipproto_ctlinput_t to pass just pointer to ICMP header.  This
  allows protocols to not deduct it from the internal IP header.
- Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer.
  It has all the information needed to the protocols.  In the structure,
  change ip6c_finaldst fields to sockaddr_in6.  The reason is that
  icmp6_input() already has this address wrapped in sockaddr, and the
  protocols want this address as sockaddr.
- For UDP tunneling control input, as well as for IPSEC control input,
  change the prototypes to accept a transparent union of either ICMP
  header pointer or struct ip6ctlparam pointer.
- In icmp_input() and icmp6_input() do only validation of ICMP header and
  count bad packets.  The translation of ICMP codes to errors/actions is
  done by protocols.
- Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap,
  inet6ctlerrmap arrays.
- In protocol ctlinput methods either trust what icmp_errmap() recommend,
  or do our own logic based on the ICMP header.

Differential revision:	https://reviews.freebsd.org/D36731
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
c0fc81e913 netinet*: remove dead code from TCP, UDP, SCTP control input
Now these functions are called only from icmp*_input().  The pointer
to the ICMP data is never NULL and cmd has a limited set of values.

In the past the functions were demultiplexing control messages from
ICMP layer, as well as internally generated events.  In the latter
case the the pointer to IP would be NULL.

Differential revision:	https://reviews.freebsd.org/D36729
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
53807a8a27 netinet*: use sparse C99 initializer for inetctlerrmap
and mark those PRC_* codes, that are used.  The rest are dead code.
This is not a functional change, but illustrative to make easier
review of following changes.
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
43d39ca7e5 netinet*: de-void control input IP protocol methods
After decoupling of protosw(9) and IP wire protocols in 78b1fc05b2 for
IPv4 we got vector ip_ctlprotox[] that is executed only and only from
icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed
only and only from icmp6_input().  This allows to use protocol specific
argument types in these methods instead of struct sockaddr and void.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36727
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
46ddeb6be8 netinet6: retire ip6protosw.h
The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with
the former removed in f0ffb944d2 in 2001 and the latter survived until
today.  It has been reduced down to only one useful declaration that
moves to ip6_var.h

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36726
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
24b96f35b9 netinet*: move ipproto_register() and co to ip_var.h and ip6_var.h
This is a FreeBSD KPI and belongs to private header not netinet/in.h.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36723
2022-10-03 20:53:04 -07:00
Alexander V. Chernikov
e437991fc9 netinet6: factor interface addition code to the dedicated function
Summary:
Move SIOCAIFADDR_IN6 (current "primary" ioctl to add an IPv6
 interface address) handling code to the dedicated in6_addifaddr()
 function and make it a part of KPI. This allows in-kernel users to
 add/delete interfaces addresses without relying on ioctl interface.

Subscribers: imp, ae, glebius

Differential Revision: https://reviews.freebsd.org/D36713
2022-09-27 13:23:34 +00:00
Sébastien BINI
64cce803c4 Correct IPv6 MLD group state string table
MLD_REPORTING_MEMBER was missing

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36311
2022-09-19 09:01:36 -04:00
Gordon Bergling
bcb2341c7d netinet6: Remove a double word in a source code comment
- s/to to/to/

MFC after:	3 days
2022-09-10 13:01:44 +02:00
Mateusz Guzik
dda6376b04 net: employ newly added pfil_mbuf_{in,out} where approriate
Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36454
2022-09-08 16:21:08 +00:00
Mateusz Guzik
14c9a2dbfb net: retire PFIL_FWD
It is now unused and not having it allows further clean ups.

Reviewed by:	cy, glebius, kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36452
2022-09-07 10:04:31 +00:00
Mateusz Guzik
223a73a1c4 net: remove stale altq_input reference
Code setting it was removed in:
commit 325fab802e
Author: Eric van Gyzen <vangyzen@FreeBSD.org>
Date:   Tue Dec 4 23:46:43 2018 +0000

    altq: remove ALTQ3_COMPAT code

Reviewed by:	glebius, kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36471
2022-09-07 10:03:12 +00:00