Commit Graph

470 Commits

Author SHA1 Message Date
Gleb Smirnoff
c2a69e846f tcp_hpts: move HPTS related fields from inpcb to tcpcb
This makes inpcb lighter and allows future cache line optimizations
of tcpcb.  The reason why HPTS originally used inpcb is the compressed
TIME-WAIT state (see 0d7445193a), that used to free a tcpcb, while the
associated connection is still on the HPTS ring.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39697
2023-04-25 12:18:33 -07:00
Mark Johnston
93998d166f inpcb: Fix some bugs in _in_pcbinshash_wild()
- In _in_pcbinshash_wild(), we should avoid returning v6 sockets unless
  no other matches are available.  This preserves pre-existing
  semantics.
- Fix an inverted test: when inserting a non-jailed PCB, we want to
  search for the first non-jailed PCB in the hash chain.
- Test the right PCB when searching for a non-jailed PCB.

While here, add a required locking assertion.

Fixes:	7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
2023-04-23 13:55:57 -04:00
Mark Johnston
5fd1a67e88 inpcb: Release the inpcb cred reference before freeing the structure
Now that the inp_cred pointer is accessed only while the inpcb lock is
held, we can avoid deferring a crfree() call when freeing an inpcb.

This fixes a problem introduced when inpcb hash tables started being
synchronized with SMR: the credential reference previously could not be
released until all lockless readers have drained, and there is no
mechanism to explicitly purge cached, freed UMA items.  Thus, ucred
references could linger indefinitely, and since ucreds hold a jail
reference, the jail would linger indefinitely as well.  This manifests
as jails getting stuck in the DYING state.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38573
2023-04-20 12:13:06 -04:00
Mark Johnston
7b92493ab1 inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets.  To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial.  Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred.  This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d892129 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there.  This patch goes further to handle
lookups of unconnected sockets.  Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics.  This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups.  When a match is found, we try to lock the inpcb and
re-validate its connection info.  In the common case, this works well
and we can simply return the inpcb.  If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR.  In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38572
2023-04-20 12:13:06 -04:00
Mark Johnston
3e98dcb3d5 inpcb: Move inpcb matching logic into separate functions
These functions will get some additional callers in future revisions.

No functional change intended.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38571
2023-04-20 12:13:06 -04:00
Mark Johnston
fdb987bebd inpcb: Split PCB hash tables
Currently we use a single hash table per PCB database for connected and
bound PCBs.  Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb.  Now the database has one table each
for connected and unconnected sockets.

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use.  This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away.  There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.

I also made the "rehash" parameter of in(6)_pcbconnect() unused.  This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.

UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle.  To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number.  When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.

raw_ip (ab)uses the hash table without using the corresponding
accessors.  Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs.  This will be addressed in some
way in the future.

inp interators which specify a hash bucket will only visit connected
PCBs.  This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.

Discussed with:	glebius
Tested by:	glebius
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38569
2023-04-20 12:13:06 -04:00
Mark Johnston
713264f6b8 netinet: Tighten checks for unspecified source addresses
The assertions added in commit b0ccf53f24 ("inpcb: Assert against
wildcard addrs in in_pcblookup_hash_locked()") revealed that protocol
layers may pass the unspecified address to in_pcblookup().

Add some checks to filter out such packets before we attempt an inpcb
lookup:
- Disallow the use of an unspecified source address in in_pcbladdr() and
  in6_pcbladdr().
- Disallow IP packets with an unspecified destination address.
- Disallow TCP packets with an unspecified source address, and add an
  assertion to verify the comment claiming that the case of an
  unspecified destination address is handled by the IP layer.

Reported by:	syzbot+9ca890fb84e984e82df2@syzkaller.appspotmail.com
Reported by:	syzbot+ae873c71d3c71d5f41cb@syzkaller.appspotmail.com
Reported by:	syzbot+e3e689aba1d442905067@syzkaller.appspotmail.com
Reviewed by:	glebius, melifaro
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38570
2023-03-06 15:06:00 -05:00
Mark Johnston
317fa5169d netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option
It has no effect, and an exp-run revealed that it is not in use.

PR:		261398 (exp-run)
Reviewed by:	mjg, glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38822
2023-02-28 15:57:21 -05:00
Mark Johnston
3aff4ccdd7 netinet: Remove IP(V6)_BINDMULTI
This option was added in commit 0a100a6f1e but was never completed.
In particular, there is no logic to map flowids to different listening
sockets, so it accomplishes basically the same thing as SO_REUSEPORT.
Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to
balance among listening sockets using a hash of the 4-tuple and some
optional NUMA policy.

The option was never documented or completed, and an exp-run revealed
nothing using it in the ports tree.  Moreover, it complicates the
already very complicated in_pcbbind_setup(), and the checking in
in_pcbbind_check_bindmulti() is insufficient.  So, let's remove it.

PR:		261398 (exp-run)
Reviewed by:	glebius
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38574
2023-02-27 10:03:11 -05:00
Gleb Smirnoff
96871af013 inpcb: use family specific sockaddr argument for bind functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_bind method and from there on go down the call
stack with family specific argument.

Reviewed by:		zlei, melifaro, markj
Differential Revision:	https://reviews.freebsd.org/D38601
2023-02-15 10:30:16 -08:00
Mark Johnston
c7ea65ec69 inpcb: refcount_release() returns a bool
No functional change intended.

MFC after:	1 week
Sponsored by:	Klara, Inc.
2023-02-13 16:35:47 -05:00
Mark Johnston
4130ea611f inpcb: Split in_pcblookup_hash_locked() and clean up a bit
Split the in_pcblookup_hash_locked() function into several independent
subroutine calls, each of which does some kind of hash table lookup.
This refactoring makes it easier to introduce variants of the lookup
algorithm that behave differently depending on whether they are
synchronized by SMR or the PCB database hash lock.

While here, do some related cleanup:
- Remove an unused ifnet parameter from internal functions.  Keep it in
  external functions so that it can be used in the future to derive a v6
  scopeid.
- Reorder the parameters to in_pcblookup_lbgroup() to be consistent with
  the other lookup functions.
- Remove an always-true check from in_pcblookup_lbgroup(): we can assume
  that we're performing a wildcard match.

No functional change intended.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D38364
2023-02-09 16:15:03 -05:00
Gleb Smirnoff
220d892129 inpcb: immediately return matching pcb on lookup
This saves a lot of CPU cycles if you got large connection table.

The code removed originates from 413628a7e3, a very large changeset.
Discussed that with Bjoern, Jamie we can't recover why would we ever
have identical 4-tuples in the hash, even in the presence of jails.
Bjoern did a test that confirms that it is impossible to allocate an
identical connection from a jail to a host. Code review also confirms
that system shouldn't allow for such connections to exist.

With a lack of proper test suite we decided to take a risk and go
forward with removing that code.

Reviewed by:		gallatin, bz, markj
Differential Revision:	https://reviews.freebsd.org/D38015
2023-02-07 09:21:52 -08:00
Gleb Smirnoff
9e46ff4d4c netinet: don't return conflicting inpcb in in_pcbconnect_setup()
Last time this inpcb was actually used was in tcp_connect()
before c94c54e4df.
2023-02-03 11:33:36 -08:00
Gleb Smirnoff
a9d22cce10 inpcb: use family specific sockaddr argument for connect functions
Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
stack with family specific argument.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D38356
2023-02-03 11:33:36 -08:00
Mark Johnston
2589ec0f36 pcb: Move an assignment into in_pcbdisconnect()
All callers of in_pcbdisconnect() clear the local address, so let's just
do that in the function itself.

Note that the inp's local address is not a parameter to the inp hash
functions.  No functional change intended.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38362
2023-02-03 11:48:25 -05:00
Mark Johnston
b0ccf53f24 inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()
No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38361
2023-02-03 11:48:25 -05:00
Mark Johnston
675e2618ae inpcb: Deduplicate some assertions
It makes more sense to check lookupflags in the function which actually
uses SMR.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D38359
2023-02-03 11:48:25 -05:00
Gleb Smirnoff
5ebea466dc inpcb: add myself to the copyright notice
for the SMR synchronization in late 2021 and following cleanups
2023-02-01 09:39:25 -08:00
Justin Hibbits
3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Andrew Gallatin
c4a4b2633d allocate inpcb aligned to cachelines
The inpcb struct is one of the most heavily utilized in the kernel
on a busy network server.  By aligning it to a cacheline
boundary, we can ensure that closely related fields in the inpcb
and tcbcb can be predictably located on the same cacheline.  rrs
has already done a lot of this work to put related fields on the
same line for the tcbcb.

In combination with a forthcoming patch to align the start of the tcpcb,
we see a roughly 3% reduction in CPU use on a busy web server serving
traffic over roughly 50,000 TCP connections.

Reviewed by: glebius, markj, tuexen
Differential Revision: https://reviews.freebsd.org/D37687
Sponsored by: Netflix
2022-12-14 14:19:35 -05:00
Gleb Smirnoff
0aa120d52f inpcb: allow to provide protocol specific pcb size
The protocol specific structure shall start with inpcb.

Differential revision:	https://reviews.freebsd.org/D37126
2022-12-02 14:10:55 -08:00
Gleb Smirnoff
73bebcc5bd inpcb: remove TCP includes, all TCP specific code was moved 2022-11-08 10:24:40 -08:00
Gleb Smirnoff
ab0ef9455f hpts: move inp initialization from the generic inpcb code to TCP
Differential revision:	https://reviews.freebsd.org/D37124
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
f567d55f51 inpcb: don't return INP_DROPPED entries from pcb lookups
The in_pcbdrop() KPI, which is used solely by TCP, allows to remove a
pcb from hash list and mark it as dropped.  The comment suggests that
such pcb won't be returned by lookups.  Indeed, every call to
in_pcblookup*() is accompanied by a check for INP_DROPPED.  Do what
comment suggests: never return such pcbs and remove unnecessary checks.

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D37061
2022-11-08 10:24:39 -08:00
Mark Johnston
d93ec8cb13 inpcb: Allow SO_REUSEPORT_LB to be used in jails
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process.  It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.

This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups

This is a straightforward extension of the semantics used for individual
listening sockets.  One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.

Discussed with:	glebius
MFC after:	1 month
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37029
2022-11-02 13:46:24 -04:00
Mark Johnston
a152dd8634 inpcb: Remove a PCB from its LB group upon a subsequent error
If a memory allocation failure causes bind to fail, we should take the
inpcb back out of its LB group since it's not prepared to handle
connections.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37027
2022-11-02 13:46:24 -04:00
Mark Johnston
ac1750dd14 inpcb: Remove NULL checks of credential references
Some auditing of the code shows that "cred" is never non-NULL in these
functions, either because all callers pass a non-NULL reference or
because they unconditionally dereference "cred".  So, let's simplify the
code a bit and remove NULL checks.  No functional change intended.

Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Modirum MDPay
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37025
2022-11-02 13:46:24 -04:00
Gleb Smirnoff
19acc50667 inpcb: retire suppresion of randomization of ephemeral ports
The suppresion was added in 5f311da2cc with no explanation in the
commit message of the exact problem that was fixed. In the BSDCan
2006 talk [1], slides 12 to 14, we can find that it seems that there
was some problem with the TIME_WAIT state not properly being handled
on the remote side (also FreeBSD!), and this switching off the
suppression had hidden the problem.  The rationale of the change was
that other stacks may also be buggy wrt the TIME_WAIT.

I did not find the actual problem in TIME_WAIT that the suppression
has hidden, neither a commit that would fix it.  However, since that
time we started to handle SYNs with RFC5961 instead of RFC793, see
3220a2121c.  We also now have the tcp-testsuite [2], that has full
coverage of all possible scenarios of receiving SYN in TIME_WAIT.

This effectively reverts 5f311da2cc
and 6ee79c59d2.

[1] https://www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf
[2] https://github.com/freebsd-net/tcp-testsuite

Reviewed by:		rscheff
Discussed with:		rscheff, rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D37042
2022-10-31 08:57:11 -07:00
Gleb Smirnoff
24cf7a8d62 inpcb: provide pcbinfo pointer argument to inp_apply_all()
Allows to clear inpcb layer of TCP knowledge.
2022-10-19 15:15:53 -07:00
Gleb Smirnoff
b6a816f116 inpcb: garbage collect so_sototcpcb()
It had very little use and required inpcb layer to know tcpcb.
2022-10-19 15:15:53 -07:00
Gleb Smirnoff
3ba34b07a4 inpcb: provide in_pcbremhash() to reduce copy-paste 2022-10-13 09:03:38 -07:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Gleb Smirnoff
0d7445193a tcp: remove tcptw, the compressed timewait state structure
The memory savings the tcptw brought back in 2003 (see 340c35de6a) no
longer justify the complexity required to maintain it.  For longer
explanation please check out the email [1].

Surpisingly through almost 20 years the TCP stack functionality of
handling the TIME_WAIT state with a normal tcpcb did not bitrot.  The
existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state,
which is confirmed by the packetdrill tcp-testsuite [2].

This change just removes tcptw and leaves INP_TIMEWAIT.  The flag will
be removed in a separate commit.  This makes it easier to review and
possibly debug the changes.

[1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html
[2] https://github.com/freebsd-net/tcp-testsuite

Differential revision:	https://reviews.freebsd.org/D36398
2022-10-06 19:22:23 -07:00
Gleb Smirnoff
c7a62c925c inpcb: gather v4/v6 handling code into in_pcballoc() from protocols
Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D36062
2022-08-10 11:09:34 -07:00
Mike Karels
637f317c6d IPv6: fix problem with duplicate port assignment with v4-mapped addrs
In in_pcb_lport_dest(), if an IPv6 socket does not match any other IPv6
socket using in6_pcblookup_local(), and if the socket can also connect
to IPv4 (the INP_IPV4 vflag is set), check for IPv4 matches as well.
Otherwise, we can allocate a port that is used by an IPv4 socket
(possibly one created from IPv6 via the same procedure), and then
connect() can fail with EADDRINUSE, when it could have succeeded if
the bound port was not in use.

PR:		265064
Submitted by:	firk at cantconnect.ru (with modifications)
Reviewed by:	bz, melifaro
Differential Revision: https://reviews.freebsd.org/D36012
2022-08-02 09:49:46 -05:00
John Baldwin
fe5324aca0 in_pcballoc: error is only used for IPSEC or MAC. 2022-04-13 16:08:23 -07:00
John Baldwin
bab34d6349 in_pcboutput_txrtlmt: Remove unused variable. 2022-04-12 14:58:59 -07:00
Gordon Bergling
942e8cab8c netinet: Fix a typo in a source code comment
- s/exisitng/existing/

MFC after:	3 days
2022-04-02 14:39:32 +02:00
Michael Tuexen
a0aeb1cef5 in_pcb.c: fix compilation of an IPv4 only configuration
While there, remove a duplicate inclusion of sysctl.h.

Reported by:	Gary Jennejohn
Fixes:		a35bdd4489 - main - tcp: add sysctl interface for setting socket options
Sponsored by:	Netflix, Inc.
2022-02-09 19:58:29 +01:00
Michael Tuexen
a35bdd4489 tcp: add sysctl interface for setting socket options
This interface allows to set a socket option on a TCP endpoint,
which is specified by its inp_gencnt. This interface will be
used in an upcoming command line tool tcpsso.

Reviewed by:		glebius, rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D34138
2022-02-09 12:24:41 +01:00
Gleb Smirnoff
fec8a8c7cb inpcb: use global UMA zones for protocols
Provide structure inpcbstorage, that holds zones and lock names for
a protocol.  Initialize it with global protocol init using macro
INPCBSTORAGE_DEFINE().  Then, at VNET protocol init supply it as
the main argument to the in_pcbinfo_init().  Each VNET pcbinfo uses
its private hash, but they all use same zone to allocate and SMR
section to synchronize.

Note: there is kern.ipc.maxsockets sysctl, which controls UMA limit
on the socket zone, which was always global.  Historically same
maxsockets value is applied also to every PCB zone.  Important fact:
you can't create a pcb without a socket!  A pcb may outlive its socket,
however.  Given that there are multiple protocols, and only one socket
zone, the per pcb zone limits seem to have little value.  Under very
special conditions it may trigger a little bit earlier than socket zone
limit, but in most setups the socket zone limit will be triggered
earlier.  When VIMAGE was added to the kernel PCB zones became per-VNET.
This magnified existing disbalance further: now we have multiple pcb
zones in multiple vnets limited to maxsockets, but every pcb requires a
socket allocated from the global zone also limited by maxsockets.
IMHO, this per pcb zone limit doesn't bring any value, so this patch
drops it.  If anybody explains value of this limit, it can be restored
very easy - just 2 lines change to in_pcbstorage_init().

Differential revision:	https://reviews.freebsd.org/D33542
2022-01-03 10:17:46 -08:00
Michael Tuexen
430df2abee in_pcb: improve inp_next()
If there is no inp to check, exit the loop iterating through them.

Reported by:	syzbot+403406a9cbf082b36ea4@syzkaller.appspotmail.com
Reviewed by:	glebius
Sponsored by:	Netflix, Inc.
2022-01-01 19:04:10 +01:00
Gleb Smirnoff
a057769205 in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address
The intent is to provide more entropy than can be provided
by just the 32-bits of the IPv6 address which overlaps with
6to4 tunnels.  This is needed to mitigate potential algorithmic
complexity attacks from attackers who can control large
numbers of IPv6 addresses.

Together with:		gallatin
Reviewed by:		dwmalone, rscheff
Differential revision:	https://reviews.freebsd.org/D33254
2021-12-26 10:47:28 -08:00
Gleb Smirnoff
a370832bec tcp: remove delayed drop KPI
No longer needed after tcp_output() can ask caller to drop.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33371
2021-12-26 08:48:24 -08:00
Gleb Smirnoff
d8b45c8e14 inpcb: don't leak the port zone in in_pcbinfo_destroy() 2021-12-16 15:15:02 -08:00
Gleb Smirnoff
185e659c40 inpcb: use locked variant of prison_check_ip*()
The pcb lookup always happens in the network epoch and in SMR section.
We can't block on a mutex due to the latter.  Right now this patch opens
up a race.  But soon that will be addressed by D33339.

Reviewed by:		markj, jamie
Differential revision:	https://reviews.freebsd.org/D33340
Fixes:			de2d47842e
2021-12-14 09:38:52 -08:00
Gleb Smirnoff
eb93b99d69 in_pcb: delay crfree() down into UMA dtor
inpcb lookups, which check inp_cred, work with pcbs that potentially went
through in_pcbfree().  So inp_cred should stay valid until SMR guarantees
its invisibility to lookups.

While here, put the whole inpcb destruction sequence of in_pcbfree(),
inpcb_dtor() and inpcb_fini() sequentially.

Submitted by:		markj
Differential revision:	https://reviews.freebsd.org/D33273
2021-12-05 10:46:37 -08:00
Peter Lei
4c018b5aed in_pcb: limit the effect of wraparound in TCP random port allocation check
The check to see if TCP port allocation should change from random to
sequential port allocation mode may incorrectly cause a false positive
due to negative wraparound.
Example:
    V_ipport_tcpallocs = 2147483585 (0x7fffffc1)
    V_ipport_tcplastcount = 2147483553 (0x7fffffa1)
    V_ipport_randomcps = 100
The original code would compare (2147483585 <= -2147483643) and thus
incorrectly move to sequential allocation mode.

Compute the delta first before comparing against the desired limit to
limit the wraparound effect (since tcplastcount is always a snapshot
of a previous tcpallocs).
2021-12-03 12:38:12 -08:00
Peter Lei
13e3f3349f in_pcb: fix TCP local ephemeral port accounting
Fix logic error causing UDP(-Lite) local ephemeral port bindings
to count against the TCP allocation counter, potentially causing
TCP to go from random to sequential port allocation mode prematurely.
2021-12-03 12:30:21 -08:00