Commit Graph

7172 Commits

Author SHA1 Message Date
Gleb Smirnoff
d74b7baeb0 ifnet_byindex() actually requires network epoch
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch.  Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit.  Mark that with a comment.

Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D33263
2021-12-06 09:32:31 -08:00
Randall Stewart
dadbc04250 tcp: rack fails to send out a TLP after a MTU change
When rack sends out a TLP it sets up various state to make sure
it avoids the cwnd (its been more than 1 RTT since our last send) and
it may at times send new data. If an MTU change as occurred
and our cwnd has collapsed we can have a situation where must_retran
flag is set and we obey the cwnd thus never sending the TLP and then
sitting stuck.

This one line fix addresses that problem
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33231
2021-12-06 09:56:09 -05:00
Gleb Smirnoff
eb93b99d69 in_pcb: delay crfree() down into UMA dtor
inpcb lookups, which check inp_cred, work with pcbs that potentially went
through in_pcbfree().  So inp_cred should stay valid until SMR guarantees
its invisibility to lookups.

While here, put the whole inpcb destruction sequence of in_pcbfree(),
inpcb_dtor() and inpcb_fini() sequentially.

Submitted by:		markj
Differential revision:	https://reviews.freebsd.org/D33273
2021-12-05 10:46:37 -08:00
Michael Tuexen
54912d47b6 sctp: unbreak NOINET6 builds.
PR:		260119
Reported by:	kostikbel
MFC after:	1 week
2021-12-04 19:16:18 +01:00
Michael Tuexen
d79676fb13 sctp: inherit IP level socket options from listening socket
Ensure that TTL and TOS values set on a listener get inheritet
to the accepted sockets.

PR:		260119
MFC after:	1 week
2021-12-03 22:44:01 +01:00
Gleb Smirnoff
36f42c5ebf tcp_ccalgounload(): initialize the inpcb iterator when curvnet is set
Pointy hat to:	glebius
Fixes:		de2d47842e
2021-12-03 12:39:56 -08:00
Peter Lei
4c018b5aed in_pcb: limit the effect of wraparound in TCP random port allocation check
The check to see if TCP port allocation should change from random to
sequential port allocation mode may incorrectly cause a false positive
due to negative wraparound.
Example:
    V_ipport_tcpallocs = 2147483585 (0x7fffffc1)
    V_ipport_tcplastcount = 2147483553 (0x7fffffa1)
    V_ipport_randomcps = 100
The original code would compare (2147483585 <= -2147483643) and thus
incorrectly move to sequential allocation mode.

Compute the delta first before comparing against the desired limit to
limit the wraparound effect (since tcplastcount is always a snapshot
of a previous tcpallocs).
2021-12-03 12:38:12 -08:00
Michael Tuexen
f32357be53 sctp: use the correct traffic class when sending SCTP/IPv6 packets
When sending packets the stcb was used to access the inp and then
access the endpoint specific IPv6 level options. This fails when
there exists an inp, but no stcb yet. This is the case for sending
an INIT-ACK in response to an INIT when no association already
exists. Fix this by just providing the inp instead of the stcb.

PR:		260120
MFC after:	1 week
2021-12-03 21:36:44 +01:00
Peter Lei
13e3f3349f in_pcb: fix TCP local ephemeral port accounting
Fix logic error causing UDP(-Lite) local ephemeral port bindings
to count against the TCP allocation counter, potentially causing
TCP to go from random to sequential port allocation mode prematurely.
2021-12-03 12:30:21 -08:00
Gleb Smirnoff
12ae3476f3 tcp_drain(): initialize the inpcb iterator when curvnet is set
Reported by:	cy
Pointy hat to:	glebius
Fixes:		de2d47842e
2021-12-02 21:08:30 -08:00
Gleb Smirnoff
651a545143 udp_detach(): fix set but not used warning 2021-12-02 20:12:40 -08:00
Gleb Smirnoff
bd1d085045 udp_multi_input(): the UDP header is only needed for probes
Reported by:	kib
Fixes:		de2d47842e
2021-12-02 20:12:40 -08:00
Cy Schubert
db0ac6ded6 Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
2021-12-02 14:45:04 -08:00
Cy Schubert
266f97b5e9 wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.

MFC after:      1 month
2021-12-02 13:35:14 -08:00
Gleb Smirnoff
3cce6164ab ip_input: remove pointless check in INP_RECVIF handling
An mbuf rcvif pointer is supposed to be valid and doesn't
need extra checks.  The code appeared in d314ad7b73.
2021-12-02 11:15:04 -08:00
Gleb Smirnoff
2e27230ff9 tcp_hpts: rewrite inpcb synchronization
Just trust the pcb database, that if we did in_pcbref(), no way
an inpcb can go away.  And if we never put a dropped inpcb on
our queue, and tcp_discardcb() always removes an inpcb to be
dropped from the queue, then any inpcb on the queue is valid.

Now, to solve LOR between inpcb lock and HPTS queue lock do the
following trick.  When we are about to process a certain time
slot, take the full queue of the head list into on stack list,
drop the HPTS lock and work on our queue.  This of course opens
a race when an inpcb is being removed from the on stack queue,
which was already mentioned in comments.  To address this race
introduce generation count into queues.  If we want to remove
an inpcb with generation count mismatch, we can't do that, we
can only mark it with desired new time slot or -1 for remove.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33026
2021-12-02 10:48:49 -08:00
Gleb Smirnoff
f971e79139 tcp_hpts: rename input queue to drop queue and trim dead code
The HPTS input queue is in reality used only for "delayed drops".
When a TCP stack decides to drop a connection on the output path
it can't do that due to locking protocol between main tcp_output()
and stacks.  So, rack/bbr utilize HPTS to drop the connection in
a different context.

In the past the queue could also process input packets in context
of HPTS thread, but now no stack uses this, so remove this
functionality.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33025
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
b0a7c008cb tcp_hpts: make struct tcp_hpts_entry private to the module.
Also, make some of the functions also private to the module. Remove
unused functions discovered after that.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33024
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
50f081ecb7 tcp_hpts: provide tcp_in_hpts().
It will hide some internal HPTS knowledge from the consumers.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33023
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
de2d47842e SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc).  However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree().  However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it.  Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
  Once the desired inpcb is found we need to transition from SMR
  section to r/w lock on the inpcb itself, with a check that inpcb
  isn't yet freed.  This requires some compexity, since SMR section
  itself is a critical(9) section.  The complexity is hidden from
  KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
  notification) also a new KPI is provided, that hides internals of
  the database - inp_next(struct inp_iterator *).

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33022
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
565655f4e3 inpcb: reduce some aliased functions after removal of PCBGROUP.
Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33021
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
93c67567e0 Remove "options PCBGROUP"
With upcoming changes to the inpcb synchronisation it is going to be
broken. Even its current status after the move of PCB synchronization
to the network epoch is very questionable.

This experimental feature was sponsored by Juniper but ended never to
be used in Juniper and doesn't exist in their source tree [sjg@, stevek@,
jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix
[gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].

I'm up to resurrecting it back if there is any interest from anybody.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33020
2021-12-02 10:48:48 -08:00
Gleb Smirnoff
1cec1c5831 Allow to compile RSS without PCBGROUP.
Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33019
2021-12-02 10:48:48 -08:00
Randall Stewart
dcf2dfed26 tcp: unloading a module that is set to default should error.
I just discovered that the return of the EBUSY error was incorrectly
rigged so that you could unload a CC module that was set to default.
Its supposed to be an EBUSY error. Make it so.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33229
2021-12-02 06:12:16 -05:00
Michael Tuexen
13c196a41e sctp: improve handling of assoc ids in socket options
For socket options related to local and remote addresses providing
generic association ids does not make sense. Report EINVAL in this
case.

MFC after:	1 week
2021-12-01 14:54:55 +01:00
Michael Tuexen
a01b8859cb sctp: cleanup, no functional change intended.
MFC after:	1 week
2021-12-01 09:19:40 +01:00
Gordon Bergling
1dadeab367 netinet: Fix a common typo in source code comments
- s/segement/segment/

MFC after:	3 days
2021-11-30 10:37:20 +01:00
Gordon Bergling
27c4abc7cd inet(3): Fix two typos in sysctl descriptions
- s/sequental/sequential/

MFC after:	3 days
2021-11-30 10:21:47 +01:00
Gordon Bergling
b4aa9cb217 tcp(4): Fix a typo in a sysctl description
- s/entires/entries/

MFC after:	3 days
2021-11-30 07:17:30 +01:00
Michael Tuexen
147bf5e930 tcp: Don't try to upgrade a read lock just for logging
Reviewed by:		glebius, lstewart, rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D33098
2021-11-29 13:48:40 +01:00
Michael Tuexen
3c1ba6f394 sctp: improve consistency, no functional change intended 2021-11-26 12:53:43 +01:00
Michael Tuexen
0906362646 sctp: add some asserts, no functional changes intended
This might help in narrowing down
https://syzkaller.appspot.com/bug?id=fbd79abaec55f5aede63937182f4247006ea883b
2021-11-26 12:19:33 +01:00
Mark Johnston
44775b163b netinet: Remove unneeded mb_unmapped_to_ext() calls
in_cksum_skip() now handles unmapped mbufs on platforms where they're
permitted.

Reviewed by:	glebius, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33097
2021-11-24 13:31:16 -05:00
Mark Johnston
0d9c3423f5 netinet: Implement in_cksum_skip() using m_apply()
This allows it to work with unmapped mbufs.  In particular,
in_cksum_skip() calls no longer need to be preceded by calls to
mb_unmapped_to_ext() to avoid a page fault.

PR:		259645
Reviewed by:	gallatin, glebius, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33096
2021-11-24 13:31:16 -05:00
Mark Johnston
ecbbe83144 netinet: Deduplicate most in_cksum() implementations
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions.  Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.

Deduplicate the implementations for the rest of the platforms.  This
will make it easier to implement in_cksum() for unmapped mbufs.  On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.

No functional change intended.

Reviewed by:	kp, glebius
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33095
2021-11-24 13:31:16 -05:00
Mark Johnston
5195bcc212 netinet: Remove in_cksum.c
It does not get compiled into the kernel.  No functional change
inteneded.

Reviewed by:	kp, glebius, cy
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33094
2021-11-24 13:31:16 -05:00
Gordon Bergling
b4fbc855a5 cc_newreno(4): Fix a typo in a source code comment
- s/conditons/conditions/

MFC after:	3 days
2021-11-19 19:16:02 +01:00
Gleb Smirnoff
ff94500855 Add tcp_freecb() - single place to free tcpcb.
Until this change there were two places where we would free tcpcb -
tcp_discardcb() in case if all timers are drained and tcp_timer_discard()
otherwise.  They were pretty much copy-n-paste, except that in the
default case we would run tcp_hc_update().  Merge this into single
function tcp_freecb() and move new short version of tcp_timer_discard()
to tcp_timer.c and make it static.

Reviewed by:		rrs, hselasky
Differential revision:	https://reviews.freebsd.org/D32965
2021-11-18 20:27:45 -08:00
Gleb Smirnoff
fb8588d2cb tcp_timewait: use on stack struct tcptw as last resort
In case we failed to uma_zalloc() and also failed to reuse with
tcp_tw_2msl_scan(), then just use on stack tcptw.  This will allow
to run through tcp_twrespond() and standard tcpcb discard routine.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D32965
2021-11-18 20:27:45 -08:00
Randall Stewart
97e28f0f58 tcp: Rack ack war with a mis-behaving firewall or nat with resets.
Previously we added ack-war prevention for misbehaving firewalls. This is
where the f/w or nat messes up its sequence numbers and causes an ack-war.
There is yet another type of ack war that we have found in the wild that is
like unto this. Basically the f/w or nat gets a ack (keep-alive probe or such)
and instead of turning the ack/seq around and adding a TH_RST it does something
real stupid and sends a new packet with seq=0. This of course triggers the challenge
ack in the reset processing which then sends in a challenge ack (if the seq=0 is within
the range of possible sequence numbers allowed by the challenge) and then we rinse-repeat.

This will add the needed tweaks (similar to the last ack-war prevention using the same sysctls and counters)
to prevent it and allow say 5 per second by default.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32938
2021-11-17 09:45:51 -05:00
Mark Johnston
756bb50b6a sctp: Remove now-unneeded mb_unmapped_to_ext() calls
sctp_delayed_checksum() now handles unmapped mbufs, thanks to m_apply().

No functional change intended.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32942
2021-11-16 13:38:09 -05:00
Mark Johnston
b4d758a0cc sctp: Use m_apply() to calcuate a checksum for an mbuf chain
m_apply() works on unmapped mbufs, so this will let us elide
mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in
the network stack.

Modify sctp_calculate_cksum() to assume it's passed an mbuf header.
This assumption appears to be true in practice, and we need to know the
full length of the chain.

No functional change intended.

Reviewed by:	tuexen, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32941
2021-11-16 13:36:30 -05:00
Mike Karels
2f35e7d9fa kernel: partially revert e9efb1125a15, default inet mask
When no mask is supplied to the ioctl adding an Internet interface
address, revert to using the historical class mask rather than a
single default.  Similarly for the NFS bootp code.

MFC after:	3 weeks
Reviewed by:	melifaro glebius
Differential Revision: https://reviews.freebsd.org/D32951
2021-11-14 14:12:25 -06:00
Michael Tuexen
2f62f92e37 tcp: Fix a locking issue related to logging
tcp_respond() is sometimes called with only a read lock.
The logging however, requires a write lock. So either
try to upgrade the lock if needed, or don't log the packet.

Reported by:		syzbot+8151ef969c170f76706b@syzkaller.appspotmail.com
Reported by:		syzbot+eb679adb3304c511c1e4@syzkaller.appspotmail.com
Reviewed by:		markj, rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D32983
2021-11-14 15:04:27 +01:00
Gleb Smirnoff
ef396441ce tcp_usr_detach: revert debugging piece from f5cf1e5f5a.
The code was probably useful during the problem being chased down,
but for brevity makes sense just to return to the original KASSERT.

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D32968
2021-11-13 08:33:32 -08:00
Gleb Smirnoff
9a06a82455 tcp_timers: check for (INP_TIMEWAIT | INP_DROPPED) only once
All timers keep inpcb locked through their execution.  We need to
check these flags only once.  Checking for INP_TIMEWAIT earlier is
is also safer, since such inpcbs point into tcptw rather than tcpcb,
and any dereferences of inp_ppcb as tcpcb are erroneous.

Reviewed by:		rrs, hselasky
Differential revision:	https://reviews.freebsd.org/D32967
2021-11-13 08:32:06 -08:00
Michael Tuexen
df07bfda67 tcp: Fix a locking issue
INP_WLOCK_RECHECK_CLEANUP() and INP_WLOCK_RECHECK() might return
from the function, so any locks held must be released.

Reported by:		syzbot+b1a888df08efaa7b4bf1@syzkaller.appspotmail.com
Reviewed by:		markj
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D32975
2021-11-12 22:13:50 +01:00
Mark Johnston
034a924009 tcp: Ensure that vnets have an initialized V_default_cc_ptr
This causes new vnets to inherit the cc algorithm from vnet0. This is a
temporary patch to fix vnet jail creation.

With encouragement from: glebius
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Differential Revision: https://reviews.freebsd.org/D32970
2021-11-12 12:18:12 -07:00
Warner Losh
7e3c9ec906 tcp: better congestion control defaults
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm.  Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.

Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision:	https://reviews.freebsd.org/D32964
2021-11-12 12:16:11 -07:00
Gleb Smirnoff
2ce85919bb Add net.inet.ip.source_address_validation
Drop packets arriving from the network that have our source IP
address.  If maliciously crafted they can create evil effects
like an RST exchange between two of our listening TCP ports.
Such packets just can't be legitimate.  Enable the tunable
by default.  Long time due for a modern Internet host.

Reviewed by:		donner, melifaro
Differential revision:	https://reviews.freebsd.org/D32914
2021-11-12 09:00:33 -08:00