freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	d74b7baeb0	ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263	2021-12-06 09:32:31 -08:00
Randall Stewart	dadbc04250	tcp: rack fails to send out a TLP after a MTU change When rack sends out a TLP it sets up various state to make sure it avoids the cwnd (its been more than 1 RTT since our last send) and it may at times send new data. If an MTU change as occurred and our cwnd has collapsed we can have a situation where must_retran flag is set and we obey the cwnd thus never sending the TLP and then sitting stuck. This one line fix addresses that problem Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33231	2021-12-06 09:56:09 -05:00
Gleb Smirnoff	eb93b99d69	in_pcb: delay crfree() down into UMA dtor inpcb lookups, which check inp_cred, work with pcbs that potentially went through in_pcbfree(). So inp_cred should stay valid until SMR guarantees its invisibility to lookups. While here, put the whole inpcb destruction sequence of in_pcbfree(), inpcb_dtor() and inpcb_fini() sequentially. Submitted by: markj Differential revision: https://reviews.freebsd.org/D33273	2021-12-05 10:46:37 -08:00
Michael Tuexen	54912d47b6	sctp: unbreak NOINET6 builds. PR: 260119 Reported by: kostikbel MFC after: 1 week	2021-12-04 19:16:18 +01:00
Michael Tuexen	d79676fb13	sctp: inherit IP level socket options from listening socket Ensure that TTL and TOS values set on a listener get inheritet to the accepted sockets. PR: 260119 MFC after: 1 week	2021-12-03 22:44:01 +01:00
Gleb Smirnoff	36f42c5ebf	tcp_ccalgounload(): initialize the inpcb iterator when curvnet is set Pointy hat to: glebius Fixes: `de2d47842e`	2021-12-03 12:39:56 -08:00
Peter Lei	4c018b5aed	in_pcb: limit the effect of wraparound in TCP random port allocation check The check to see if TCP port allocation should change from random to sequential port allocation mode may incorrectly cause a false positive due to negative wraparound. Example: V_ipport_tcpallocs = 2147483585 (0x7fffffc1) V_ipport_tcplastcount = 2147483553 (0x7fffffa1) V_ipport_randomcps = 100 The original code would compare (2147483585 <= -2147483643) and thus incorrectly move to sequential allocation mode. Compute the delta first before comparing against the desired limit to limit the wraparound effect (since tcplastcount is always a snapshot of a previous tcpallocs).	2021-12-03 12:38:12 -08:00
Michael Tuexen	f32357be53	sctp: use the correct traffic class when sending SCTP/IPv6 packets When sending packets the stcb was used to access the inp and then access the endpoint specific IPv6 level options. This fails when there exists an inp, but no stcb yet. This is the case for sending an INIT-ACK in response to an INIT when no association already exists. Fix this by just providing the inp instead of the stcb. PR: 260120 MFC after: 1 week	2021-12-03 21:36:44 +01:00
Peter Lei	13e3f3349f	in_pcb: fix TCP local ephemeral port accounting Fix logic error causing UDP(-Lite) local ephemeral port bindings to count against the TCP allocation counter, potentially causing TCP to go from random to sequential port allocation mode prematurely.	2021-12-03 12:30:21 -08:00
Gleb Smirnoff	12ae3476f3	tcp_drain(): initialize the inpcb iterator when curvnet is set Reported by: cy Pointy hat to: glebius Fixes: `de2d47842e`	2021-12-02 21:08:30 -08:00
Gleb Smirnoff	651a545143	udp_detach(): fix set but not used warning	2021-12-02 20:12:40 -08:00
Gleb Smirnoff	bd1d085045	udp_multi_input(): the UDP header is only needed for probes Reported by: kib Fixes: `de2d47842e`	2021-12-02 20:12:40 -08:00
Cy Schubert	db0ac6ded6	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit `266f97b5e9`, reversing changes made to `a10253cffe`. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.	2021-12-02 14:45:04 -08:00
Cy Schubert	266f97b5e9	wpa: Import wpa_supplicant/hostapd commit 14ab4a816 This is the November update to vendor/wpa committed upstream 2021-11-26. MFC after: 1 month	2021-12-02 13:35:14 -08:00
Gleb Smirnoff	3cce6164ab	ip_input: remove pointless check in INP_RECVIF handling An mbuf rcvif pointer is supposed to be valid and doesn't need extra checks. The code appeared in `d314ad7b73`.	2021-12-02 11:15:04 -08:00
Gleb Smirnoff	2e27230ff9	tcp_hpts: rewrite inpcb synchronization Just trust the pcb database, that if we did in_pcbref(), no way an inpcb can go away. And if we never put a dropped inpcb on our queue, and tcp_discardcb() always removes an inpcb to be dropped from the queue, then any inpcb on the queue is valid. Now, to solve LOR between inpcb lock and HPTS queue lock do the following trick. When we are about to process a certain time slot, take the full queue of the head list into on stack list, drop the HPTS lock and work on our queue. This of course opens a race when an inpcb is being removed from the on stack queue, which was already mentioned in comments. To address this race introduce generation count into queues. If we want to remove an inpcb with generation count mismatch, we can't do that, we can only mark it with desired new time slot or -1 for remove. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33026	2021-12-02 10:48:49 -08:00
Gleb Smirnoff	f971e79139	tcp_hpts: rename input queue to drop queue and trim dead code The HPTS input queue is in reality used only for "delayed drops". When a TCP stack decides to drop a connection on the output path it can't do that due to locking protocol between main tcp_output() and stacks. So, rack/bbr utilize HPTS to drop the connection in a different context. In the past the queue could also process input packets in context of HPTS thread, but now no stack uses this, so remove this functionality. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33025	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	b0a7c008cb	tcp_hpts: make struct tcp_hpts_entry private to the module. Also, make some of the functions also private to the module. Remove unused functions discovered after that. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33024	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	50f081ecb7	tcp_hpts: provide tcp_in_hpts(). It will hide some internal HPTS knowledge from the consumers. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33023	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	de2d47842e	SMR protection for inpcbs With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector. Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details: - The database is already build on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some compexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *). Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	565655f4e3	inpcb: reduce some aliased functions after removal of PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33021	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	93c67567e0	Remove "options PCBGROUP" With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable. This experimental feature was sponsored by Juniper but ended never to be used in Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@]. I'm up to resurrecting it back if there is any interest from anybody. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33020	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	1cec1c5831	Allow to compile RSS without PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33019	2021-12-02 10:48:48 -08:00
Randall Stewart	dcf2dfed26	tcp: unloading a module that is set to default should error. I just discovered that the return of the EBUSY error was incorrectly rigged so that you could unload a CC module that was set to default. Its supposed to be an EBUSY error. Make it so. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33229	2021-12-02 06:12:16 -05:00
Michael Tuexen	13c196a41e	sctp: improve handling of assoc ids in socket options For socket options related to local and remote addresses providing generic association ids does not make sense. Report EINVAL in this case. MFC after: 1 week	2021-12-01 14:54:55 +01:00
Michael Tuexen	a01b8859cb	sctp: cleanup, no functional change intended. MFC after: 1 week	2021-12-01 09:19:40 +01:00
Gordon Bergling	1dadeab367	netinet: Fix a common typo in source code comments - s/segement/segment/ MFC after: 3 days	2021-11-30 10:37:20 +01:00
Gordon Bergling	27c4abc7cd	inet(3): Fix two typos in sysctl descriptions - s/sequental/sequential/ MFC after: 3 days	2021-11-30 10:21:47 +01:00
Gordon Bergling	b4aa9cb217	tcp(4): Fix a typo in a sysctl description - s/entires/entries/ MFC after: 3 days	2021-11-30 07:17:30 +01:00
Michael Tuexen	147bf5e930	tcp: Don't try to upgrade a read lock just for logging Reviewed by: glebius, lstewart, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D33098	2021-11-29 13:48:40 +01:00
Michael Tuexen	3c1ba6f394	sctp: improve consistency, no functional change intended	2021-11-26 12:53:43 +01:00
Michael Tuexen	0906362646	sctp: add some asserts, no functional changes intended This might help in narrowing down https://syzkaller.appspot.com/bug?id=fbd79abaec55f5aede63937182f4247006ea883b	2021-11-26 12:19:33 +01:00
Mark Johnston	44775b163b	netinet: Remove unneeded mb_unmapped_to_ext() calls in_cksum_skip() now handles unmapped mbufs on platforms where they're permitted. Reviewed by: glebius, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33097	2021-11-24 13:31:16 -05:00
Mark Johnston	0d9c3423f5	netinet: Implement in_cksum_skip() using m_apply() This allows it to work with unmapped mbufs. In particular, in_cksum_skip() calls no longer need to be preceded by calls to mb_unmapped_to_ext() to avoid a page fault. PR: 259645 Reviewed by: gallatin, glebius, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33096	2021-11-24 13:31:16 -05:00
Mark Johnston	ecbbe83144	netinet: Deduplicate most in_cksum() implementations in_cksum() and related routines are implemented separately for each platform, but only i386 and arm have optimized versions. Other platforms' copies of in_cksum.c are identical except for style differences and support for big-endian CPUs. Deduplicate the implementations for the rest of the platforms. This will make it easier to implement in_cksum() for unmapped mbufs. On arm and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is not to be compiled. No functional change intended. Reviewed by: kp, glebius MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33095	2021-11-24 13:31:16 -05:00
Mark Johnston	5195bcc212	netinet: Remove in_cksum.c It does not get compiled into the kernel. No functional change inteneded. Reviewed by: kp, glebius, cy MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33094	2021-11-24 13:31:16 -05:00
Gordon Bergling	b4fbc855a5	cc_newreno(4): Fix a typo in a source code comment - s/conditons/conditions/ MFC after: 3 days	2021-11-19 19:16:02 +01:00
Gleb Smirnoff	ff94500855	Add tcp_freecb() - single place to free tcpcb. Until this change there were two places where we would free tcpcb - tcp_discardcb() in case if all timers are drained and tcp_timer_discard() otherwise. They were pretty much copy-n-paste, except that in the default case we would run tcp_hc_update(). Merge this into single function tcp_freecb() and move new short version of tcp_timer_discard() to tcp_timer.c and make it static. Reviewed by: rrs, hselasky Differential revision: https://reviews.freebsd.org/D32965	2021-11-18 20:27:45 -08:00
Gleb Smirnoff	fb8588d2cb	tcp_timewait: use on stack struct tcptw as last resort In case we failed to uma_zalloc() and also failed to reuse with tcp_tw_2msl_scan(), then just use on stack tcptw. This will allow to run through tcp_twrespond() and standard tcpcb discard routine. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32965	2021-11-18 20:27:45 -08:00
Randall Stewart	97e28f0f58	tcp: Rack ack war with a mis-behaving firewall or nat with resets. Previously we added ack-war prevention for misbehaving firewalls. This is where the f/w or nat messes up its sequence numbers and causes an ack-war. There is yet another type of ack war that we have found in the wild that is like unto this. Basically the f/w or nat gets a ack (keep-alive probe or such) and instead of turning the ack/seq around and adding a TH_RST it does something real stupid and sends a new packet with seq=0. This of course triggers the challenge ack in the reset processing which then sends in a challenge ack (if the seq=0 is within the range of possible sequence numbers allowed by the challenge) and then we rinse-repeat. This will add the needed tweaks (similar to the last ack-war prevention using the same sysctls and counters) to prevent it and allow say 5 per second by default. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32938	2021-11-17 09:45:51 -05:00
Mark Johnston	756bb50b6a	sctp: Remove now-unneeded mb_unmapped_to_ext() calls sctp_delayed_checksum() now handles unmapped mbufs, thanks to m_apply(). No functional change intended. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32942	2021-11-16 13:38:09 -05:00
Mark Johnston	b4d758a0cc	sctp: Use m_apply() to calcuate a checksum for an mbuf chain m_apply() works on unmapped mbufs, so this will let us elide mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in the network stack. Modify sctp_calculate_cksum() to assume it's passed an mbuf header. This assumption appears to be true in practice, and we need to know the full length of the chain. No functional change intended. Reviewed by: tuexen, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32941	2021-11-16 13:36:30 -05:00
Mike Karels	2f35e7d9fa	kernel: partially revert e9efb1125a15, default inet mask When no mask is supplied to the ioctl adding an Internet interface address, revert to using the historical class mask rather than a single default. Similarly for the NFS bootp code. MFC after: 3 weeks Reviewed by: melifaro glebius Differential Revision: https://reviews.freebsd.org/D32951	2021-11-14 14:12:25 -06:00
Michael Tuexen	2f62f92e37	tcp: Fix a locking issue related to logging tcp_respond() is sometimes called with only a read lock. The logging however, requires a write lock. So either try to upgrade the lock if needed, or don't log the packet. Reported by: syzbot+8151ef969c170f76706b@syzkaller.appspotmail.com Reported by: syzbot+eb679adb3304c511c1e4@syzkaller.appspotmail.com Reviewed by: markj, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D32983	2021-11-14 15:04:27 +01:00
Gleb Smirnoff	ef396441ce	tcp_usr_detach: revert debugging piece from `f5cf1e5f5a`. The code was probably useful during the problem being chased down, but for brevity makes sense just to return to the original KASSERT. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32968	2021-11-13 08:33:32 -08:00
Gleb Smirnoff	9a06a82455	tcp_timers: check for (INP_TIMEWAIT \| INP_DROPPED) only once All timers keep inpcb locked through their execution. We need to check these flags only once. Checking for INP_TIMEWAIT earlier is is also safer, since such inpcbs point into tcptw rather than tcpcb, and any dereferences of inp_ppcb as tcpcb are erroneous. Reviewed by: rrs, hselasky Differential revision: https://reviews.freebsd.org/D32967	2021-11-13 08:32:06 -08:00
Michael Tuexen	df07bfda67	tcp: Fix a locking issue INP_WLOCK_RECHECK_CLEANUP() and INP_WLOCK_RECHECK() might return from the function, so any locks held must be released. Reported by: syzbot+b1a888df08efaa7b4bf1@syzkaller.appspotmail.com Reviewed by: markj Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D32975	2021-11-12 22:13:50 +01:00
Mark Johnston	034a924009	tcp: Ensure that vnets have an initialized V_default_cc_ptr This causes new vnets to inherit the cc algorithm from vnet0. This is a temporary patch to fix vnet jail creation. With encouragement from: glebius Fixes: `b8d60729de` ("tcp: Congestion control cleanup.") Differential Revision: https://reviews.freebsd.org/D32970	2021-11-12 12:18:12 -07:00
Warner Losh	7e3c9ec906	tcp: better congestion control defaults Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config files. It's the default congestion control algorithm. Add code to cc.c so that CC_DEFAULT is "newreno" if it's not overriden in the config file. Sponsored by: Netflix Fixes: `b8d60729de` ("tcp: Congestion control cleanup.") Revired by: manu, hselasky, jhb, glebius, tuexen Differential Revision: https://reviews.freebsd.org/D32964	2021-11-12 12:16:11 -07:00
Gleb Smirnoff	2ce85919bb	Add net.inet.ip.source_address_validation Drop packets arriving from the network that have our source IP address. If maliciously crafted they can create evil effects like an RST exchange between two of our listening TCP ports. Such packets just can't be legitimate. Enable the tunable by default. Long time due for a modern Internet host. Reviewed by: donner, melifaro Differential revision: https://reviews.freebsd.org/D32914	2021-11-12 09:00:33 -08:00

1 2 3 4 5 ...

7172 Commits