freebsd-dev

Author	SHA1	Message	Date
Bjoern A. Zeeb	945aad9c62	Improve the comment for arpresolve_full() in if_ether.c. No functional changes. MFC after: 6 weeks	2018-11-17 16:13:09 +00:00
Bjoern A. Zeeb	90d99b6587	Retire arpresolve_addr(), which is not used anywhere, from if_ether.c.	2018-11-17 16:08:36 +00:00
Jonathan T. Looney	2157f3c36a	Add some additional length checks to the IPv4 fragmentation code. Specifically, block 0-length fragments, even when the MF bit is clear. Also, ensure that every fragment with the MF bit clear ends at the same offset and that no subsequently-received fragments exceed that offset. Reviewed by: glebius, markj MFC after: 3 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17922	2018-11-16 18:32:48 +00:00
Mark Johnston	86af1d0241	Ensure that IP fragments do not extend beyond IP_MAXPACKET. Such fragments are obviously invalid, and when processed may end up violating the sort order (by offset) of fragments of a given packet. This doesn't appear to be exploitable, however. Reviewed by: emaste Discussed with: jtl MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17914	2018-11-10 03:00:36 +00:00
Ed Maste	2bfaf585ca	Avoid buffer underwrite in icmp_error icmp_error allocates either an mbuf (with pkthdr) or a cluster depending on the size of data to be quoted in the ICMP reply, but the calculation failed to account for the additional padding that m_align may apply. Include the ip header in the size passed to m_align. On 64-bit archs this will have the net effect of moving everything 4 bytes later in the mbuf or cluster. This will result in slightly pessimal alignment for the ICMP data copy. Also add an assertion that we do not move m_data before the beginning of the mbuf or cluster. Reported by: A reddit user Reviewed by: bz, jtl MFC after: 3 days Security: CVE-2018-17156 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17909	2018-11-08 20:17:36 +00:00
Michael Tuexen	8553b984a5	Don't use a function when neither INET nor INET6 are defined. This is a valid case for the userland stack, where this fixes two set-but-not-used warnings in this case. Thanks to Christian Wright for reporting the issue.	2018-11-06 12:55:03 +00:00
Jonathan T. Looney	54e675342b	m_pulldown() may reallocate n. Update the oip pointer after the m_pulldown() call. MFC after: 2 weeks Sponsored by: Netflix	2018-11-02 19:14:15 +00:00
Bjoern A. Zeeb	e2c532f156	carpstats are the last virtualised variable in the file and end up at the end of the vnet_set. The generated code uses an absolute relocation at one byte beyond the end of the carpstats array. This means the relocation for the vnet does not happen for carpstats initialisation and as a result the kernel panics on module load. This problem has only been observed with carp and only on i386. We considered various possible solutions including using linker scripts to add padding to all kernel modules for pcpu and vnet sections. While the symbols (by chance) stay in the order of appearance in the file adding an unused non-file-local variable at the end of the file will extend the size of set_vnet and hence make the absolute relocation for carpstats work (think of this as a single-module set_vnet padding). This is a (temporary) hack. It is the least intrusive one as we need a timely solution for the upcoming release. We will revisit the problem in HEAD. For a lot more information and the possible alternate solutions please see the PR and the references therein. PR: 230857 MFC after: 3 days	2018-11-01 17:26:18 +00:00
Mark Johnston	d9ff5789be	Remove redundant checks for a NULL lbgroup table. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17108	2018-11-01 15:52:49 +00:00
Mark Johnston	79ee680b65	Improve style in in_pcbinslbgrouphash() and related subroutines. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17107	2018-11-01 15:51:49 +00:00
Michael Tuexen	6999f6975c	Remove debug code which slipped in accidentally. MFC after: 4 weeks X-MFC with: r339989 Sponsored by: Netflix, Inc.	2018-11-01 11:41:40 +00:00
Michael Tuexen	099ab39f44	Improve a comment to refer to the actual sections in the TCP specification for the comparisons made. Thanks to lstewart@ for the suggestion. MFC after: 4 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17595	2018-11-01 11:35:28 +00:00
Bjoern A. Zeeb	201100c58b	Initial implementation of draft-ietf-6man-ipv6only-flag. This change defines the RA "6" (IPv6-Only) flag which routers may advertise, kernel logic to check if all routers on a link have the flag set and accordingly update a per-interface flag. If all routers agree that it is an IPv6-only link, ether_output_frame(), based on the interface flag, will filter out all ETHERTYPE_IP/ARP frames, drop them, and return EAFNOSUPPORT to upper layers. The change also updates ndp to show the "6" flag, ifconfig to display the IPV6_ONLY nd6 flag if set, and rtadvd to allow announcing the flag. Further changes to tcpdump (contrib code) are available and will be upstreamed. Tested the code (slightly earlier version) with 2 FreeBSD IPv6 routers, a FreeBSD laptop on ethernet as well as wifi, and with Win10 and OSX clients (which did not fall over with the "6" flag set but not understood). We may also want to (a) implement an RX filter, and (b) over time enhance user space to, say, stop dhclient from running when the interface flag is set. Also we might want to start IPv6 before IPv4 in the future. All the code is hidden under the EXPERIMENTAL option and not compiled by default as the draft is a work-in-progress and we cannot rely on the fact that IANA will assign the bits as requested by the draft and hence they may change. Dear 6man, you have running code. Discussed with: Bob Hinden, Brian E Carpenter	2018-10-30 20:08:48 +00:00
Mark Johnston	da7d7778b0	Expose some netdump configuration parameters through sysctl. Reviewed by: cem MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17755	2018-10-29 21:16:26 +00:00
Eugene Grosbein	1a5995cc88	Prevent ip_input() from panicking due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others	2018-10-27 04:59:35 +00:00
Eugene Grosbein	4f1e3122ac	Prevent multicast code from panicking due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others	2018-10-27 04:53:25 +00:00
Michael Tuexen	de00ad05e6	Add initial descriptions for SCTP related MIB variable. This work was mostly done by Marie-Helene Kvello-Aune. MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D3583	2018-10-26 21:04:17 +00:00
Andrey V. Elsukov	8796e291f8	Add the check that current VNET is ready and access to srchash is allowed. This change is similar to r339646. The callback that checks for appearing and disappearing of tunnel ingress address can be called during VNET teardown. To prevent access to already freed memory, add check to the callback and epoch_wait() call to be sure that callback has finished its work. MFC after: 20 days	2018-10-23 13:11:45 +00:00
John Baldwin	74e10fb613	A couple of style fixes in recent TCP changes. - Add a blank line before a block comment to match other block comments in the same function. - Sort the prototype for sbsndptr_adv and fix whitespace between return type and function name. Reviewed by: gallatin, bz Differential Revision: https://reviews.freebsd.org/D17474	2018-10-22 21:17:36 +00:00
Eugene Grosbein	410634efd1	New sysctl: net.inet.icmp.error_keeptags Currently, icmp_error() function copies FIB number from original packet into generated ICMP response but not mbuf_tags(9) chain. This prevents us from easily matching ICMP responses corresponding to tagged original packets by means of packet filter such as ipfw(8). For example, ICMP "time-exceeded in-transit" packets usually generated in response to traceroute probes lose tags attached to original packets. This change adds new sysctl net.inet.icmp.error_keeptags that defaults to 0 to avoid extra overhead when this feature not needed. Set net.inet.icmp.error_keeptags=1 to make icmp_error() copy mbuf_tags from original packet to generated ICMP response. PR: 215874 MFC after: 1 month	2018-10-21 21:29:19 +00:00
Andrey V. Elsukov	f252e3f2f2	Include <sys/eventhandler.h> to fix the build. MFC after: 1 month	2018-10-21 18:39:34 +00:00
Andrey V. Elsukov	19873f4780	Add handling for appearing/disappearing of ingress addresses to if_gre(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17214	2018-10-21 18:13:45 +00:00
Andrey V. Elsukov	009d82ee0f	Add handling for appearing/disappearing of ingress addresses to if_gif(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; * remove the note about ingress address from BUGS section. MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 18:06:15 +00:00
Andrey V. Elsukov	8251c68d5c	Add KPI that can be used by tunneling interfaces to handle IP addresses appearing and disappearing on the host system. Such handling is needed, because tunneling interfaces must use addresses that are configured on the host as ingress addresses for tunnels. Otherwise the system can send spoofed packets with a source address that belongs to a foreign host. The KPI uses ifaddr_event_ext event to implement addresses tracking. Tunneling interfaces register event handlers and then they are notified by the kernel when an address disappears or appears. ifaddr_event_compat() handler from if.c replaced by srcaddr_change_event() in the ip_encap.c MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 17:55:26 +00:00
Andrey V. Elsukov	094d6f8d75	Add IPFW_RULE_JUSTOPTS flag, that is used by ipfw(8) to mark rule, that was added using "new rule format". And then, when the kernel returns rule with this flag, ipfw(8) can correctly show it. Reported by: lev MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17373	2018-10-21 15:10:59 +00:00
Andrey V. Elsukov	64d63b1e03	Add ifaddr_event_ext event. It is similar to ifaddr_event, but the handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL, and the pointer to ifaddr. Also ifaddr_event now is implemented using ifaddr_event_ext handler. MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17100	2018-10-21 15:02:06 +00:00
Michael Tuexen	93899d10b4	The handling of RST segments in the SYN-RCVD state exists in two code paths. Both are not consistent and the one in the syn cache code does not conform to the relevant specifications (Page 69 of RFC 793 and Section 4.2 of RFC 5961). This patch fixes this: * The sequence number checks are fixed as specified on page 69 of RFC 793. * The sysctl variable net.inet.tcp.insecure_rst is now honoured and the behaviour is as specified in Section 4.2 of RFC 5961. Approved by: re (gjb@) Reviewed by: bz@, glebius@, rrs@ Differential Revision: https://reviews.freebsd.org/D17595 Sponsored by: Netflix, Inc.	2018-10-18 19:21:18 +00:00
Jonathan T. Looney	ac75e35d85	In r338102, the TCP reassembly code was substantially restructured. Prior to this change, the code sometimes used a temporary stack variable to hold details of a TCP segment. r338102 stopped using the variable to hold segments, but did not actually remove the variable. Because the variable is no longer used, we can safely remove it. Approved by: re (gjb)	2018-10-16 14:41:09 +00:00
Bjoern A. Zeeb	4ba16a92c7	In udp_input() when walking the pcblist we can come across an inp marked FREED after the epoch(9) changes. Check once we hold the lock and skip the inp if it is the case. Contrary to IPv6 the locking of the inp is outside the multicast section and hence a single check seems to suffice. PR: 232192 Reviewed by: mmacy, markj Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17540	2018-10-12 22:51:45 +00:00
Bjoern A. Zeeb	3afdfcaf33	r217592 moved the check for imo in udp_input() into the conditional block but leaving the variable assignment outside the block, where it is no longer used. Move both the variable and the assignment one block further in. This should result in no functional changes. It will however make upcoming changes slightly easier to apply. Reviewed by: markj, jtl, tuexen Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17525	2018-10-12 11:30:46 +00:00
Jonathan T. Looney	13c6ba6d94	There are three places where we return from a function which entered an epoch section without exiting that epoch section. This is bad for two reasons: the epoch section won't exit, and we will leave the epoch tracker from the stack on the epoch list. Fix the epoch leak by making sure we exit epoch sections before returning. Reviewed by: ae, gallatin, mmacy Approved by: re (gjb, kib) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17450	2018-10-09 13:26:06 +00:00
Michael Tuexen	3535cdc43e	Avoid truncating unrecognised parameters when reporting them. This resulted in sending malformed packets. Approved by: re (kib@) MFC after: 1 week	2018-10-07 15:13:47 +00:00
Michael Tuexen	3924dfa721	Ensure that the ips_localout counter is incremented for locally generated SCTP packets sent over IPv4. This makes the behaviour consistent with IPv6. Reviewed by: ae@, bz@, jtl@ Approved by: re (kib@) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17406	2018-10-07 11:26:15 +00:00
Tom Jones	b6e870116f	Convert UDP length to host byte order When getting the number of bytes to checksum make sure to convert the UDP length to host byte order when the entire header is not in the first mbuf. Reviewed by: jtl, tuexen, ae Approved by: re (gjb), jtl (mentor) Differential Revision: https://reviews.freebsd.org/D17357	2018-10-05 12:51:30 +00:00
Ryan Stone	083a010c62	Hold a write lock across udp_notify() With the new route cache feature udp_notify() will modify the inp when it needs to invalidate the route cache. Ensure that we hold a write lock on the inp before calling the function to ensure that multiple threads don't race while trying to invalidate the cache (which previously led to a page fault). Differential Revision: https://reviews.freebsd.org/D17246 Reviewed by: sbruno, bz, karels Sponsored by: Dell EMC Isilon Approved by: re (gjb)	2018-10-04 22:03:58 +00:00
Michael Tuexen	15a087e551	Mitigate providing a timing signal if the COOKIE or AUTH validation fails. Thanks to jmg@ for reporting the issue, which was discussed in https://admbugs.freebsd.org/show_bug.cgi?id=878 Approved by: re (TBD@) MFC after: 1 week	2018-10-01 14:05:31 +00:00
Michael Tuexen	9d2e3f14c4	After allocating chunks set the fields in a consistent way. This removes two assignments for the flags field being done twice and adds one, which was missing. Thanks to Felix Weinrank for reporting the issue he found by using fuzz testing of the userland stack. Approved by: re (kib@) MFC after: 1 week	2018-10-01 13:09:18 +00:00
Andrey V. Elsukov	384a5c3c28	Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic it is possible, that the code is running in net_epoch_preempt section, and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case. PR: 231428 Reviewed by: mmacy, tuexen Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17335	2018-10-01 10:46:00 +00:00
Michael Tuexen	1b084a5e5e	Plug mbuf leak in the SCTP input path in an error case. Approved by: re (kib@) MFC after: 1 week CID: 749312	2018-09-30 21:54:02 +00:00
Michael Tuexen	66bcf0b333	Plug mbuf leaks in the SCTP output path in error cases. Approved by: re (kib@) MFC after: 1 week CID: 1395307	2018-09-30 21:31:33 +00:00
Michael Tuexen	8184648425	Fix the handling of ancillary data for SCTP socket. Implement sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs() similar to sctp_find_cmsg() to improve consistency and avoid the signed/unsigned issues in sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs(). Thanks to andrew@ for reporting the problem he found using syzkaller. Approved by: re (kib@) MFC after: 1 week	2018-09-30 16:21:31 +00:00
Michael Tuexen	ae0a9a8850	Increment the corresponding UDP stats counter (udps_opackets) when sending UDP encapsulated SCTP packets. This is consistent with the behaviour that when such packets are received, the corresponding UDP stats counter (udps_ipackets) is incremented. Thanks to Peter Lei for making me aware of this inconsistency. Approved by: re (kib@) MFC after: 1 week	2018-09-30 12:16:06 +00:00
Michael Tuexen	3552f16d82	Fix typo in comment. Reported by: @danfe Approved by: re (kib@) MFC after: 1 week X-MFC: r338941	2018-09-28 19:47:32 +00:00
Michael Tuexen	0277ec9c43	Whitespace changes and fixing a typo. No functional change. Approved by: re (kib@) MFC after: 1 week	2018-09-26 10:24:50 +00:00
Michael Tuexen	078a49a077	Remove the unused parameter 'locked' from the function syncache_respond(). There is no functional change. The parameter became unused in r313330, but wasn't removed. Approved by: re (kib@) MFC after: 1 month Sponsored by: Netflix, Inc.	2018-09-23 16:37:32 +00:00
Andrey V. Elsukov	76b09d1823	Add new field max_hdrsize to struct encap_config. It is currently unused and reserved for future use to keep KBI/KPI. Also add several spare pointers to be able to extend the structure if it is needed. Approved by: re (gjb)	2018-09-20 19:45:27 +00:00
Michael Tuexen	ba4704a278	Remove unused code. Approved by: re (kib@) MFC after: 1 week	2018-09-18 10:53:07 +00:00
Michael Tuexen	a8a8a8a808	Fix TCP Fast Open for the TCP RACK stack. * Fix a bug where the SYN handling during established state was applied to a front state. * Move a check for retransmission after the timer handling. This was suppressing timer based retransmissions. * Fix an off-by one byte in the sequence number of retransmissions. * Apply fixes corresponding to https://svnweb.freebsd.org/changeset/base/336934 Reviewed by: rrs@ Approved by: re (kib@) MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16912	2018-09-12 10:27:58 +00:00
Mark Johnston	54af3d0dac	Fix synchronization of LB group access. Lookups are protected by an epoch section, so the LB group linkage must be a CK_LIST rather than a plain LIST. Furthermore, we were not deferring LB group frees, so in_pcbremlbgrouphash() could race with readers and cause a use-after-free. Reviewed by: sbruno, Johannes Lundberg <johalun0@gmail.com> Tested by: gallatin Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17031	2018-09-10 19:00:29 +00:00
Mark Johnston	a7026c7fd9	Use ratecheck(9) in in_pcbinslbgrouphash(). Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com> Approved by: re (kib) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17065	2018-09-07 21:11:41 +00:00
Bjoern A. Zeeb	113c4fad55	The inp_lle field to struct inpcb, along with two "valid" flags for the rt and lle cache were added in r191129 (2009). To my best knowledge they have never been used and route caching has converted the inp_rt field from that commit to inp_route rendering this field and these flags obsolete. Convert the pointer into a spare pointer to not change the size of the structure anymore (and to have a spare pointer) and mark the two fields as unused. Reviewed by: markj, karels Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17062	2018-09-06 19:55:40 +00:00
Bjoern A. Zeeb	6d2b0c0166	Make tcp_hpts.c compile a LINT kernel with options RSS and PCBGROUPS added by adding the missing include files and changing the type of cpuid which would otherwise cause a false comparison with NETISR_CPUID_NONE. Reviewed by: rrs Approved by: re (marius) Differential Revision: https://reviews.freebsd.org/D16891	2018-09-06 16:11:24 +00:00
Mark Johnston	49365eb433	Define sctp probes only when SCTP is configured. Otherwise the "depends_on provider" guard in sctp.d does not work as intended. Reported by: mjg Reviewed by: tuexen Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17057	2018-09-06 14:15:03 +00:00
Mark Johnston	8be02ee4da	Fix style bugs in in_pcblookup_lbgroup(). No functional change intended. Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com> Approved by: re (rgrimes) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17030	2018-09-05 15:04:11 +00:00
Eugene Grosbein	d5d21ad932	Fix "ipfw fwd" to work for incoming IPv4 packets when ip_tryforward() chooses fast forwarding path, as it already works for IPv6 and for both of them on old slow path. PR: 231143 Reviewed by: ae Approved by: re (gjb) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17039	2018-09-05 13:59:36 +00:00
Mark Johnston	73ad0b6abf	Use the correct malloc type in in_pcblbgroup_free(). Approved by: re (kib) Sponsored by: The FreeBSD Foundation	2018-09-03 17:39:09 +00:00
Michael Tuexen	c6c0be2765	Fix a shadowed variable warning. Thanks to Peter Lei for reporting the issue. Approved by: re(kib@) MFH: 1 month Sponsored by: Netflix, Inc.	2018-08-24 10:50:19 +00:00
Michael Tuexen	90ab3571d8	Use arc4rand() instead of read_random() in the SCTP and TCP code. This was suggested by jmg@. Reviewed by: delphij@, jmg@, jtl@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16860	2018-08-23 19:10:45 +00:00
Michael Tuexen	4ba1513d1a	Don't use the explicit number 32 for the length of the secrets, use sizeof() or explicit #defines instead. No functional change. This was suggested by jmg@. MFC after: 1 month XMFC with: r338053 Sponsored by: Netflix, Inc.	2018-08-23 06:03:59 +00:00
Michael Tuexen	1e88cc8b59	Add support for send, receive and state-change DTrace providers for SCTP. They are based on what is specified in the Solaris DTrace manual for Solaris 11.4. Reviewed by: 0mp, dteske, markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16839	2018-08-22 21:23:32 +00:00
Matt Macy	d3878608d7	in_mcast: fix copy paste error when clearing flag	2018-08-22 04:09:55 +00:00
Michael Tuexen	5dff1c3845	Enabling the IPPROTO_IPV6 level socket option IPV6_USE_MIN_MTU on a TCP socket resulted in sending fragmented IPv6 packets. This is fixed by reducing the MSS to the appropriate value. In addition, if the socket option is set before the handshake happens, announce this MSS to the peer. This is not strictly required, but done since TCP is conservative. PR: 173444 Reviewed by: bz@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16796	2018-08-21 14:12:30 +00:00
Michael Tuexen	7d4dcc36a8	Fix the inheritance of IPv6 level socket options on TCP sockets. This was broken for IPv6 listening socket, which are not IPV6_ONLY, and the accepted TCP connection was using IPv4. Reviewed by: bz@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16792	2018-08-21 14:07:36 +00:00
Michael Tuexen	6ef849e601	Whitespace change.	2018-08-21 13:37:06 +00:00
Michael Tuexen	1a0b021677	Refactor the SHUTDOWN_PENDING state handling. This is not a functional change but a preparation for the upcoming DTrace support. It is necessary to change the state in one logical operation, even if it involves clearing the sub state SHUTDOWN_PENDING. MFC after: 1 month	2018-08-21 13:25:32 +00:00
Bjoern A. Zeeb	10b070c166	GC inc_isipv6; it was added for "temp" compatibility in 2001, r86764 and does not seem to be used.	2018-08-20 20:06:36 +00:00
Randall Stewart	c28440db29	This change represents a substantial restructure of the way we reassemble inbound TCP segments. The old algorithm just blindly dropped in segments without coalescing. This meant that every segment could take up greater and greater room on the linked list of segments. This of course is now subject to a tighter limit (100) of segments which in a high BDP situation will cause us to be a lot more inefficient as we drop segments beyond 100 entries that we receive. What this restructure does is cause the reassembly buffer to coalesce segments putting an emphasis on the two common cases (which avoid walking the list of segments) i.e. where we add to the back of the queue of segments and where we add to the front. We also have the reassembly buffer supporting a couple of debug options (black box logging as well as counters for code coverage). These are compiled out by default but can be added by uncommenting the defines. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16626	2018-08-20 12:43:18 +00:00
Michael Tuexen	8e02b4e00c	Don't expose the uptime via the TCP timestamps. The TCP client side or the TCP server side when not using SYN-cookies used the uptime as the TCP timestamp value. This patch uses in all cases an offset, which is the result of a keyed hash function taking the source and destination addresses and port numbers into account. The keyed hash function is the same as used for the initial TSN. Reviewed by: rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16636	2018-08-19 14:56:10 +00:00
Navdeep Parhar	32d2623ae2	Add the ability to look up the 3b PCP of a VLAN interface. Use it in toe_l2_resolve to fill up the complete vtag and not just the vid. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16752	2018-08-16 23:46:38 +00:00
Matt Macy	f9be038601	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the inpcb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu	2018-08-15 20:23:08 +00:00
Luiz Otavio O Souza	59b2022f94	Late style follow up on r312770. Submitted by: glebius X-MFC with: r312770 MFC after: 3 days	2018-08-15 15:44:30 +00:00
Jonathan T. Looney	a967df1c8f	Lower the default limits on the IPv4 reassembly queue. In particular, try to ensure that no bucket will have a reassembly queue larger than approximately 100 items. This limits the cost to find the correct reassembly queue when processing an incoming fragment. Due to the low limits on each bucket's length, increase the size of the hash table from 64 to 1024. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:30:46 +00:00
Jonathan T. Looney	ff790bbad0	Implement a limit on the number of IPv4 reassembly queues per bucket. There is a hashing algorithm which should distribute IPv4 reassembly queues across the available buckets in a relatively even way. However, if there is a flaw in the hashing algorithm which allows a large number of IPv4 fragment reassembly queues to end up in a single bucket, a per-bucket limit could help mitigate the performance impact of this flaw. Implement such a limit, with a default of twice the maximum number of reassembly queues divided by the number of buckets. Recalculate the limit any time the maximum number of reassembly queues changes. However, allow the user to override the value using a sysctl (net.inet.ip.maxfragbucketsize). Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:23:05 +00:00
Jonathan T. Looney	7b9c5eb0a5	Add a global limit on the number of IPv4 fragments. The IP reassembly fragment limit is based on the number of mbuf clusters, which are a global resource. However, the limit is currently applied on a per-VNET basis. Given enough VNETs (or given sufficient customization of enough VNETs), it is possible that the sum of all the VNET limits will exceed the number of mbuf clusters available in the system. Given the fact that the fragment limit is intended (at least in part) to regulate access to a global resource, the fragment limit should be applied on a global basis. VNET-specific limits can be adjusted by modifying the net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket sysctls. To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0. To disable fragment reassembly for a particular VNET, set net.inet.ip.maxfragpackets to 0. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:19:49 +00:00
Jonathan T. Looney	5d9bd45518	Improve hashing of IPv4 fragments. Currently, IPv4 fragments are hashed into buckets based on a 32-bit key which is calculated by (src_ip ^ ip_id) and combined with a random seed. However, because an attacker can control the values of src_ip and ip_id, it is possible to construct an attack which causes very deep chains to form in a given bucket. To ensure more uniform distribution (and lower predictability for an attacker), calculate the hash based on a key which includes all the fields we use to identify a reassembly queue (dst_ip, src_ip, ip_id, and the ip protocol) as well as a random seed. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:15:47 +00:00
Michael Tuexen	0f1346f7f4	Remove a set but not used warning showing up in usrsctp.	2018-08-14 08:32:33 +00:00
Andrey V. Elsukov	62484790e0	Restore ability to send ICMP and ICMPv6 redirects. It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled only when sending redirects for corresponding IP version is disabled via sysctl. Otherwise will be used default forwarding function. PR: 221137 Submitted by: mckay@ MFC after: 2 weeks	2018-08-14 07:54:14 +00:00
Michael Tuexen	839d21d62e	Use the stcb instead of the asoc in state macros. This is not a functional change. Just a preparation for upcoming dtrace state change provider support.	2018-08-13 13:58:45 +00:00
Michael Tuexen	61a2188021	Consistently use the macros to modify the assoc state. No functional change.	2018-08-13 11:56:21 +00:00
Michael Tuexen	812649d86f	Add explicit cast to silence a warning for the userland stack. Thanks to Felix Weinrank for providing the patch.	2018-08-12 14:05:15 +00:00
Devin Teske	ab9ed8a1bd	Fix misspellings of transmitter/transmitted Reviewed by: emaste, bcr Sponsored by: Smule, Inc. Differential Revision: https://reviews.freebsd.org/D16025	2018-08-10 20:37:32 +00:00
Andrey V. Elsukov	16bbf600d9	Remove unneeded ipsec-related includes. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D16637	2018-08-10 07:24:01 +00:00
Leandro Lupori	c8e2123b6a	[ppc] Fix kernel panic when using BOOTP_NFSROOT On PowerPC (and possibly other architectures) that don't use EARLY_AP_STARTUP, the config task queue may be used uninitialized. This was observed while trying to mount the root fs from NFS, as reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168. This patch has 2 main changes: 1- Perform a basic initialization of qgroup_config, similar to what is done in taskqgroup_adjust, but simpler. This makes qgroup_config ready to be used during NFS root mount. 2- When EARLY_AP_STARTUP is not used, call inm_init() and in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs to send multicast packets to request an IP. PR: Bug 230168 Reported by: sbruno Reviewed by: jhibbits, mmacy, sbruno Approved by: jhibbits Differential Revision: D16633	2018-08-09 14:04:51 +00:00
Randall Stewart	d18ea344e6	Fix a small bug in rack where it will end up sending the FIN twice. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16604	2018-08-08 13:36:49 +00:00
Jonathan T. Looney	95a914f631	Address concerns about CPU usage while doing TCP reassembly. Currently, the per-queue limit is a function of the receive buffer size and the MSS. In certain cases (such as connections with large receive buffers), the per-queue segment limit can be quite large. Because we process segments as a linked list, large queues may not perform acceptably. The better long-term solution is to make the queue more efficient. But, in the short-term, we can provide a way for a system administrator to set the maximum queue size. We set the default queue limit to 100. This is an effort to balance performance with a sane resource limit. Depending on their environment, goals, etc., an administrator may choose to modify this limit in either direction. Reviewed by: jhb Approved by: so Security: FreeBSD-SA-18:08.tcp Security: CVE-2018-6922	2018-08-06 17:36:57 +00:00
Randall Stewart	936b2b64ae	This fixes a bug in Rack where we were not properly using the correct value for Delayed Ack. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16579	2018-08-06 09:22:07 +00:00
Gleb Smirnoff	cc7963191d	Now that after r335979 the kernel addresses in API structures are fixed size, there is no reason left for the unions. Discussed with: brooks	2018-08-04 00:03:21 +00:00
Michael Tuexen	7bda966394	Add a dtrace provider for UDP-Lite. The dtrace provider for UDP-Lite is modeled after the UDP provider. This fixes the bug that UDP-Lite packets were triggering the UDP provider. Thanks to dteske@ for providing the dwatch module. Reviewed by: dteske@, markj@, rrs@ Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16377	2018-07-31 22:56:03 +00:00
Michael Tuexen	51e08d53ae	Fix INET only builds. r336940 introduced an "unused variable" warning on platforms which support INET, but not INET6, like MALTA and MALTA64 as reported by Mark Millard. Improve the #ifdefs to address this issue. Sponsored by: Netflix, Inc.	2018-07-31 06:27:05 +00:00
Michael Tuexen	888973f5ae	Allow implicit TCP connection setup for TCP/IPv6. TCP/IPv4 allows an implicit connection setup using sendto(), which is used for TTCP and TCP fast open. This patch adds support for TCP/IPv6. While there, improve some tests for detecting multicast addresses, which are mapped. Reviewed by: bz@, kbowling@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16458	2018-07-30 21:27:26 +00:00
Michael Tuexen	e2662978b8	Send consistent SEG.WIN when using timewait codepath for TCP. When sending TCP segments from the timewait code path, a stored value of the last sent window is used. Use the same code for computing this in the timewait code path as in the main code path used in tcp_output() to avoid inconsistencies. Reviewed by: rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16503	2018-07-30 21:13:42 +00:00
Michael Tuexen	8db239dc6b	Fix some TCP fast open issues. The following issues are fixed: * Whenever a TCP server with TCP fast open enabled, calls accept(), recv(), send(), and close() before the TCP-ACK segment has been received, the TCP connection is just dropped and the reception of the TCP-ACK segment triggers the sending of a TCP-RST segment. * Whenever a TCP server with TCP fast open enabled, calls accept(), recv(), send(), send(), and close() before the TCP-ACK segment has been received, the first byte provided in the second send call is not transferred. * Whenever a TCP client with TCP fast open enabled calls sendto() followed by close() the TCP connection is just dropped. Reviewed by: jtl@, kbowling@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16485	2018-07-30 20:35:50 +00:00
Michael Tuexen	6138da62a9	Add missing send/recv dtrace probes for TCP. These missing probes are mostly in the syncache and timewait code. Reviewed by: markj@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16369	2018-07-30 20:13:38 +00:00
Alan Somers	6040822c4e	Make timespecadd(3) and friends public The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have grown at least 28 syscalls that use timespecs in some way, leading many programs both inside and outside of the base system to redefine those macros. It's better just to make the definitions public. Our kernel currently defines two-argument versions of timespecadd and timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define three-argument versions. Solaris also defines a three-argument version, but only in its kernel. This revision changes our definition to match the common three-argument version. Bump _FreeBSD_version due to the breaking KPI change. Discussed with: cem, jilles, ian, bde Differential Revision: https://reviews.freebsd.org/D14725	2018-07-30 15:46:40 +00:00
Randall Stewart	4ad5b7a0ac	This fixes a hole where rack could end up sending an invalid segment into the reassembly queue. This would happen if you enabled the data after close option. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16453	2018-07-30 10:23:29 +00:00
Andrew Turner	1e0582fd55	icmp_quotelen was accidentally changed in r336676; undo this. Sponsored by: DARPA, AFRL	2018-07-24 16:45:01 +00:00
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Randall Stewart	399973c33d	Delete the example tcp stack "fastpath" which was only put in as an example. Sponsored by: Netflix inc. Differential Revision: https://reviews.freebsd.org/D16420	2018-07-24 14:55:47 +00:00
Matt Macy	e5e3e746fe	Fix a potential use after free in getsockopt() access to inp_options Discussed with: jhb Reviewed by: sbruno, transport MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14621	2018-07-22 20:02:14 +00:00
Matt Macy	2269988749	NULL out cc_data in pluggable TCP {cc}_cb_destroy When ABE was added (rS331214) to NewReno and leak fixed (rS333699) , it now has a destructor (newreno_cb_destroy) for per connection state. Other congestion controls may allocate and free cc_data on entry and exit, but the field is never explicitly NULLed if moving back to NewReno which only internally allocates stateful data (no entry constructor) resulting in a situation where newreno_cb_destroy might be called on a junk pointer. - NULL out cc_data in the framework after calling {cc}_cb_destroy - free(9) checks for NULL so there is no need to perform not NULL checks before calling free. - Improve a comment about NewReno in tcp_ccalgounload This is the result of a debugging session from Jason Wolfe, Jason Eggleston, and mmacy@ and very helpful insight from lstewart@. Submitted by: Kevin Bowling Reviewed by: lstewart Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16282	2018-07-22 05:37:58 +00:00
Michael Tuexen	34fc9072ce	Set the IPv4 version in the IP header for UDP and UDPLite.	2018-07-21 02:14:13 +00:00
Michael Tuexen	e1526d5a5b	Add missing dtrace probes for received UDP packets. Fire UDP receive probes when a packet is received and there is no endpoint consuming it. Fire the probe also if the TTL of the received packet is smaller than the minimum required by the endpoint. Clarify also in the man page, when the probe fires. Reviewed by: dteske@, markj@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16046	2018-07-20 15:32:20 +00:00
Michael Tuexen	0053ed28ff	Whitespace changes due to changes in ident.	2018-07-19 20:16:33 +00:00
Michael Tuexen	b0471b4b95	Revert https://svnweb.freebsd.org/changeset/base/336503 since I also ran the export script with different parameters.	2018-07-19 20:11:14 +00:00
Michael Tuexen	7679e49dd4	Whitespace changes due to change in ident.	2018-07-19 19:33:42 +00:00
Randall Stewart	8de9ac5eec	Bump the ICMP echo limits to match the RFC Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16333	2018-07-18 22:49:53 +00:00
Andrey V. Elsukov	acf673edf0	Move invoking of callout_stop(&lle->lle_timer) into llentry_free(). This deduplicates the code a bit, and also implicitly adds missing callout_stop() to in[6]_lltable_delete_entry() functions. PR: 209682, 225927 Submitted by: hselasky (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4605	2018-07-17 11:33:23 +00:00
Sean Bruno	c8b1bdc31c	There was quite a bit of feedback on r336282 that has led to the submitter to want to revert it.	2018-07-14 23:53:51 +00:00
Sean Bruno	179a28b098	Fixup memory management for fetching options in ip_ctloutput() Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14621	2018-07-14 16:19:46 +00:00
Mark Johnston	aaf268f9f6	Remove a duplicate check. PR: 229663 Submitted by: David Binderman <dcb314@hotmail.com> MFC after: 3 days	2018-07-11 14:54:56 +00:00
Brooks Davis	3a20f06a1c	Use uintptr_t alone when assigning to kvaddr_t variables. Suggested by: jhb	2018-07-10 13:03:06 +00:00
Michael Tuexen	c9da58534d	Add support for printing the TCP FO client-side cookie cache via the sysctl interface. This is similar to the TCP host cache. Reviewed by: pkelsey@, kbowling@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D14554	2018-07-10 10:50:43 +00:00
Michael Tuexen	a026a53a76	Use appropriate MSS value when populating the TCP FO client cookie cache When a client receives a SYN-ACK segment with a TCP fast open cookie, but without an MSS option, an MSS value from uninitialised stack memory is used. This patch ensures that in case no MSS option is included in the SYN-ACK, the appropriate value as given in RFC 7413 is used. Reviewed by: kbowling@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16175	2018-07-10 10:42:48 +00:00
Steven Hartland	65c3a353e6	Removed pointless NULL check Removed pointless NULL check after malloc with M_WAITOK which can never return NULL. Sponsored by: Multiplay	2018-07-10 08:05:32 +00:00
Andrey V. Elsukov	f7c4fdee1a	Add "record-state", "set-limit" and "defer-action" rule options to ipfw. "record-state" is similar to "keep-state", but it doesn't produce implicit O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the same feature as "record-state", it is single opcode without implicit O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic states. When rule with this opcode is matched, the rule's action will not be executed, instead dynamic state will be created. And when this state will be matched by "check-state", then rule action will be executed. This allows create a more complicated rulesets. Submitted by: lev MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D1776	2018-07-09 11:35:18 +00:00
Michael Tuexen	5f1347d7c9	Allow alternate TCP stack to populate the TCP FO client cookie cache. Without this patch, TCP FO could be used when using alternate TCP stack, but only existing entries in the TCP client cookie cache could be used. This cache was not populated by connections using alternate TCP stacks. Sponsored by: Netflix, Inc.	2018-07-07 12:28:16 +00:00
Michael Tuexen	c556884f8e	When initializing the TCP FO client cookie cache, take into account whether the TCP FO support is enabled or not for the client side. The code in tcp_fastopen_init() implicitly assumed that the sysctl variable V_tcp_fastopen_client_enable was initialized to 0. This was initially true, but was changed in r335610, which unmasked this bug. Thanks to Pieter de Goeje for reporting the issue on freebsd-net@	2018-07-07 11:18:26 +00:00
Brooks Davis	5c5e39e3d5	One more 32-bit fix for r335979. Reported by: tuexen	2018-07-06 13:34:45 +00:00
Brooks Davis	7524b4c14b	Correct breakage on 32-bit platforms from r335979.	2018-07-06 10:03:33 +00:00
Andrew Turner	2bf9501287	Create a new macro for static DPCPU data. On arm64 (and possible other architectures) we are unable to use static DPCPU data in kernel modules. This is because the compiler will generate PC-relative accesses, however the runtime-linker expects to be able to relocate these. In preparation to fix this create two macros depending on if the data is global or static. Reviewed by: bz, emaste, markj Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D16140	2018-07-05 17:13:37 +00:00
Brooks Davis	f38b68ae8a	Make struct xinpcb and friends word-size independent. Replace size_t members with ksize_t (uint64_t) and pointer members (never used as pointers in userspace, but instead as unique identifiers) with kvaddr_t (uint64_t). This makes the structs identical between 32-bit and 64-bit ABIs. On 64-bit systems, the ABI is maintained. On 32-bit systems, this is an ABI breaking change. The ABI of most of these structs was previously broken in r315662. This also imposes a small API change on userspace consumers who must handle kernel pointers becoming virtual addresses. PR: 228301 (exp-run by antoine) Reviewed by: jtl, kib, rwatson (various versions) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15386	2018-07-05 13:13:48 +00:00
Hiroki Sato	5ba05d3d0e	- Fix a double unlock in inp_block_unblock_source() and lock leakage in inp_leave_group() which caused a panic. - Make order of CTR1() and IN_MULTI_LIST_LOCK() consistent around inm_merge().	2018-07-04 06:47:34 +00:00
Matt Macy	6573d7580b	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
Matt Macy	99208b820f	inpcb: don't gratuitously defer frees Don't defer frees in sysctl handlers. It isn't necessary and it just confuses things. revert: r333911, r334104, and r334125 Requested by: jtl	2018-07-02 05:19:44 +00:00
Kristof Provost	0d3d234cd1	carp: Set DSCP value CS7 Update carp to set DSCP value CS7(Network Traffic) in the flowlabel field of packets by default. Currently carp only sets TOS_LOWDELAY in IPv4 which was deprecated in 1998. This also implements sysctl that can revert carp back to its old behavior if desired. This will allow implementation of QOS on modern network devices to make sure carp packets aren't dropped during interface contention. Submitted by: Nick Wolff <darkfiberiru AT gmail.com> Reviewed by: kp, mav (earlier version) Differential Revision: https://reviews.freebsd.org/D14536	2018-07-01 08:37:07 +00:00
Andrey V. Elsukov	6e081509db	Add NULL pointer check. encap_lookup_t method can be invoked by IP encap subsystem even if no gif/gre/me interfaces exist. Hash tables are allocated on demand, when first interface is created. So, make NULL pointer check before doing access to hash table. PR: 229378	2018-06-28 11:39:27 +00:00
Gleb Smirnoff	b8ab659396	Check the inp_flags under inp lock. Looks like the race was hidden before, the conversion of tcbinfo to CK_LIST have uncovered it.	2018-06-27 22:01:59 +00:00
Sean Bruno	af4da58655	Enable TCP_FASTOPEN by default for FreeBSD 12. Submitted by: kbowling Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D15959	2018-06-24 21:46:29 +00:00
Sean Bruno	45fc0718d8	Reap unused variable and assignment that had no effect. Noted by cross compiling with gcc on mips. Reviewed by: mmacy	2018-06-24 21:36:37 +00:00
Gleb Smirnoff	a00f4ac22f	Revert r334843, and partially revert r335180. tcp_outflags[] were defined since 4BSD and are defined nowadays in all its descendants. Removing them breaks third party application.	2018-06-23 06:53:53 +00:00
Randall Stewart	581a046a8b	This adds in an optimization so that we only walk one time through the mbuf chain during copy and TSO limiting. It is used by both Rack and now the FreeBSD stack. Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D15937	2018-06-21 21:03:58 +00:00
Matt Macy	e93fdbe212	raw_ip: validate inp in both loops Continuation of r335497. Also move the lock acquisition up to validate before referencing inp_cred. Reported by: pho	2018-06-21 20:18:23 +00:00
Matt Macy	3d348772e7	in_pcblookup_hash: validate inp before return Post r335356 it is possible to have an inpcb on the hash lists that is partially torn down. Validate before using. Also as a side effect of this change the lock ordering issue between hash lock and inpcb no longer exists allowing some simplification. Reported by: pho@	2018-06-21 18:40:15 +00:00
Matt Macy	e5c331cf78	raw_ip: validate inp Post r335356 it is possible to have an inpcb on the hash lists that is partially torn down. Validate before using. Reported by: pho	2018-06-21 17:24:10 +00:00
Matt Macy	46374cbf54	udp_ctlinput: don't refer to unpcb after we drop the lock Reported by: pho@	2018-06-21 06:10:52 +00:00
Randall Stewart	c6f76759ca	Make sure that the t_peakrate_thr is not compiled in by default until NF can upstream it. Reviewed by: and suggested lstewart Sponsored by: Netflix Inc.	2018-06-19 11:20:28 +00:00
Randall Stewart	f923a734b3	Move the tp set back to where it was before we started playing with the VNET sets. This way we have verified the INP settings before we go to the trouble of de-referencing it. Reviewed by: and suggested by lstewart Sponsored by: Netflix Inc.	2018-06-19 05:28:14 +00:00
Matt Macy	9e58ff6ff9	convert inpcbinfo hash and info rwlocks to epoch + mutex - Convert inpcbinfo info & hash locks to epoch for read and mutex for write - Garbage collect code that handled INP_INFO_TRY_RLOCK failures as INP_INFO_RLOCK which can no longer fail When running 64 netperfs sending minimal sized packets on a 2x8x2 reduces unhalted core cycles samples in rwlock rlock/runlock in udp_send from 51% to 3%. Overall packet throughput rate limited by CPU affinity and NIC driver design choices. On the receiver unhalted core cycles samples in in_pcblookup_hash went from 13% to 1.6% Tested by LLNW and pho@ Reviewed by: jtl Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15686	2018-06-19 01:54:00 +00:00
Randall Stewart	f994ead330	Move to using the inp->vnet pointer as suggested by lstewart. This is far better since the hpts system is using the inp as its basis anyway. Unfortunately his comments came late. Sponsored by: Netflix Inc.	2018-06-18 14:10:12 +00:00
Andrey V. Elsukov	20efcfc602	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using of rwlock with multiqueue NICs for IP forwarding on high pps produces high lock contention and inefficient. Rmlock fits better for such workloads. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
Michael Tuexen	43b223f42e	When retransmitting TCP SYN-ACK segments with the TCP timestamp option enabled use an updated timestamp instead of reusing the one used in the initial TCP SYN-ACK segment. This patch ensures that an updated timestamp is used when sending the SYN-ACK from the syncache code. It was already done if the SYN-ACK was retransmitted from the generic code. This makes the behaviour consistent and also conformant with the TCP specification. Reviewed by: jtl@, Jason Eggleston MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D15634	2018-06-15 12:28:43 +00:00
Gleb Smirnoff	9293873e83	TCPOUTFLAGS no longer exists since r334843.	2018-06-14 22:25:10 +00:00
Michael Tuexen	33ef123090	Provide the ip6_plen in network byte order when calling ip6_output(). This is not strictly required by ip6_output(), since it overrides it, but it is needed for upcoming dtrace support.	2018-06-14 21:30:52 +00:00
Michael Tuexen	8d86bd564f	Whitespace changes.	2018-06-14 21:22:14 +00:00
Andrey V. Elsukov	eb548a1a5c	In m_megapullup() use m_getjcl() to allocate 9k or 16k mbuf when requested. It is better to try allocate a big mbuf, than just silently drop a big packet. A better solution could be reworking of libalias modules to be able use m_copydata()/m_copyback() instead of requiring the single contiguous buffer. PR: 229006 MFC after: 1 week	2018-06-14 11:15:39 +00:00
Randall Stewart	4aec110f70	This fixes several bugs that Larry Rosenman helped me find in Rack with respect to its handling of TCP Fast Open. Several fixes all related to TFO are included in this commit: 1) Handling of non-TFO retransmissions 2) Building the proper send-map when we are doing TFO 3) Dealing with the ack that comes back that includes the SYN and data. It appears that with this commit TFO now works :-) Thanks Larry for all your help!! Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D15758	2018-06-14 03:27:42 +00:00
Matt Macy	feeef8509b	Fix PCBGROUPS build post CK conversion of pcbinfo	2018-06-13 23:19:54 +00:00
Andrey V. Elsukov	a5185adeb6	Rework if_gre(4) to use encap_lookup_t method to speedup lookup of needed interface when many gre interfaces are present. Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc.	2018-06-13 11:11:33 +00:00
Matt Macy	483305b99c	Handle INP_FREED when looking up an inpcb When hash table lookups are not serialized with in_pcbfree it will be possible for callers to find an inpcb that has been marked free. We need to check for this and return NULL.	2018-06-13 04:23:49 +00:00
Randall Stewart	c9b4ac7587	This fixes missing VNET sets in the hpts system. Basically without this and running vnets with a TCP stack that uses some of the features is a recipe for panic (without this commit). Reported by: Larry Rosenman Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D15757	2018-06-12 23:54:08 +00:00

6225 Commits