freebsd-dev

Author	SHA1	Message	Date
Navdeep Parhar	b9a9e8e9bd	Fix RSS build (broken in r331309). Sponsored by: Chelsio Communications	2018-03-29 19:48:17 +00:00
Brooks Davis	69f0fecbd6	Remove infrastructure for token-ring networks. Reviewed by: cem, imp, jhb, jmallett Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14875	2018-03-28 23:33:26 +00:00
Sean Bruno	e2041bfa7c	CC Cubic: fix underflow for cubic_cwnd() Singed calculations in cubic_cwnd() can result in negative cwnd value which is then cast to an unsigned value. Values less than 1 mss are generally bad for other parts of the code, also fixed. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14141	2018-03-26 19:53:36 +00:00
Jonathan T. Looney	e24e568336	Make the TCP blackbox code committed in r331347 be an optional feature controlled by the TCP_BLACKBOX option. Enable this as part of amd64 GENERIC. For now, leave it disabled on other platforms. Sponsored by: Netflix, Inc.	2018-03-24 12:48:10 +00:00
Jonathan T. Looney	9959c8b9e6	Fix compilation for platforms that don't support atomic_fetchadd_64() after r331347. Reported by: avg, br, jhibbits Sponsored by: Netflix, Inc.	2018-03-24 12:40:45 +00:00
Sean Bruno	72bfa0bf63	Revert r331379 as the "simple" lock changes have revealed a deeper problem and need for a rethink. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks	2018-03-23 18:34:38 +00:00
Kristof Provost	effaab8861	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
Sean Bruno	2a499acf59	Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14624	2018-03-22 22:29:32 +00:00
Jonathan T. Looney	2529f56ed3	Add the "TCP Blackbox Recorder" which we discussed at the developer summits at BSDCan and BSDCam in 2017. The TCP Blackbox Recorder allows you to capture events on a TCP connection in a ring buffer. It stores metadata with the event. It optionally stores the TCP header associated with an event (if the event is associated with a packet) and also optionally stores information on the sockets. It supports setting a log ID on a TCP connection and using this to correlate multiple connections that share a common log ID. You can log connections in different modes. If you are doing a coordinated test with a particular connection, you may tell the system to put it in mode 4 (continuous dump). Or, if you just want to monitor for errors, you can put it in mode 1 (ring buffer) and dump all the ring buffers associated with the connection ID when we receive an error signal for that connection ID. You can set a default mode that will be applied to a particular ratio of incoming connections. You can also manually set a mode using a socket option. This commit includes only basic probes. rrs@ has added quite an abundance of probes in his TCP development work. He plans to commit those soon. There are user-space programs which we plan to commit as ports. These read the data from the log device and output pcapng files, and then let you analyze the data (and metadata) in the pcapng files. Reviewed by: gnn (previous version) Obtained from: Netflix, Inc. Relnotes: yes Differential Revision: https://reviews.freebsd.org/D11085	2018-03-22 09:40:08 +00:00
Gleb Smirnoff	3108a71a34	Fix LINT-NOINET build initializing local to false. This is a dead code, since for NOINET build isipv6 is always true, but this dead code makes it compilable. Reported by: rpokala	2018-03-22 05:07:57 +00:00
Gleb Smirnoff	dd388cfd9b	The net.inet.tcp.nolocaltimewait=1 optimization prevents local TCP connections from entering the TIME_WAIT state. However, it omits sending the ACK for the FIN, which results in RST. This becomes a bigger deal if the sysctl net.inet.tcp.blackhole is 2. In this case RST isn't send, so the other side of the connection (also local) keeps retransmitting FINs. To fix that in tcp_twstart() we will not call tcp_close() immediately. Instead we will allocate a tcptw on stack and proceed to the end of the function all the way to tcp_twrespond(), to generate the correct ACK, then we will drop the last PCB reference. While here, make a few tiny improvements: - use bools for boolean variable - staticize nolocaltimewait - remove pointless acquisiton of socket lock Reported by: jtl Reviewed by: jtl Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D14697	2018-03-21 20:59:30 +00:00
Jonathan T. Looney	7fb2986ff6	If the INP lock is uncontested, avoid taking a reference and jumping through the lock-switching hoops. A few of the INP lookup operations that lock INPs after the lookup do so using this mechanism (to maintain lock ordering): 1. Lock lookup structure. 2. Find INP. 3. Acquire reference on INP. 4. Drop lock on lookup structure. 5. Acquire INP lock. 6. Drop reference on INP. This change provides a slightly shorter path for cases where the INP lock is uncontested: 1. Lock lookup structure. 2. Find INP. 3. Try to acquire the INP lock. 4. If successful, drop lock on lookup structure. Of course, if the INP lock is contested, the functions will need to revert to the previous way of switching locks safely. This saves a few atomic operations when the INP lock is uncontested. Discussed with: gallatin, rrs, rwatson MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D12911	2018-03-21 15:54:46 +00:00
Lawrence Stewart	370efe5ac8	Add support for the experimental Internet-Draft "TCP Alternative Backoff with ECN (ABE)" proposal to the New Reno congestion control algorithm module. ABE reduces the amount of congestion window reduction in response to ECN-signalled congestion relative to the loss-inferred congestion response. More details about ABE can be found in the Internet-Draft: https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn The implementation introduces four new sysctls: - net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to enable ABE for ECN-enabled TCP connections. - net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the multiplicative window decrease factor, specified as a percentage, applied to the congestion window in response to a loss-based or ECN-based congestion signal respectively. They default to the values specified in the draft i.e. beta=50 and beta_ecn=80. - net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to non-zero to enable the use of standard beta (50% by default) when repairing loss during an ECN-signalled congestion recovery episode. It enables a more conservative congestion response and is provided for the purposes of experimentation as a result of some discussion at IETF 100 in Singapore. The values of beta and beta_ecn can also be set per-connection by way of the TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or CC_NEWRENO_BETA_ECN CC algo sub-options. Submitted by: Tom Jones <tj@enoti.me> Tested by: Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au> Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D11616	2018-03-19 16:37:47 +00:00
Alexander V. Chernikov	1435dcd94f	Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration. Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for the locally-originated packets changed, passing cached lle pointer to the ether_output() directly. This resulted in the arp/ndp entry expire each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code to the packet path used by route cache. Unify the notification code by using single inlined function with the per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks	2018-03-17 17:05:48 +00:00
Michael Tuexen	1574b1e41e	Set the inp_vflag consistently for accepted TCP/IPv6 connections when net.inet6.ip6.v6only=0. Without this patch, the inp_vflag would have INP_IPV4 and the INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl variable net.inet6.ip6.v6only is 0. This resulted in netstat to report the source and destination addresses as IPv4 addresses, even they are IPv6 addresses. PR: 226421 Reviewed by: bz, hiren, kib MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D13514	2018-03-16 15:26:07 +00:00
Sean Bruno	d7fb35d13a	Update tcp_lro with tested bugfixes from Netflix and LLNW: rrs - Lets make the LRO code look for true dup-acks and window update acks fly on through and combine. rrs - Make the LRO engine a bit more aware of ack-only seq space. Lets not have it incorrectly wipe out newer acks for older acks when we have out-of-order acks (common in wifi environments). jeggleston - LRO eating window updates Based on all of the above I think we are RFC compliant doing it this way: https://tools.ietf.org/html/rfc1122 section 4.2.2.16 "Note that TCP has a heuristic to select the latest window update despite possible datagram reordering; as a result, it may ignore a window update with a smaller window than previously offered if neither the sequence number nor the acknowledgment number is increased." Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reviewed by: rstone gallatin Sponsored by: NetFlix and Limelight Networks Differential Revision: https://reviews.freebsd.org/D14540	2018-03-09 00:08:43 +00:00
Michael Tuexen	1c714531e8	When checking the TCP fast cookie length, conststently also check for the minimum length. This fixes a bug where cookies of length 2 bytes (which is smaller than the minimum length of 4) is provided by the server. Sponsored by: Netflix, Inc.	2018-02-27 22:12:38 +00:00
Patrick Kelsey	1f13c23f3d	Ensure signed comparison to avoid false trip of assert during VNET teardown. Reported by: lwhsu MFC after: 1 month	2018-02-26 20:31:16 +00:00
Patrick Kelsey	18a7530938	Greatly reduce the number of #ifdefs supporting the TCP_RFC7413 kernel option. The conditional compilation support is now centralized in tcp_fastopen.h and tcp_var.h. This doesn't provide the minimum theoretical code/data footprint when TCP_RFC7413 is disabled, but nearly all the TFO code should wind up being removed by the optimizer, the additional footprint in the syncache entries is a single pointer, and the additional overhead in the tcpcb is at the end of the structure. This enables the TCP_RFC7413 kernel option by default in amd64 and arm64 GENERIC. Reviewed by: hiren MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14048	2018-02-26 03:03:41 +00:00
Patrick Kelsey	c560df6f12	This is an implementation of the client side of TCP Fast Open (TFO) [RFC7413]. It also includes a pre-shared key mode of operation in which the server requires the client to be in possession of a shared secret in order to successfully open TFO connections with that server. The names of some existing fastopen sysctls have changed (e.g., net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable). Reviewed by: tuexen MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14047	2018-02-26 02:53:22 +00:00
Patrick Kelsey	798caa2ee5	Fix harmless locking bug in tfp_fastopen_check_cookie(). The keylist lock was not being acquired early enough. The only side effect of this bug is that the effective add time of a new key could be slightly later than it would have been otherwise, as seen by a TFO client. Reviewed by: tuexen MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14046	2018-02-26 02:43:26 +00:00
Andrey V. Elsukov	b2fe54bc8b	Reinitialize IP header length after checksum calculation. It is used later by TCP-MD5 code. This fixes the problem with broken TCP-MD5 over IPv4 when NIC has disabled TCP checksum offloading. PR: 223835 MFC after: 1 week	2018-02-10 10:13:17 +00:00
Andrey V. Elsukov	b99a682320	Rework ipfw dynamic states implementation to be lockless on fast path. o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and for dynamic states implementation information; o added DYN_LOOKUP_NEEDED() macro that can be used to determine the need of new lookup of dynamic states; o ipfw_dyn_rule now becomes obsolete. Currently it used to pass information from kernel to userland only. o IPv4 and IPv6 states now described by different structures dyn_ipv4_state and dyn_ipv6_state; o IPv6 scope zones support is added; o ipfw(4) now depends from Concurrency Kit; o states are linked with "entry" field using CK_SLIST. This allows lockless lookup and protected by mutex modifications. o the "expired" SLIST field is used for states expiring. o struct dyn_data is used to keep generic information for both IPv4 and IPv6; o struct dyn_parent is used to keep O_LIMIT_PARENT information; o IPv4 and IPv6 states are stored in different hash tables; o O_LIMIT_PARENT states now are kept separately from O_LIMIT and O_KEEP_STATE states; o per-cpu dyn_hp pointers are used to implement hazard pointers and they prevent freeing states that are locklessly used by lookup threads; o mutexes to protect modification of lists in hash tables now kept in separate arrays. 65535 limit to maximum number of hash buckets now removed. o Separate lookup and install functions added for IPv4 and IPv6 states and for parent states. o By default now is used Jenkinks hash function. Obtained from: Yandex LLC MFC after: 42 days Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D12685	2018-02-07 18:59:54 +00:00
John Baldwin	f17985319d	Export tcp_always_keepalive for use by the Chelsio TOM module. This used to work by accident with ld.bfd even though always_keepalive was marked as static. LLD honors static more correctly, so export this variable properly (including moving it into the tcp_* namespace). Reviewed by: bz, emaste MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14129	2018-01-30 23:01:37 +00:00
Michael Tuexen	cf6339882e	Add constant for the PAD chunk as defined in RFC 4820. This will be used by traceroute and traceroute6 soon. MFC after: 1 week	2018-01-27 13:46:55 +00:00
Michael Tuexen	07e75d0a37	Update references in comments, since the IDs have become an RFC long time ago. Also cleanup whitespaces. No functional change. MFC after: 1 week	2018-01-27 13:43:03 +00:00
Conrad Meyer	222daa421f	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035	2018-01-25 22:25:13 +00:00
Mark Johnston	5557762b72	Use tcpinfoh_t for TCP headers in the tcp:::debug-{drop,input} probes. The header passed to these probes has some fields converted to host order by tcp_fields_to_host(), so the tcpinfo_t translator doesn't do what we want. Submitted by: Hannes Mehnert MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12647	2018-01-25 15:35:34 +00:00
Navdeep Parhar	09b0b8c058	Do not generate illegal mbuf chains during IP fragment reassembly. Only the first mbuf of the reassembled datagram should have a pkthdr. This was discovered with cxgbe(4) + IPSEC + ping with payload more than interface MTU. cxgbe can generate !M_WRITEABLE mbufs and this results in m_unshare being called on the reassembled datagram, and it complains: panic: m_unshare: m0 0xfffff80020f82600, m 0xfffff8005d054100 has M_PKTHDR PR: 224922 Reviewed by: ae@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14009	2018-01-24 05:09:21 +00:00
Ryan Stone	fc21c53f63	Reduce code duplication for inpcb route caching Add a new macro to clear both the L3 and L2 route caches, to hopefully prevent future instances where only the L3 cache was cleared when both should have been. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13989 Reviewed by: karels	2018-01-23 03:15:39 +00:00
Michael Tuexen	e9a3a1b13c	Fix a bug related to fast retransmissions. When processing a SACK advancing the cumtsn-ack in fast recovery, increment the miss-indications for all TSN's reported as missing. Thanks to Fabian Ising for finding the bug and to Timo Voelker for provinding a fix. This fix moves also CMT related initialisation of some variables to a more appropriate place. MFC after: 1 week	2018-01-16 21:58:38 +00:00
Michael Tuexen	46bf534caf	Don't provide a (meaningless) cmsg when proving a notification in a recvmsg() call. MFC after: 1 week	2018-01-15 21:59:20 +00:00
Pedro F. Giffuni	84f210952c	libalias: small memory allocation cleanups. Make the calloc wrappers behave as expected by using mallocarray. It is rather weird that the malloc wrappers also zeroes the memory: update a comment to reflect at least two cases where it is expected. Reviewed by: tuexen	2018-01-12 23:12:30 +00:00
Cy Schubert	f813e520c2	Correct the comment describing badrs which is bad router solicitiation, not bad router advertisement. MFC after: 3 days	2017-12-29 07:23:18 +00:00
Michael Tuexen	fa5867cbd6	White cleanups.	2017-12-26 16:33:55 +00:00
Michael Tuexen	f34a628e7e	Clearify CID 1008197. MFC after: 3 days	2017-12-26 16:12:04 +00:00
Michael Tuexen	0460135495	Clearify issue reported in CID 1008198. MFC after: 3 days	2017-12-26 16:06:11 +00:00
Michael Tuexen	f6ea123171	Fix CID 1008428. MFC after: 1 week	2017-12-26 15:29:11 +00:00
Michael Tuexen	4830aee72f	Fix CID 1008936.	2017-12-26 15:24:42 +00:00
Michael Tuexen	c9256941d0	Allow the first (and second) argument of sn_calloc() be a sum. This fixes a bug reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224103 PR: 224103	2017-12-26 14:37:47 +00:00
Michael Tuexen	cd90150413	When adding support for sending SCTP packets containing an ABORT chunk to ipfw in https://svnweb.freebsd.org/changeset/base/326233, a dependency on the SCTP stack was added to ipfw by accident. This was noted by Kevel Bowling in https://reviews.freebsd.org/D13594 where also a solution was suggested. This patch is based on Kevin's suggestion, but implements the required SCTP checksum computation without any dependency on other SCTP sources. While there, do some cleanups and improve comments. Thanks to Kevin Kevin Browling for reporting the issue and suggesting a fix.	2017-12-26 12:35:02 +00:00
Alexander Kabaev	151ba7933a	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
Andrey V. Elsukov	2aad62408b	Fix mbuf leak when TCPMD5_OUTPUT() method returns error. PR: 223817 MFC after: 1 week	2017-12-14 12:54:20 +00:00
Michael Tuexen	cd6340caf7	Cleaup, no functional change.	2017-12-13 17:11:57 +00:00
Gleb Smirnoff	66492fea49	Separate out send buffer autoscaling code into function, so that alternative TCP stacks may reuse it instead of pasting. Obtained from: Netflix	2017-12-07 22:36:58 +00:00
Michael Tuexen	9f0abda051	Retire SCTP_WITH_NO_CSUM option. This option was used in the early days to allow performance measurements extrapolating the use of SCTP checksum offloading. Since this feature is now available, get rid of this option. This also un-breaks the LINT kernel. Thanks to markj@ for making me aware of the problem.	2017-12-07 22:19:08 +00:00
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Michael Tuexen	665c8a2ee5	Add to ipfw support for sending an SCTP packet containing an ABORT chunk. This is similar to the TCP case. where a TCP RST segment can be sent. There is one limitation: When sending an ABORT in response to an incoming packet, it should be tested if there is no ABORT chunk in the received packet. Currently, it is only checked if the first chunk is an ABORT chunk to avoid parsing the whole packet, which could result in a DOS attack. Thanks to Timo Voelker for helping me to test this patch. Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part) Differential Revision: https://reviews.freebsd.org/D13239	2017-11-26 18:19:01 +00:00
Michael Tuexen	18442f0a5b	Fix SPDX line as suggested by pfg	2017-11-24 19:38:59 +00:00
Michael Tuexen	ad15e1548f	Unbreak compilation when using SCTP_DETAILED_STR_STATS option. MFC after: 1 week	2017-11-24 12:18:48 +00:00

1 2 3 4 5 ...

5888 Commits