freebsd-nq

Author	SHA1	Message	Date
Jonathan T. Looney	7fb2986ff6	If the INP lock is uncontested, avoid taking a reference and jumping through the lock-switching hoops. A few of the INP lookup operations that lock INPs after the lookup do so using this mechanism (to maintain lock ordering): 1. Lock lookup structure. 2. Find INP. 3. Acquire reference on INP. 4. Drop lock on lookup structure. 5. Acquire INP lock. 6. Drop reference on INP. This change provides a slightly shorter path for cases where the INP lock is uncontested: 1. Lock lookup structure. 2. Find INP. 3. Try to acquire the INP lock. 4. If successful, drop lock on lookup structure. Of course, if the INP lock is contested, the functions will need to revert to the previous way of switching locks safely. This saves a few atomic operations when the INP lock is uncontested. Discussed with: gallatin, rrs, rwatson MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D12911	2018-03-21 15:54:46 +00:00
Lawrence Stewart	370efe5ac8	Add support for the experimental Internet-Draft "TCP Alternative Backoff with ECN (ABE)" proposal to the New Reno congestion control algorithm module. ABE reduces the amount of congestion window reduction in response to ECN-signalled congestion relative to the loss-inferred congestion response. More details about ABE can be found in the Internet-Draft: https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn The implementation introduces four new sysctls: - net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to enable ABE for ECN-enabled TCP connections. - net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the multiplicative window decrease factor, specified as a percentage, applied to the congestion window in response to a loss-based or ECN-based congestion signal respectively. They default to the values specified in the draft i.e. beta=50 and beta_ecn=80. - net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to non-zero to enable the use of standard beta (50% by default) when repairing loss during an ECN-signalled congestion recovery episode. It enables a more conservative congestion response and is provided for the purposes of experimentation as a result of some discussion at IETF 100 in Singapore. The values of beta and beta_ecn can also be set per-connection by way of the TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or CC_NEWRENO_BETA_ECN CC algo sub-options. Submitted by: Tom Jones <tj@enoti.me> Tested by: Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au> Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D11616	2018-03-19 16:37:47 +00:00
Alexander V. Chernikov	1435dcd94f	Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration. Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for the locally-originated packets changed, passing cached lle pointer to the ether_output() directly. This resulted in the arp/ndp entry expiring each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code to the packet path used by route cache. Unify the notification code by using a single inlined function with the per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks	2018-03-17 17:05:48 +00:00
Michael Tuexen	1574b1e41e	Set the inp_vflag consistently for accepted TCP/IPv6 connections when net.inet6.ip6.v6only=0. Without this patch, the inp_vflag would have both the INP_IPV4 and the INP_IPV6 flags set for accepted TCP/IPv6 connections if the sysctl variable net.inet6.ip6.v6only is 0. This caused netstat to report the source and destination addresses as IPv4 addresses, even though they are IPv6 addresses. PR: 226421 Reviewed by: bz, hiren, kib MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D13514	2018-03-16 15:26:07 +00:00
Sean Bruno	d7fb35d13a	Update tcp_lro with tested bugfixes from Netflix and LLNW: rrs - Lets make the LRO code look for true dup-acks and window update acks fly on through and combine. rrs - Make the LRO engine a bit more aware of ack-only seq space. Lets not have it incorrectly wipe out newer acks for older acks when we have out-of-order acks (common in wifi environments). jeggleston - LRO eating window updates Based on all of the above I think we are RFC compliant doing it this way: https://tools.ietf.org/html/rfc1122 section 4.2.2.16 "Note that TCP has a heuristic to select the latest window update despite possible datagram reordering; as a result, it may ignore a window update with a smaller window than previously offered if neither the sequence number nor the acknowledgment number is increased." Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reviewed by: rstone gallatin Sponsored by: NetFlix and Limelight Networks Differential Revision: https://reviews.freebsd.org/D14540	2018-03-09 00:08:43 +00:00
Michael Tuexen	1c714531e8	When checking the TCP Fast Open cookie length, consistently also check for the minimum length. This fixes a bug where cookies of length 2 bytes (which is smaller than the minimum length of 4) are provided by the server. Sponsored by: Netflix, Inc.	2018-02-27 22:12:38 +00:00
Patrick Kelsey	1f13c23f3d	Ensure signed comparison to avoid false trip of assert during VNET teardown. Reported by: lwhsu MFC after: 1 month	2018-02-26 20:31:16 +00:00
Patrick Kelsey	18a7530938	Greatly reduce the number of #ifdefs supporting the TCP_RFC7413 kernel option. The conditional compilation support is now centralized in tcp_fastopen.h and tcp_var.h. This doesn't provide the minimum theoretical code/data footprint when TCP_RFC7413 is disabled, but nearly all the TFO code should wind up being removed by the optimizer, the additional footprint in the syncache entries is a single pointer, and the additional overhead in the tcpcb is at the end of the structure. This enables the TCP_RFC7413 kernel option by default in amd64 and arm64 GENERIC. Reviewed by: hiren MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14048	2018-02-26 03:03:41 +00:00
Patrick Kelsey	c560df6f12	This is an implementation of the client side of TCP Fast Open (TFO) [RFC7413]. It also includes a pre-shared key mode of operation in which the server requires the client to be in possession of a shared secret in order to successfully open TFO connections with that server. The names of some existing fastopen sysctls have changed (e.g., net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable). Reviewed by: tuexen MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14047	2018-02-26 02:53:22 +00:00
Patrick Kelsey	798caa2ee5	Fix harmless locking bug in tcp_fastopen_check_cookie(). The keylist lock was not being acquired early enough. The only side effect of this bug is that the effective add time of a new key could be slightly later than it would have been otherwise, as seen by a TFO client. Reviewed by: tuexen MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14046	2018-02-26 02:43:26 +00:00
Andrey V. Elsukov	b2fe54bc8b	Reinitialize IP header length after checksum calculation. It is used later by TCP-MD5 code. This fixes the problem with broken TCP-MD5 over IPv4 when NIC has disabled TCP checksum offloading. PR: 223835 MFC after: 1 week	2018-02-10 10:13:17 +00:00
Andrey V. Elsukov	b99a682320	Rework ipfw dynamic states implementation to be lockless on fast path. o added struct ipfw_dyn_info that keeps all information needed for ipfw_chk and for the dynamic states implementation; o added DYN_LOOKUP_NEEDED() macro that can be used to determine the need of new lookup of dynamic states; o ipfw_dyn_rule now becomes obsolete. Currently it is used to pass information from kernel to userland only. o IPv4 and IPv6 states are now described by different structures dyn_ipv4_state and dyn_ipv6_state; o IPv6 scope zones support is added; o ipfw(4) now depends on Concurrency Kit; o states are linked with "entry" field using CK_SLIST. This allows lockless lookup and modifications protected by a mutex. o the "expired" SLIST field is used for states expiring. o struct dyn_data is used to keep generic information for both IPv4 and IPv6; o struct dyn_parent is used to keep O_LIMIT_PARENT information; o IPv4 and IPv6 states are stored in different hash tables; o O_LIMIT_PARENT states now are kept separately from O_LIMIT and O_KEEP_STATE states; o per-cpu dyn_hp pointers are used to implement hazard pointers and they prevent freeing states that are locklessly used by lookup threads; o mutexes to protect modification of lists in hash tables are now kept in separate arrays. The 65535 limit on the maximum number of hash buckets is now removed. o Separate lookup and install functions added for IPv4 and IPv6 states and for parent states. o The Jenkins hash function is now used by default. Obtained from: Yandex LLC MFC after: 42 days Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D12685	2018-02-07 18:59:54 +00:00
John Baldwin	f17985319d	Export tcp_always_keepalive for use by the Chelsio TOM module. This used to work by accident with ld.bfd even though always_keepalive was marked as static. LLD honors static more correctly, so export this variable properly (including moving it into the tcp_* namespace). Reviewed by: bz, emaste MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14129	2018-01-30 23:01:37 +00:00
Michael Tuexen	cf6339882e	Add constant for the PAD chunk as defined in RFC 4820. This will be used by traceroute and traceroute6 soon. MFC after: 1 week	2018-01-27 13:46:55 +00:00
Michael Tuexen	07e75d0a37	Update references in comments, since the IDs became RFCs a long time ago. Also clean up whitespace. No functional change. MFC after: 1 week	2018-01-27 13:43:03 +00:00
Conrad Meyer	222daa421f	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035	2018-01-25 22:25:13 +00:00
Mark Johnston	5557762b72	Use tcpinfoh_t for TCP headers in the tcp:::debug-{drop,input} probes. The header passed to these probes has some fields converted to host order by tcp_fields_to_host(), so the tcpinfo_t translator doesn't do what we want. Submitted by: Hannes Mehnert MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12647	2018-01-25 15:35:34 +00:00
Navdeep Parhar	09b0b8c058	Do not generate illegal mbuf chains during IP fragment reassembly. Only the first mbuf of the reassembled datagram should have a pkthdr. This was discovered with cxgbe(4) + IPSEC + ping with payload more than interface MTU. cxgbe can generate !M_WRITEABLE mbufs and this results in m_unshare being called on the reassembled datagram, and it complains: panic: m_unshare: m0 0xfffff80020f82600, m 0xfffff8005d054100 has M_PKTHDR PR: 224922 Reviewed by: ae@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14009	2018-01-24 05:09:21 +00:00
Ryan Stone	fc21c53f63	Reduce code duplication for inpcb route caching Add a new macro to clear both the L3 and L2 route caches, to hopefully prevent future instances where only the L3 cache was cleared when both should have been. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13989 Reviewed by: karels	2018-01-23 03:15:39 +00:00
Michael Tuexen	e9a3a1b13c	Fix a bug related to fast retransmissions. When processing a SACK advancing the cumtsn-ack in fast recovery, increment the miss-indications for all TSNs reported as missing. Thanks to Fabian Ising for finding the bug and to Timo Voelker for providing a fix. This fix also moves CMT related initialisation of some variables to a more appropriate place. MFC after: 1 week	2018-01-16 21:58:38 +00:00
Michael Tuexen	46bf534caf	Don't provide a (meaningless) cmsg when providing a notification in a recvmsg() call. MFC after: 1 week	2018-01-15 21:59:20 +00:00
Pedro F. Giffuni	84f210952c	libalias: small memory allocation cleanups. Make the calloc wrappers behave as expected by using mallocarray. It is rather weird that the malloc wrappers also zero the memory: update a comment to reflect at least two cases where it is expected. Reviewed by: tuexen	2018-01-12 23:12:30 +00:00
Cy Schubert	f813e520c2	Correct the comment describing badrs which is bad router solicitation, not bad router advertisement. MFC after: 3 days	2017-12-29 07:23:18 +00:00
Michael Tuexen	fa5867cbd6	Whitespace cleanups.	2017-12-26 16:33:55 +00:00
Michael Tuexen	f34a628e7e	Clarify CID 1008197. MFC after: 3 days	2017-12-26 16:12:04 +00:00
Michael Tuexen	0460135495	Clarify issue reported in CID 1008198. MFC after: 3 days	2017-12-26 16:06:11 +00:00
Michael Tuexen	f6ea123171	Fix CID 1008428. MFC after: 1 week	2017-12-26 15:29:11 +00:00
Michael Tuexen	4830aee72f	Fix CID 1008936.	2017-12-26 15:24:42 +00:00
Michael Tuexen	c9256941d0	Allow the first (and second) argument of sn_calloc() be a sum. This fixes a bug reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224103 PR: 224103	2017-12-26 14:37:47 +00:00
Michael Tuexen	cd90150413	When adding support for sending SCTP packets containing an ABORT chunk to ipfw in https://svnweb.freebsd.org/changeset/base/326233, a dependency on the SCTP stack was added to ipfw by accident. This was noted by Kevin Bowling in https://reviews.freebsd.org/D13594 where also a solution was suggested. This patch is based on Kevin's suggestion, but implements the required SCTP checksum computation without any dependency on other SCTP sources. While there, do some cleanups and improve comments. Thanks to Kevin Bowling for reporting the issue and suggesting a fix.	2017-12-26 12:35:02 +00:00
Alexander Kabaev	151ba7933a	Do a pass removing some write-only variables from the kernel. This reduces noise when the kernel is compiled by newer GCC versions, such as the one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
Andrey V. Elsukov	2aad62408b	Fix mbuf leak when TCPMD5_OUTPUT() method returns error. PR: 223817 MFC after: 1 week	2017-12-14 12:54:20 +00:00
Michael Tuexen	cd6340caf7	Cleanup, no functional change.	2017-12-13 17:11:57 +00:00
Gleb Smirnoff	66492fea49	Separate out send buffer autoscaling code into function, so that alternative TCP stacks may reuse it instead of pasting. Obtained from: Netflix	2017-12-07 22:36:58 +00:00
Michael Tuexen	9f0abda051	Retire SCTP_WITH_NO_CSUM option. This option was used in the early days to allow performance measurements extrapolating the use of SCTP checksum offloading. Since this feature is now available, get rid of this option. This also un-breaks the LINT kernel. Thanks to markj@ for making me aware of the problem.	2017-12-07 22:19:08 +00:00
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Michael Tuexen	665c8a2ee5	Add to ipfw support for sending an SCTP packet containing an ABORT chunk. This is similar to the TCP case, where a TCP RST segment can be sent. There is one limitation: When sending an ABORT in response to an incoming packet, it should be tested if there is no ABORT chunk in the received packet. Currently, it is only checked if the first chunk is an ABORT chunk to avoid parsing the whole packet, which could result in a DoS attack. Thanks to Timo Voelker for helping me to test this patch. Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part) Differential Revision: https://reviews.freebsd.org/D13239	2017-11-26 18:19:01 +00:00
Michael Tuexen	18442f0a5b	Fix SPDX line as suggested by pfg	2017-11-24 19:38:59 +00:00
Michael Tuexen	ad15e1548f	Unbreak compilation when using SCTP_DETAILED_STR_STATS option. MFC after: 1 week	2017-11-24 12:18:48 +00:00
Michael Tuexen	b7d2b5d5b1	Add SPDX line.	2017-11-24 11:25:53 +00:00
Mark Johnston	7a5c730561	Use the right variable for the IP header parameter to tcp:::send. This addresses a regression from r311225. MFC after: 1 week	2017-11-22 14:13:40 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Michael Tuexen	3e87bccde3	Fix the handling of ERROR chunks with a lot of error causes. While there, clean up the code. Thanks to Felix Weinrank who found the bug by fuzz-testing the SCTP userland stack. MFC after: 1 week	2017-11-15 22:13:10 +00:00
Michael Tuexen	d0f6ab7920	Simplify the code and use the full buffer for contiguous chunk representation. MFC after: 1 week	2017-11-14 02:30:21 +00:00
Gleb Smirnoff	3e21cbc802	Style r320614: don't initialize at declaration, new line after declarations, shorten variable name to avoid extra long lines. No functional changes.	2017-11-13 22:16:47 +00:00
Michael Tuexen	469a65d1f8	Cleanup the handling of control chunks. While there fix some minor bug related to clearing the assoc retransmit counter and the dup TSN handling of NR-SACK chunks. MFC after: 3 days	2017-11-12 21:43:33 +00:00
Konstantin Belousov	06193f0be0	Use hardware timestamps to report packet timestamps for SO_TIMESTAMP and other similar socket options. Provide new control message SCM_TIME_INFO to supply information about timestamp. Currently it indicates that the timestamp was hardware-assisted and high-precision, for software timestamps the message is not returned. Reserved fields are added to ABI to report additional info about it, it is expected that raw hardware clock value might be useful for some applications. Reviewed by: gallatin (previous version), hselasky Sponsored by: Mellanox Technologies MFC after: 2 weeks X-Differential revision: https://reviews.freebsd.org/D12638	2017-11-07 09:46:26 +00:00
Michael Tuexen	253a63b817	Fix an accounting bug where data was counted twice if on the read queue and on the ordered or unordered queue. While there, improve the checking in INVARIANTs when computing the a_rwnd. MFC after: 3 days	2017-11-05 11:59:33 +00:00
Michael Tuexen	28a6adde1d	Allow the setting of the MTU for future paths using an SCTP socket option. This functionality was missing. MFC after: 1 week	2017-11-03 20:46:12 +00:00
Michael Tuexen	ba5fc4cf78	Fix the reporting of the MTU for SCTP sockets when using IPv6. MFC after: 1 week	2017-11-01 16:32:11 +00:00
Michael Tuexen	966dfbf910	Fix parsing error when processing cmsg in SCTP send calls. The bug is related to a signed/unsigned mismatch. This should most likely fix the issue in sctp_sosend reported by Dmitry Vyukov on the freebsd-hackers mailing list and found by running syzkaller.	2017-10-27 19:27:05 +00:00
Michael Tuexen	8d9b040dd4	Fix a bug reported by Felix Weinrank using the libfuzzer on the userland stack. MFC after: 3 days	2017-10-25 09:12:22 +00:00
Michael Tuexen	701492a5f6	Fix a bug in handling special ABORT chunks. Thanks to Felix Weinrank for finding this issue using libfuzzer with the userland stack. MFC after: 3 days	2017-10-24 16:24:12 +00:00
Michael Tuexen	adc59f7f46	Fix a locking issue found by running AFL on the userland stack. Thanks to Felix Weinrank for reporting the issue. MFC after: 3 days	2017-10-24 14:28:56 +00:00
Alexander Motin	81098a018e	Relax per-ifnet cif_vrs list double locking in carp(4). In all cases where cif_vrs list is modified, two locks are held: per-ifnet CIF_LOCK and global carp_sx. It means to read that list only one of them is enough to be held, so we can skip CIF_LOCK when we already have carp_sx. This fixes kernel panic, caused by attempts of copyout() to sleep while holding non-sleepable CIF_LOCK mutex. Discussed with: glebius MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-10-19 09:01:15 +00:00
Michael Tuexen	af03054c8a	Fix a signed/unsigned warning. MFC after: 1 week	2017-10-18 21:08:35 +00:00
Michael Tuexen	7f75695a3e	Abort an SCTP association, when a DATA chunk is followed by an unknown chunk with a length smaller than the minimum length. Thanks to Felix Weinrank for making me aware of the problem. MFC after: 3 days	2017-10-18 20:17:44 +00:00
Michael Tuexen	0d5af38ceb	Revert change which got in accidentally.	2017-10-18 18:59:35 +00:00
Michael Tuexen	3ed8d364a7	Fix a bug introduced in r324638. Thanks to Felix Weinrank for making me aware of this. MFC after: 3 days	2017-10-18 18:56:56 +00:00
Michael Tuexen	80a2d1406f	Fix the handling of partial and too short chunks. Ensure that the current behaviour is consistent: stop processing of the chunk, but finish the processing of the previous chunks. This behaviour might be changed in a later commit to ABORT the association due to a protocol violation, but changing this is a separate issue. MFC after: 3 days	2017-10-15 19:33:30 +00:00
Michael Tuexen	8c8e10b763	Code cleanup, no functional change. This avoids taking a pointer of a packed structure which allows simpler compilation of the userland stack. MFC after: 1 week	2017-10-14 10:02:59 +00:00
Gleb Smirnoff	3bdf4c4274	Declare more TCP globals in tcp_var.h, so that alternative TCP stacks can use them. Gather all TCP tunables in tcp_var.h in one place and alphabetically sort them, to ease maintenance of the list. Don't copy and paste declarations in tcp_stacks/fastpath.c.	2017-10-11 20:36:09 +00:00
Gleb Smirnoff	e29c55e4bb	Declare pmtud_blackhole global variables in tcp_timer.h, so that alternative TCP stacks can legally use them.	2017-10-06 20:33:40 +00:00
Michael Tuexen	ff76c8c9fd	Ensure that ABORT chunks with the T-bit set are accepted only if a non-zero matching peer tag is provided. MFC after: 1 week	2017-10-05 13:29:54 +00:00
Julien Charbon	5fcd2d9bfc	Forgotten bits in r324179: Include sys/syslog.h if INVARIANTS is not defined MFC after: 1 week X-MFC with: r324179 Pointy hat to: jch	2017-10-02 09:45:17 +00:00
Patrick Kelsey	3f43239f21	The soisconnected() call removed from syncache_socket() in r307966 was not extraneous in the TCP Fast Open (TFO) passive-open case. In the TFO passive-open case, syncache_socket() is being called during processing of a TFO SYN bearing a valid cookie, and a call to soisconnected() is required in order to allow the application to immediately consume any data delivered in the SYN and to have a chance to generate response data to accompany the SYN-ACK. The removal of this call to soisconnected() effectively converted all TFO passive opens to having the same RTT cost as a standard 3WHS. This commit adds a call to soisconnected() to syncache_tfo_expand() so that it is only in the TFO passive-open path, thereby restoring TFO passive-open RTT performance and preserving the non-TFO connection-rate performance gains realized by r307966. MFC after: 1 week Sponsored by: Limelight Networks	2017-10-01 23:37:17 +00:00
Julien Charbon	dfa1f80ce9	Fix an infinite loop in tcp_tw_2msl_scan() when an INP_TIMEWAIT inp has been destroyed before its tcptw with INVARIANTS undefined. This is a symmetric change of r307551: A INP_TIMEWAIT inp should not be destroyed before its tcptw, and INVARIANTS will catch this case. If INVARIANTS is undefined it will emit a log(LOG_ERR) and avoid a hard to debug infinite loop in tcp_tw_2msl_scan(). Reported by: Ben Rubson, hselasky Submitted by: hselasky Tested by: Ben Rubson, jch MFC after: 1 week Sponsored by: Verisign, inc Differential Revision: https://reviews.freebsd.org/D12267	2017-10-01 21:20:28 +00:00
Andrey V. Elsukov	f415d666c3	Some mbuf related fixes in icmp_error() * check mbuf length before doing mtod() and accessing to IP header; * update oip pointer and all depending pointers after m_pullup(); * remove extra checks and extra parentheses, wrap long lines; PR: 222670 Reported by: Prabhakar Lakhera MFC after: 1 week	2017-09-29 06:24:45 +00:00
Michael Tuexen	09c53cb6cc	Remove unused function. MFC after: 1 week	2017-09-27 13:05:23 +00:00
Sepherosa Ziehau	fc572e261f	tcp: Don't "negotiate" MSS. _NO_ OSes actually "negotiate" MSS. RFC 879: "... This Maximum Segment Size (MSS) announcement (often mistakenly called a negotiation) ..." This negotiation behaviour was introduced 11 years ago by r159955 without any explanation about why FreeBSD had to "negotiate" MSS: In syncache_respond() do not reply with a MSS that is larger than what the peer announced to us but make it at least tcp_minmss in size. Sponsored by: TCP/IP Optimization Fundraise 2005 The tcp_minmss behaviour is still kept. Syncookie fix was prodded by tuexen, who also helped to test this patch w/ packetdrill. Reviewed by: tuexen, karels, bz (previous version) MFC after: 2 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D12430	2017-09-27 05:52:37 +00:00
Michael Tuexen	d28a3a393b	Add missing locking. Found by Coverity while scanning the usrsctp library. MFC after: 1 week	2017-09-22 06:33:01 +00:00
Michael Tuexen	afb908dada	Add missing socket lock. MFC after: 1 week	2017-09-22 06:07:47 +00:00
Michael Tuexen	cdd2d7d4a5	Code cleanup, no functional change. MFC after: 1 week	2017-09-21 11:56:31 +00:00
Michael Tuexen	53999485e0	Free the control structure after using it, not before. Found by Coverity while scanning the usrsctp library. MFC after: 1 week	2017-09-21 09:47:56 +00:00
Michael Tuexen	d0d8c7de19	No need to wakeup, since sctp_add_to_readq() does it. MFC after: 1 week	2017-09-21 09:18:05 +00:00
Michael Tuexen	2c62ba7377	Protect the address workqueue timer by a mutex. MFC after: 1 week	2017-09-20 21:29:54 +00:00
Michael Tuexen	3ec509bcd3	Fix a warning. MFC after: 1 week	2017-09-19 20:24:13 +00:00
Michael Tuexen	564a95f485	Avoid an overflow when computing the staleness. This issue was found by running libfuzz on the userland stack. MFC after: 1 week	2017-09-19 20:09:58 +00:00
Michael Tuexen	ad608f06ed	Remove a no longer used variable. Reported by: Felix Weinrank MFC after: 1 week	2017-09-19 15:00:19 +00:00
Michael Tuexen	72e23aba22	Fix an accounting bug and use sctp_timer_start to start a timer. MFC after: 1 week	2017-09-17 09:27:27 +00:00
Michael Tuexen	fe40f49bb3	Remove code not used on any platform currently supported. MFC after: 1 week	2017-09-16 21:26:06 +00:00
Michael Tuexen	292efb1bc0	Export the UDP encapsulation port and the path state.	2017-09-12 21:08:50 +00:00
Michael Tuexen	e5cccc35c3	Add support to print the TCP stack being used. Sponsored by: Netflix, Inc.	2017-09-12 13:34:43 +00:00
Michael Tuexen	7b5f06fbcc	Fix MTU computation. Coverity scanning usrsctp pointed to this code... MFC after: 3 days	2017-09-09 21:03:40 +00:00
Michael Tuexen	f55c326691	Fix locking issues found by Coverity scanning the usrsctp library. MFC after: 3 days	2017-09-09 20:44:56 +00:00
Michael Tuexen	0c4622dab2	Silence a Coverity warning from scanning the usrsctp library. MFC after: 3 days	2017-09-09 20:08:26 +00:00
Michael Tuexen	6c2cfc0419	Safely remove a chunk from the control queue. This bug was found by Coverity scanning the usrsctp library. MFC after: 3 days	2017-09-09 19:49:50 +00:00
Hans Petter Selasky	95ed5015ec	Add support for generic backpressure indicator for ratelimited transmit queues as well as non-ratelimited ones. Add the required structure bits in order to support a backpressure indication with ratelimited connections as well as non-ratelimited ones. The backpressure indicator is a value between zero and 65535 inclusive, where zero indicates that the destination transmit queue is empty and 65535 that it is full. Applications can use this value as a decision point for when to stop transmitting data to avoid endless ENOBUFS error codes upon transmitting an mbuf. This indicator is also useful to reduce the latency for ratelimited queues. Reviewed by: gallatin, kib, gnn Differential Revision: https://reviews.freebsd.org/D11518 Sponsored by: Mellanox Technologies	2017-09-06 13:56:18 +00:00
Michael Tuexen	3d5af7a127	Fix blackhole detection. There were two bugs related to the blackhole detection: * The smallest size was tried more than two times. * The restored MSS was not the original one, but the second candidate. MFC after: 1 week Sponsored by: Netflix, Inc.	2017-08-28 11:41:18 +00:00
Sean Bruno	32a04bb81d	Use counter(9) for PLPMTUD counters. Remove unused PLPMTUD sysctl counters. Bump UPDATING and FreeBSD Version to indicate a rebuild is required. Submitted by: kevin.bowling@kev009.com Reviewed by: jtl Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12003	2017-08-25 19:41:38 +00:00
Michael Tuexen	e8aba2eb21	Avoid TCP log messages which are false positives. This is https://svnweb.freebsd.org/changeset/base/322812, just for alternate TCP stacks. X-MFC with: 322812	2017-08-23 15:08:51 +00:00
Michael Tuexen	4da9052a05	Avoid TCP log messages which are false positives. The check for timestamps is too early to handle SYN-ACK correctly. So move it down after the corresponding processing has been done. PR: 216832 Obtained from: antonfb@hesiod.org MFC after: 1 week	2017-08-23 14:50:08 +00:00
Michael Tuexen	63ec505a4f	Ensure inp_vflag is consistently set for TCP endpoints. Make sure that the flags INP_IPV4 and INP_IPV6 are consistently set for inpcbs used for TCP sockets, no matter if the setting is derived from the net.inet6.ip6.v6only sysctl or the IPV6_V6ONLY socket option. For UDP this was already done right. PR: 221385 MFC after: 1 week	2017-08-18 07:27:15 +00:00
Oleg Bulyzhin	ff21796d25	Fix comment typo.	2017-08-09 10:46:34 +00:00
Dag-Erling Smørgrav	d80fbbee5f	Correct sysctl names.	2017-08-09 07:24:58 +00:00
Bjoern A. Zeeb	ae69ad884d	After inpcb route caching was put back in place there is no need for flowtable anymore (as flowtable was never considered to be useful in the forwarding path). Reviewed by: np Differential Revision: https://reviews.freebsd.org/D11448	2017-07-27 13:03:36 +00:00
Ed Maste	1edbb54fe9	cc_cubic: restore braces around if-condition block r307901 was reverted in r321480, restoring an incorrect block delimitation bug present in the original cc_cubic commit. Restore only the bugfix (brace addition) from r307901. CID: 1090182 Approved by: sbruno	2017-07-26 21:23:09 +00:00
Sean Bruno	43053c125a	Revert r307901 - Inform CC modules about loss events. This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reported by: lawrence Reviewed by: hiren Sponsored by: Limelight Networks	2017-07-25 15:08:52 +00:00
Sean Bruno	5d53981a18	Revert r308180 - Set slow start threshold more accurately on loss ... This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: kevin Reported by: lawrence Reviewed by: hiren	2017-07-25 15:03:05 +00:00
Michael Tuexen	e5a9c519bc	Remove duplicate statement.	2017-07-25 11:05:53 +00:00
Michael Tuexen	9dd6ca9602	Deal with listening socket correctly.	2017-07-20 14:50:13 +00:00
Michael Tuexen	bbc9dfbc08	Fix the explicit EOR mode. If the final message is not complete, send an ABORT. Joint work with rrs@ MFC after: 1 week	2017-07-20 11:09:33 +00:00
Michael Tuexen	1f76872c36	Avoid shadowed variables. MFC after: 1 week	2017-07-19 15:12:23 +00:00
Michael Tuexen	5ba7f91f9d	Use memset/memcpy instead of bzero/bcopy. Just use one variant instead of both. Use the memset/memcpy ones since they cause fewer problems in cross-platform deployment. MFC after: 1 week	2017-07-19 14:28:58 +00:00
Michael Tuexen	28cd0699b6	Fix the accounting and add code to detect errors in accounting. Joint work with rrs@ MFC after: 1 week	2017-07-19 12:27:40 +00:00
Michael Tuexen	d32ed2c735	Fix the handling of Explicit EOR mode. While there, appropriately handle the overhead depending on the usage of DATA or I-DATA chunks. Take the overhead only into account, when required. Joint work with rrs@ MFC after: 1 week	2017-07-15 19:54:03 +00:00
Konstantin Belousov	5cead59181	Correct sysent flags for dynamically loaded syscalls. Using the https://github.com/google/capsicum-test/ suite, the PosixMqueue.CapModeForked test was failing due to an ECAPMODE after calling kmq_notify(). On further inspection, the dynamically loaded syscall entry was initialized with sy_flags zeroed out, since SYSCALL_INIT_HELPER() left sysent.sy_flags with the default value. Add a new helper SYSCALL{,32}_INIT_HELPER_F() which takes an additional argument to specify the sy_flags value. Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D11576	2017-07-14 09:34:44 +00:00
Jonathan T. Looney	cb503ae22d	Don't overpromote values when calculating len in tcp_output(). sbavail() returns u_int and sendwin is a uint32_t. Therefore, min() (which operates on two u_int values) is able to correctly calculate the minimum of these two arguments. Reported by: rrs MFC after: 1 week Sponsored by: Netflix	2017-07-05 16:10:30 +00:00
Michael Tuexen	1698cbd919	Move to open state after plausibility checks. When doing this too early, the MIB counters go wrong. MFC after: 1 week	2017-07-04 18:24:50 +00:00
Michael Tuexen	afffa1a9ad	Don't hold a refcount on an stcb when it is not needed. This improves the consistency with other parts of the code.	2017-07-04 18:04:44 +00:00
Sean Bruno	ac952dd274	Add a sysctl to toggle the use of the sockets LOWAT when calculating auto window growth Submitted by: j@nitrology.com (Jason Wolfe) Reviewed by: gnn hiren Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11016	2017-07-03 19:39:58 +00:00
Michael Tuexen	f4358911bf	Handle sctp_get_next_param() in a consistent way. This addresses an issue found by Felix Weinrank using libfuzz. While there, also use consistent naming. MFC after: 3 days	2017-06-23 21:01:57 +00:00
Michael Tuexen	d44b45df2c	Check the length of a COOKIE chunk before accessing fields in it. Thanks to Felix Weinrank for reporting the issue he found by using libFuzzer. MFC after: 3 days	2017-06-23 10:09:49 +00:00
Michael Tuexen	1a7abbb3be	Use a longer buffer for messages in ERROR chunks. This allows them to be sent in a non truncated way and addresses a warning given by newer versions of gcc. Thanks to Anselm Jonas Scholl for reporting it and providing a patch.	2017-06-23 09:27:31 +00:00
Michael Tuexen	94f66d603a	Honor the backlog field.	2017-06-23 08:35:54 +00:00
Michael Tuexen	3017b21bb6	Improve compilation on platforms different from FreeBSD.	2017-06-23 08:34:01 +00:00
Gleb Smirnoff	779f106aa1	Listening sockets improvements. o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. - Take out selinfo's from the socket buffers into the socket. The first reason is to support braindamaged scenario when a socket is added to kevent(2) and then listen(2) is cast on it. The second reason is that there is future plan to make socket buffers pluggable, so that for a dataflow socket a socket buffer can be changed, and in this case we also want to keep same selinfos through the lifetime of a socket. - Remove struct struct so_accf. Since now listening stuff no longer affects struct socket size, just move its fields into listening part of the union. - Provide sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers, and for listening sockets provide solisten_upcall_set(). o Remove ACCEPT_LOCK() global. - Add a mutex to socket, to be used instead of socket buffer lock to lock fields of struct socket that don't belong to a socket buffer. - Allow to acquire two socket locks, but the first one must belong to a listening socket. - Make soref()/sorele() to use atomic(9). This allows in some situations to do soref() without owning socket lock. There is place for improvement here, it is possible to make sorele() also to lock optionally. - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information. o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc. o UNIX local sockets. - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most races exist around spawning a new socket, when we are connecting to a local listening socket. 
To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between pcb locks and unp_list_lock. - To fix the new LOR, abandon the global unp_list_lock in favor of global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parallelism in the UNIX sockets. - Now a call into uipc_attach() may happen with unp_link_lock held if we are accepting, or without unp_link_lock if we are just creating a socket. - Another problem in UNIX sockets is that uipc_close() basically did nothing for a listening socket. The vnode remained open for connections. This is fixed by removing vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply move the vnode teardown from uipc_detach() to uipc_close()? Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9770	2017-06-08 21:30:34 +00:00
Jonathan T. Looney	dc6a41b936	Add the infrastructure to support loading multiple versions of TCP stack modules. It adds support for mangling symbols exported by a module by prepending a string to them. (This avoids overlapping symbols in the kernel linker.) It allows the use of a macro as the module name in the DECLARE_MACRO() and MACRO_VERSION() macros. It allows the code to register stack aliases (e.g. both a generic name ["default"] and version-specific name ["default_10_3p1"]). With these changes, it is trivial to compile TCP stack modules with the name defined in the Makefile and to load multiple versions of the same stack simultaneously. This functionality can be used to enable side-by-side testing of an old and new version of the same TCP stack. It also could support upgrading the TCP stack without a reboot. Reviewed by: gnn, sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D11086	2017-06-08 20:41:28 +00:00
Gleb Smirnoff	3acfe1e1b0	This code was missing socket unlock and socket buffer lock, but it worked since right now these two locks are the same.	2017-06-08 06:37:11 +00:00
Gleb Smirnoff	12d8a8e7a3	The desired lock here is socket buffer, not socket. Right now they match, but won't in future.	2017-06-08 06:34:09 +00:00
Michael Tuexen	8cb5a8e90a	Fix the ICMP6 handling for TCP. The ICMP6 packets might not be contained in a single mbuf. So don't assume this. Keep the IPv4 and IPv6 code in sync and make explicit that the syncache code only need the TCP sequence number, not the complete TCP header. MFC after: 3 days Sponsored by: Netflix, Inc.	2017-06-03 21:53:58 +00:00
Michael Tuexen	98732609d5	Improve comments to describe what the code does. Reported by: jtl Sponsored by: Netflix, Inc.	2017-06-01 15:11:18 +00:00
Jonathan T. Looney	382a6bbcf1	Enforce the limit on ICMP messages before doing work to formulate the response. Delete an unneeded rate limit for UDP under IPv6. Because ICMP6 messages have their own rate limit, it is unnecessary to apply a second rate limit to UDP messages. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D10387	2017-05-30 14:32:44 +00:00
Michael Tuexen	5d08768a2b	Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners. While there, use a macro for checking the listen state to allow for easier changes if required. This is done to help glebius@ with his listen changes.	2017-05-26 16:29:00 +00:00
Gleb Smirnoff	6a6cefac3d	o Rearrange struct inpcb fields to optimize the TCP output code path considering cache line hits and misses. Put the lock and hash list glue into the first cache line, put inp_refcount inp_flags inp_socket into the second cache line. o On allocation zero out entire structure except the lock and list entries, including inp_route inp_lle inp_gencnt. When inp_route and inp_lle were introduced, they were added below inp_zero_size, resulting on not being cleared after free/alloc. This definitely was a source of bugs with route caching. Could be that r315956 has just fixed one of them. The inp_gencnt is reinitialized on every alloc, so it is safe to clear it. This has been proved to improve TCP performance at Netflix. Obtained from: rrs Differential Revision: D10686	2017-05-24 17:47:16 +00:00
Michael Tuexen	5dba6ada91	The connect() system call should return -1 and set errno to EAFNOSUPPORT if it is called on a TCP socket * with an IPv6 address and the socket is bound to an IPv4-mapped IPv6 address. * with an IPv4-mapped IPv6 address and the socket is bound to an IPv6 address. Thanks to Jonathan T. Leighton for reporting this issue. Reviewed by: bz gnn MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D9163	2017-05-22 15:29:10 +00:00
Andrey V. Elsukov	38cc96a887	Set M_BCAST and M_MCAST flags on mbuf sent via divert socket. r290383 has changed how mbufs sent by divert socket are handled. Previously they were always handled by slow path processing in ip_input(). Now ip_tryforward() is invoked from ip_input() before in_broadcast() check. Since a diverted packet has lost all mbuf flags, it passes the broadcast check in ip_tryforward() due to the missing M_BCAST flag. As a result, the broadcast packet is forwarded to the wire instead of being consumed by the network stack. Add in_broadcast() check to the div_output() function. And restore the M_BCAST flag if destination address is broadcast for the given network interface. PR: 209491 MFC after: 1 week	2017-05-17 09:04:09 +00:00
Ed Maste	3e85b721d6	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
Gleb Smirnoff	cc487c1697	Reduce in_pcbinfo_init() by two params. No users supply any flags to this function (they used to say UMA_ZONE_NOFREE), so flag parameter goes away. The zone_fini parameter also goes away. Previously no protocols (except divert) supplied zone_fini function, so inpcb locks were leaked with slabs. This was okay while zones were allocated with UMA_ZONE_NOFREE flag, but now this is a leak. Fix that by suppling inpcb_fini() function as fini method for all inpcb zones.	2017-05-15 21:58:36 +00:00
Enji Cooper	bd7459366e	Add missing braces around MCAST_EXCLUDE check when KTR support is compiled into the kernel This ensures that .iss_asm (the number of ASM listeners) isn't incorrectly decremented for MLD-layer source datagrams when inspecting im*s_st[1] (the second state in the structure). MFC after: 2 months PR: 217509 [1] Reported by: Coverity (Isilon) Reviewed by: ae ("This patch looks correct to me." [1]) Submitted by: Miles Ohlrich <miles.ohlrich@isilon.com> Sponsored by: Dell EMC Isilon	2017-05-13 18:41:24 +00:00
Gleb Smirnoff	7637c57ee1	There is no good reason for TCP reassembly zone to be UMA_ZONE_NOFREE. It has strong locking model, doesn't have any timers associated with entries. The entries themselves are referenced only from the tcpcb zone, which itself is a normal zone, without the UMA_ZONE_NOFREE flag.	2017-05-10 23:32:31 +00:00
Eugene Grosbein	1a356b8b90	ipfw nat and natd support multiple aliasing instances with "nat global" feature that chooses right alias_address for outgoing packets that already have corresponding state in one of aliasing instances. This feature works just fine for ICMP, UDP, TCP and SCTP packets but not for others. For example, outgoing PPtP/GRE packets always get alias_address of latest configured instance no matter whether such packets have corresponding state or not. This change unbreaks translation of transit PPtP/GRE connections for "nat global" case fixing a bug in static ProtoAliasOut() function that ignores its "create" argument and performs translation regardless of its value. This static function is called only by LibAliasOutLocked() function and only for packets other than ICMP, UDP, TCP and SCTP. LibAliasOutLocked() passes its "create" argument unmodified. We have only two consumers of LibAliasOutLocked() in the source tree calling it with "create" unequal to 1: "ipfw nat global" code and similar natd code having same problem. All other consumers of LibAliasOutLocked() call it with create = 1 and the patch is "no-op" for such cases. PR: 218968 Approved by: ae, vsevolod (mentor) MFC after: 1 week	2017-05-10 19:41:52 +00:00
Michael Tuexen	10e0318afa	Allow SCTP to use the hostcache. This patch allows the MTU stored in the hostcache to be used as an initial value for SCTP paths. When an ICMP PTB message is received, store the MTU in the hostcache. MFC after: 1 week	2017-04-29 19:20:50 +00:00
Michael Tuexen	4f43a14a85	Don't set the DF-bit on timer based retransmissions. MFC after: 1 week	2017-04-29 09:57:27 +00:00
Michael Tuexen	b6ecf43450	Set the DF bit for responses to out-of-the-blue packets. MFC after: 1 week	2017-04-28 15:38:34 +00:00
Michael Tuexen	d274bcc661	Fix an issue with MTU calculation if an ICMP message is received for an SCTP/UDP packet. MFC after: 1 week	2017-04-26 20:21:05 +00:00
Michael Tuexen	6ebfa5ee14	Consistently use uint32_t for mtu values. This does not change functionality, but this cleanup is needed for further improvements of ICMP handling. MFC after: 1 week	2017-04-26 19:26:40 +00:00
Michael Tuexen	ebfd753408	When a SYN-ACK is received in SYN-SENT state, RFC 793 requires the validation of SEG.ACK as the first step. If the ACK is not acceptable, a RST segment should be sent and the segment should be dropped. Up to now, the segment was partially processed. This patch moves the check for the SEG.ACK validation up to the front as required. Reviewed by: hiren, gnn MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10424	2017-04-26 06:20:58 +00:00
Navdeep Parhar	f8acc03ef1	Flush the LRO ctrl as soon as lro_mbufs fills up. There is no need to wait for the next enqueue from the driver. Reviewed by: gnn@, hselasky@, gallatin@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D10432	2017-04-24 22:35:00 +00:00
Navdeep Parhar	ea9a92f112	Frames that are not considered for LRO should not be counted in LRO statistics. Reviewed by: gnn@, hselasky@, gallatin@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D10430	2017-04-24 22:31:56 +00:00
Brooks Davis	a7dc31283a	Remove the NATM framework including the en(4), fatm(4), hatm(4), and patm(4) devices. Maintaining an address family and framework has real costs when we make infrastructure improvements. In the case of NATM we support no devices manufactured in the last 20 years and some will not even work in modern motherboards (some newer devices that patm(4) could be updated to support apparently exist, but we do not currently have support). With this change, support remains for some netgraph modules that don't require NATM support code. It is unclear if all these should remain, though ng_atmllc certainly stands alone. Note well: FreeBSD 11 supports NATM and will continue to do so until at least September 30, 2021. Improvements to the code in FreeBSD 11 are certainly welcome. Reviewed by: philip Approved by: harti	2017-04-24 21:21:49 +00:00
Michael Tuexen	75e7a91649	Represent "a syncache overflow hasn't happened yet" by using -(SYNCOOKIE_LIFETIME + 1) instead of INT64_MIN, since it is good enough and works when time_t is int32 or int64. This fixes the issue reported by cy@ on i386. Reported by: cy MFC after: 1 week Sponsored by: Netflix, Inc.	2017-04-21 06:05:34 +00:00
Michael Tuexen	190d9abce7	Syncookies can be used in combination with the syncache. If the cache overflows, syncookies are used. This patch restricts the usage of syncookies in this case: accept syncookies only if there was an overflow of the syncache recently. This mitigates a problem reported in PR217637, where a syncookie was accepted without any recent drops. Thanks to glebius@ for suggesting an improvement. PR: 217637 Reviewed by: gnn, glebius MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10272	2017-04-20 19:19:33 +00:00
Navdeep Parhar	b0ca71f0a0	Free lro_hash unconditionally, just like lro_mbuf_data a few lines later. Fix whitespace nit while here.	2017-04-19 23:06:07 +00:00
Navdeep Parhar	a3927369fa	Do not leak lro_hash on failure to allocate lro_mbuf_data. MFC after: 1 week	2017-04-19 22:27:26 +00:00
Navdeep Parhar	3d24e03800	Remove redundant assignment.	2017-04-19 22:20:41 +00:00
Andrey V. Elsukov	c33a231337	Rework r316770 to make it protocol independent and general, like we do for streaming sockets. Also do more cleanup in sbappendaddr_locked_internal() to prevent leaking information from the existing mbuf to one that may be created later by netgraph. Suggested by: glebius Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week	2017-04-14 09:00:48 +00:00
Andrey V. Elsukov	8428914909	Clear h/w csum flags on mbuf handled by UDP. When checksums of received IP and UDP header already checked, UDP uses sbappendaddr_locked() to pass received data to the socket. sbappendaddr_locked() uses given mbuf as is, and if NIC supports checksum offloading, mbuf contains csum_data and csum_flags that were calculated for already stripped headers. Some NICs support only limited checksums offloading and do not use CSUM_PSEUDO_HDR flag, and csum_data contains some value that UDP/TCP should use for pseudo header checksum calculation. When L2TP is used for tunneling with mpd5, ng_ksocket receives mbuf with filled csum_flags and csum_data, that were calculated for outer headers. When L2TP header is stripped, a packet that was tunneled goes to the IP layer and due to presence of csum_flags (without CSUM_PSEUDO_HDR) and csum_data, the UDP/TCP checksum check fails for this packet. Reported by: Irina Liakh <spell at itl ua> Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week	2017-04-13 17:03:57 +00:00
Michael Tuexen	013f4df643	The sysctl variable net.inet.tcp.drop_synfin is not honored in all states, for example not in SYN-SENT. This patch adds code to check the sysctl variable in other states than LISTEN. Thanks to ae and gnn for providing comments. Reviewed by: gnn MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D9894	2017-04-12 20:27:15 +00:00
Andrey V. Elsukov	7faa0d213b	Make sysctl identifiers for direct netisr queue unique. Introduce IPCTL_INTRDQMAXLEN and IPCTL_INTRDQDROPS macros for this purpose. Reviewed by: gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10358	2017-04-11 19:20:20 +00:00
Steven Hartland	e44c1887fd	Use estimated RTT for receive buffer auto resizing instead of timestamps Switched from using timestamps to RTT estimates when performing TCP receive buffer auto resizing, as not all hosts support / enable TCP timestamps. Disabled reset of receive buffer auto scaling when not in bulk receive mode, which gives an extra 20% performance increase. Also extracted auto resizing to a common method shared between standard and fastpath modules. With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from ~3MB/s to ~100MB/s using the default settings. Reviewed by: lstewart, gnn MFC after: 2 weeks Relnotes: Yes Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D9668	2017-04-10 08:19:35 +00:00
Ryan Stone	4af540d197	Revert the optimization from r304436 r304436 attempted to optimize the handling of incoming UDP packet by only making an expensive call to in_broadcast() if the mbuf was marked as a broadcast packet. Unfortunately, this cannot work in the case of point-to- point L2 protocols like PPP, which have no notion of "broadcast". The optimization has been disabled for several months now with no progress towards fixing it, so it needs to go.	2017-04-05 16:57:13 +00:00
Andrey V. Elsukov	11c56650f0	Add O_EXTERNAL_DATA opcode support. This opcode can be used to attach some data to external action opcode. Unlike the O_EXTERNAL_INSTANCE opcode, this opcode does not require creating a named instance to pass configuration arguments to the external action handler. The data comes right after the O_EXTERNAL_ACTION opcode. The userlevel part currently supports formatting for opcode with ipfw_insn size, by default it expects u16 numeric value in the arg1. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2017-04-03 02:44:40 +00:00
Steven Hartland	6ebc1b7b7d	Allow explicitly assigned IPv4 loopback address to be used in jails If a jail has an explicitly assigned loopback address then allow it to be used instead of remapping requests for the loopback address to the first IPv4 address assigned to the jail. This fixes issues where applications attempt to detect their bound port where they requested a loopback address, which was available, but instead the kernel remapped it to the jail's first address. An example of this is binding nginx to 127.0.0.1 and then running "service nginx upgrade" which before this change would cause nginx to fail. Also: * Correct the description of prison_check_ip4_locked to match the code. MFC after: 2 weeks Relnotes: Yes Sponsored by: Multiplay	2017-03-31 00:41:54 +00:00
Mike Karels	4a5c6c6ab0	Enable route and LLE (ndp) caching in TCP/IPv6 tcp_output.c was using a route on the stack for IPv6, which does not allow route caching or LLE/ndp caching. Switch to using the route (v6 flavor) in the in_pcb, which was already present, which caches both L3 and L2 lookups. Reviewed by: gnn hiren MFC after: 2 weeks	2017-03-27 23:48:36 +00:00
Mike Karels	8c1960d506	Fix reference count leak with L2 caching. ip_forward, TCP/IPv6, and probably SCTP leaked references to L2 cache entry because they used their own routes on the stack, not in_pcb routes. The original model for route caching was callers that provided a route structure to ip{,6}input() would keep the route, and this model was used for L2 caching as well. Instead, change L2 caching to be done by default only when using a route structure in the in_pcb; the pcb deallocation code frees L2 as well as L3 caches. A separate change will add route caching to TCP/IPv6. Another suggestion was to have the transport protocols indicate willingness to use L2 caching, but this approach keeps the changes in the network level. Reviewed by: ae gnn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10059	2017-03-25 15:06:28 +00:00
Gleb Smirnoff	3ae4b0e7e8	Force same alignment on struct xinpgen as we have on struct xinpcb. This fixes 32-bit builds.	2017-03-21 16:23:44 +00:00
Gleb Smirnoff	cc65eb4e79	Hide struct inpcb, struct tcpcb from the userland. This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas, on the other hand we still eventually modify them and tools like netstat(1) never work on next version of FreeBSD. We maintain a ton of spares in them, and we already got some ifdef hell at the end of tcpcb. Details: - Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO. - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space. - Make kernel and userland utilities compilable after these changes. - Bump __FreeBSD_version. Reviewed by: rrs, gnn Differential Revision: D10018	2017-03-21 06:39:49 +00:00
Eric van Gyzen	40769242ed	Add some ntohl() love to r315277 inet_ntoa() and inet_ntoa_r() take the address in network byte-order. When I removed those calls, I should have replaced them with ntohl() to make the hex addresses slightly less unreadable. Here they are. See r315277 regarding classic blunders. vangyzen: you're deep in "no good deed" territory, it seems --badger Reported by: ian MFC after: 3 days MFC when: I finally get it right Sponsored by: Dell EMC	2017-03-14 20:57:54 +00:00
Eric van Gyzen	47d803ea71	KTR: log IPv4 addresses in hex rather than dotted-quad When I made the changes in r313821, I fell victim to one of the classic blunders, the most famous of which is: never get involved in a land war in Asia. But only slightly less well known is this: Keep your brain turned on and engaged when making a tedious, sweeping, mechanical change. KTR can correctly log the immediate integral values passed to it, as well as constant strings, but not non-constant strings, since they might change by the time ktrdump retrieves them. Reported by: glebius MFC after: 3 days Sponsored by: Dell EMC	2017-03-14 18:27:48 +00:00
Conrad Meyer	fdb727f4f2	alias_proxy.c: Fix accidental error quashing This was introduced by accident in r165243, when return sites were unified to add a lock around LibAliasProxyRule(). PR: 217749 Submitted by: Svyatoslav <razmyslov at viva64.com> Sponsored by: Viva64 (PVS-Studio)	2017-03-13 18:05:31 +00:00
Andrey V. Elsukov	719498102c	Fix the L2 address printed in the "arp: %s moved from %*D" message. In r292978 struct llentry was changed and the ll_addr field became a pointer. PR: 217667 MFC after: 1 week	2017-03-11 04:57:52 +00:00
Gleb Smirnoff	c75e266608	Make inp_lock_assert() depend on INVARIANT_SUPPORT, not INVARIANTS. This will make INVARIANT-enabled modules, that use this function to load successfully on a kernel that has INVARIANT_SUPPORT only.	2017-03-09 00:55:19 +00:00
Ermal Luçi	dce33a45c9	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Reviewed by: adrian, aw Approved by: ae (mentor) Sponsored by: rsync.net Differential Revision: D9235	2017-03-06 04:01:58 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Michael Tuexen	8d62aae8df	TCP window updates are only sent if the window can be increased by at least 2 * MSS. However, if the receive buffer size is small, this might be impossible. Add back a criterion to send a TCP window update if the window can be increased by at least half of the receive buffer size. This condition was removed in r242252. This patch simply brings it back. PR: 211003 Reviewed by: gnn MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D9475	2017-02-23 18:14:36 +00:00
Eric van Gyzen	922193e7ff	Remove inet_ntoa() from the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Remove it from the kernel. Suggested by: glebius, emaste Reviewed by: gnn MFC after: never Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:50:01 +00:00
Eric van Gyzen	8144690af4	Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. Suggested by: glebius, emaste Reviewed by: gnn MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:47:41 +00:00
Andrey V. Elsukov	7a60a91011	Add missing check to fix the build with IPSEC_SUPPORT and without MAC. Submitted by: netchild	2017-02-14 21:33:10 +00:00
Andrey V. Elsukov	627c036f65	Remove IPsec related PCB code from SCTP. The inpcb structure has inp_sp pointer that is initialized by ipsec_init_pcbpolicy() function. This pointer keeps storage for IPsec security policies associated with a specific socket. An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options to configure these security policies. Then ip[6]_output() uses inpcb pointer to specify that an outgoing packet is associated with some socket. And IPSEC_OUTPUT() method can use a security policy stored in the inp_sp. For inbound packet the protocol-specific input routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms to inbound security policy configured in the inpcb. SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends packets. Thus IPSEC_OUTPUT() method does not consider such packets as associated with some socket and can not apply security policies from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY() method is called from protocol-specific input routine, it can specify inpcb pointer and associated with socket inbound policy will be checked. But there are two problems: 1. Such check is asymmetric, because we can not apply security policy from inpcb for outgoing packet. 2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and access to inp_sp is protected. But for SCTP this is not correct, because SCTP uses own locks to protect inpcb. To fix these problems remove IPsec related PCB code from SCTP. This implies that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options will not be applicable to SCTP sockets. To be able to correctly check inbound security policies for SCTP, mark its protocol header with the PR_LASTHDR flag. Reported by: tuexen Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D9538	2017-02-13 11:37:52 +00:00
Ermal Luçi	c10c5b1eba	Committed without approval from mentor. Reported by: gnn	2017-02-12 06:56:33 +00:00
Ryan Stone	5ede40dcf2	Don't zero out srtt after excess retransmits If the TCP stack has retransmitted more than 1/4 of the total number of retransmits before a connection drop, it decides that its current RTT estimate is hopelessly out of date and decides to recalculate it from scratch starting with the next ACK. Unfortunately, it implements this by zeroing out the current RTT estimate. Drop this hack entirely, as it makes it significantly more difficult to debug connection issues. Instead check for excessive retransmits at the point where srtt is updated from an ACK being received. If we've exceeded 1/4 of the maximum retransmits, discard the previous srtt estimate and replace it with the latest rtt measurement. Differential Revision: https://reviews.freebsd.org/D9519 Reviewed by: gnn Sponsored by: Dell EMC Isilon	2017-02-11 17:05:08 +00:00
Gleb Smirnoff	cfff3743cd	Move tcp_fields_to_net() static inline into tcp_var.h, just below its friend tcp_fields_to_host(). There is third party code that also uses this inline. Reviewed by: ae	2017-02-10 17:46:26 +00:00
Ermal Luçi	e97b60264d	Fix build after r313524 Reported-by: ohartmann@walstatt.org	2017-02-10 06:01:47 +00:00
Ermal Luçi	4616026faf	Revert r313527 Heh svn is not git	2017-02-10 05:58:16 +00:00
Ermal Luçi	c0fadfdbbf	Correct missed variable name. Reported-by: ohartmann@walstatt.org	2017-02-10 05:51:39 +00:00
Ermal Luçi	ed55edceef	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Sponsored-by: rsync.net Differential Revision: D9235 Reviewed-by: adrian	2017-02-10 05:16:14 +00:00
Eric van Gyzen	edf0313b70	Fix garbage IP addresses in UDP log_in_vain messages If multiple threads emit a UDP log_in_vain message concurrently, the IP addresses could be garbage due to concurrent usage of a single string buffer inside inet_ntoa(). Use inet_ntoa_r() with two stack buffers instead. Reported by: Mark Martinec <Mark.Martinec+freebsd@ijs.si> MFC after: 3 days Relnotes: yes Sponsored by: Dell EMC	2017-02-07 18:57:57 +00:00
Andrey V. Elsukov	fcf596178b	Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. 
o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352	2017-02-06 08:49:57 +00:00
Patrick Kelsey	ec93ed8d95	Fix VIMAGE-related bugs in TFO. The autokey callout vnet context was not being initialized, and the per-vnet fastopen context was only being initialized for the default vnet. PR: 216613 Reported by: Alex Deiter <alex dot deiter at gmail dot com> MFC after: 1 week	2017-02-03 17:02:57 +00:00
George V. Neville-Neil	82988b50a1	Add an mbuf to ipinfo_t translator to finish cleanup of mbuf passing to TCP probes. Reviewed by: markj MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9401	2017-02-01 19:33:00 +00:00
Michael Tuexen	c03627fd06	Ensure that the variable bail is always initialized before used. MFC after: 1 week	2017-02-01 00:10:29 +00:00
Michael Tuexen	2aa116007c	Take the SCTP common header into account when computing the space available for chunks. This unbreaks the handling of ICMPV6 packets indicating "packet too big". It just worked for IPv4 since we are overbooking for IPv4. MFC after: 1 week	2017-01-31 23:36:31 +00:00
Michael Tuexen	7858d7cb8e	Remove a duplicate debug statement. MFC after: 1 week	2017-01-31 23:34:02 +00:00
Cy Schubert	3df96ee68e	Correct comment grammar and make it easier to understand. MFC after: 1 week	2017-01-30 04:51:18 +00:00
Hiren Panchasara	6134aabe38	Add a knob to change default behavior of inheriting listen socket's tcp stack regardless of what the default stack for the system is set to. With current/default behavior, after changing the default tcp stack, the application needs to be restarted to pick up that change. Setting this new knob net.inet.tcp.functions_inherit_listen_socket_stack to '0' would change that behavior and make any new connection use the newly selected default tcp stack. Reviewed by: rrs MFC after: 2 weeks Sponsored by: Limelight Networks	2017-01-27 23:10:46 +00:00
Luiz Otavio O Souza	338e227ac0	After the in_control() changes in r257692, an existing address is (intentionally) deleted first and then completely added again (so all the events, announces and hooks are given a chance to run). This causes an issue with CARP where the existing CARP data structure is removed together with the last address for a given VHID, which will cause a subsequent fail when the address is later re-added. This change fixes this issue by adding a new flag to keep the CARP data structure when an address is not being removed. There was an additional issue with IPv6 CARP addresses, where the CARP data structure would never be removed after a change and lead to VHIDs which cannot be destroyed. Reviewed by: glebius Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-25 19:04:08 +00:00
Michael Tuexen	bd60638c98	Fix a bug where the overhead of the I-DATA chunk was not considered. MFC after: 1 week	2017-01-24 21:30:31 +00:00
Hans Petter Selasky	f3e7afe2d7	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API support rates in the range from 1 to 4Gbytes/s which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate which is then stored in the socket structure in the kernel. Later on when packets are transmitted a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag mismatches the one of the network interface. 
The network adapter frees the mbuf and returns EAGAIN which causes the ip_output() to release and clear the send tag. Upon next ip_output(), allocation of a new "snd_tag" will be attempted. 4) When the PCB is detached the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months	2017-01-18 13:31:17 +00:00
Maxim Sobolev	339efd75a4	Add a new socket option SO_TS_CLOCK to pick from several different clock sources to return timestamps when SO_TIMESTAMP is enabled. Two additional clock sources are: o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME); o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC). In addition to this, this option provides unified interface to get bintime (equivalent of using SO_BINTIME), except it is also supported with IPv6 where SO_BINTIME has never been supported. The long term plan is to deprecate SO_BINTIME and move everything to using SO_TS_CLOCK. Idea for this enhancement has been briefly discussed on the Net session during dev summit in Ottawa last June and the general input was positive. This change is believed to benefit network benchmarks/profiling as well as other scenarios where precise time of arrival measurement is necessary. There are two regression test cases as part of this commit: one extends unix domain test code (unix_cmsg) to test new SCM_XXX types and another one implements a totally new test case which exchanges UDP packets between two processes using both conventional methods (i.e. calling clock_gettime(2) before recv(2) and after send(2)), as well as using setsockopt()+recv() in receive path. The resulting delays are checked for sanity for all supported clock types. Reviewed by: adrian, gnn Differential Revision: https://reviews.freebsd.org/D9171	2017-01-16 17:46:38 +00:00
Conrad Meyer	1d64db52f3	Fix a variety of cosmetic typos and misspellings No functional change. PR: 216096, 216097, 216098, 216101, 216102, 216106, 216109, 216110 Reported by: Bulat <bltsrc at mail.ru> Sponsored by: Dell EMC Isilon	2017-01-15 18:00:45 +00:00
Gleb Smirnoff	0f7ddf91e9	Use getsock_cap() instead of deprecated fgetsock(). Reviewed by: tuexen	2017-01-13 16:54:44 +00:00
Michael Tuexen	24209f0122	Ensure that the buffer length and the length provided in the IPv4 header match when using a raw socket to send IPv4 packets and providing the header. If they don't match, let send return -1 and set errno to EINVAL. Before this patch, it was only enforced that the length in the header is not larger than the buffer length. PR: 212283 Reviewed by: ae, gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9161	2017-01-13 10:55:26 +00:00
Maxim Sobolev	5e946c03c7	Fix slight type mismatch between so_options defined in sys/socketvar.h and tw_so_options defined here which is supposed to be a copy of the former (short vs u_short respectively). Switch tw_so_options to be "signed short" to match the type of the field it's inherited from.	2017-01-12 10:14:54 +00:00
Hiren Panchasara	b8a2fb91f6	sysctl net.inet.tcp.hostcache.list in a jail can see connections from other jails and the host. This commit fixes it. PR: 200361 Submitted by: bz (original version), hiren (minor corrections) Reported by: Marcus Reid <marcus at blazingdot dot com> Reviewed by: bz, gnn Tested by: Lohith Bellad <lohithbsd at gmail dot com> MFC after: 1 week Sponsored by: Limelight Networks (minor corrections)	2017-01-05 17:22:09 +00:00
George V. Neville-Neil	fad073dd44	Followup to mtod removal in main stack (r311225). Continued removal of mtod() calls from TCP_PROBE macros. MFC after: 1 week Sponsored by: Limelight Networks	2017-01-04 04:00:28 +00:00
George V. Neville-Neil	2b9c998413	Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and dangerous. Those wanting data from an mbuf should use DTrace itself to get the data. PR: 203409 Reviewed by: hiren MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D9035	2017-01-04 02:19:13 +00:00
Enji Cooper	cfff8d3dbd	Unbreak ip_carp with WITHOUT_INET6 enabled by conditionalizing all IPv6 structs under the INET6 #ifdef. Similarly (even though it doesn't seem to affect the build), conditionalize all IPv4 structs under the INET #ifdef This also unbreaks the LINT-NOINET6 tinderbox target on amd64; I have not verified other MACHINE/TARGET pairs (e.g. armv6/arm). MFC after: 2 weeks X-MFC with: r310847 Pointyhat to: jpaetzel Reported by: O. Hartmann <o.hartmann@walstatt.org>	2016-12-30 21:33:01 +00:00
Josh Paetzel	8151740c88	Harden CARP against network loops. If there is a loop in the network a CARP that is in MASTER state will see its own broadcasts, which will then cause it to assume BACKUP state. When it assumes BACKUP it will stop sending advertisements. In that state it will no longer see advertisements and will assume MASTER... We can't catch all the cases where we are seeing our own CARP broadcast, but we can catch the obvious case. Submitted by: torek Obtained from: FreeNAS MFC after: 2 weeks Sponsored by: iXsystems	2016-12-30 18:46:21 +00:00
Andrey V. Elsukov	2e77d270c1	When we are sending IP fragments, update ip pointers in IP_PROBE() for each fragment. MFC after: 1 week	2016-12-29 19:57:46 +00:00

6027 Commits