freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	cc65eb4e79	Hide struct inpcb, struct tcpcb from the userland. This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas, on the other hand we still eventually modify them and tools like netstat(1) never work on next version of FreeBSD. We maintain a ton of spares in them, and we already got some ifdef hell at the end of tcpcb. Details: - Hide struct inpcb, struct tcpcb under _KERNEL \|\| _WANT_FOO. - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space. - Make kernel and userland utilities compilable after these changes. - Bump __FreeBSD_version. Reviewed by: rrs, gnn Differential Revision: D10018	2017-03-21 06:39:49 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Gleb Smirnoff	cfff3743cd	Move tcp_fields_to_net() static inline into tcp_var.h, just below its friend tcp_fields_to_host(). There is third party code that also uses this inline. Reviewed by: ae	2017-02-10 17:46:26 +00:00
Andrey V. Elsukov	fcf596178b	Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352	2017-02-06 08:49:57 +00:00
Maxim Sobolev	5e946c03c7	Fix slight type mismatch between so_options defined in sys/socketvar.h and tw_so_options defined here which is supposed to be a copy of the former (short vs u_short respectively). Switch tw_so_options to be "signed short" to match the type of the field it's inherited from.	2017-01-12 10:14:54 +00:00
Jonathan T. Looney	68bd7ed102	The TFO server-side code contains some changes that are not conditioned on the TCP_RFC7413 kernel option. This change removes those few instructions from the packet processing path. While not strictly necessary, for the sake of consistency, I applied the new IS_FASTOPEN macro to all places in the packet processing path that used the (t_flags & TF_FASTOPEN) check. Reviewed by: hiren Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8219	2016-10-12 19:06:50 +00:00
Jonathan T. Looney	bd79708dbf	In the TCP stack, the hhook(9) framework provides hooks for kernel modules to add actions that run when a TCP frame is sent or received on a TCP session in the ESTABLISHED state. In the base tree, this functionality is only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd, and cc_vegas congestion control modules. Presently, we incur overhead to check for hooks each time a TCP frame is sent or received on an ESTABLISHED TCP session. This change adds a new compile-time option (TCP_HHOOK) to determine whether to include the hhook(9) framework for TCP. To retain backwards compatibility, I added the TCP_HHOOK option to every configuration file that already defined "options INET". (Therefore, this patch introduces no functional change. In order to see a functional difference, you need to compile a custom kernel without the TCP_HHOOK option.) This change will allow users to easily exclude this functionality from their kernel, should they wish to do so. Note that any users who use a custom kernel configuration and use one of the congestion control modules listed above will need to add the TCP_HHOOK option to their kernel configuration. Reviewed by: rrs, lstewart, hiren (previous version), sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8185	2016-10-12 02:16:42 +00:00
Jonathan T. Looney	3ac125068a	Remove "long" variables from the TCP stack (not including the modular congestion control framework). Reviewed by: gnn, lstewart (partial) Sponsored by: Juniper Networks, Netflix Differential Revision: (multiple) Tested by: Limelight, Netflix	2016-10-06 16:28:34 +00:00
Jonathan T. Looney	55a429a6dc	Remove declaration of un-defined function tcp_seq_subtract(). Reviewed by: gnn MFC after: 1 week Sponsored by: Juniper Networks, Netflix Differential Revision: https://reviews.freebsd.org/D7055	2016-10-06 15:57:15 +00:00
Lawrence Stewart	4b7b743c16	Pass the number of segments coalesced by LRO up the stack by repurposing the tso_segsz pkthdr field during RX processing, and use the information in TCP for more correct accounting and as a congestion control input. This is only a start, and an audit of other uses for the data is left as future work. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D7564	2016-08-25 13:33:32 +00:00
Randall Stewart	587d67c008	Here we update the modular tcp to be able to switch to an alternate TCP stack in other then the closed state (pre-listen/connect). The idea is that if that is supported by the alternate stack, it is asked if its ok to switch. If it approves the "handoff" then we allow the switch to happen. Also the fini() function now gets a flag to tell if you are switching away or the tcb is destroyed. The init() call into the alternate stack is moved to the end so the tcb is more fully formed before the init transpires. Sponsored by: Netflix Inc. Differential Revision: D6790	2016-08-16 15:11:46 +00:00
Bjoern A. Zeeb	3f58662dd9	The pr_destroy field does not allow us to run the teardown code in a specific order. VNET_SYSUNINITs however are doing exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use internally. Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come. Reviewed by: gnn, tuexen (sctp), jhb (kernel.h) Obtained from: projects/vnet MFC after: 2 weeks X-MFC: do not remove pr_destroy Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6652	2016-06-01 10:14:04 +00:00
Gleb Smirnoff	f59d975e10	Tiny refactor of r294869/r296881: use defines to mask the VNET() macro. Suggested by: bz	2016-05-17 23:14:17 +00:00
Randall Stewart	5105a92c49	This small change adopts the excellent suggestion for using named structures in the add of a new tcp-stack that came in late to me via email after the last commit. It also makes it so that a new stack may optionally get a callback during a retransmit timeout. This allows the new stack to clear specific state (think sack scoreboards or other such structures). Sponsored by: Netflix Inc. Differential Revision: http://reviews.freebsd.org/D6303	2016-05-17 09:53:22 +00:00
Randall Stewart	e5ad64562a	This cleans up the timers code in TCP to start using the new async_drain functionality. This as been tested in NF as well as by Verisign. Still to do in here is to remove all the old flags. They are currently left being maintained but probably are no longer needed. Sponsored by: Netflix Inc. Differential Revision: http://reviews.freebsd.org/D5924	2016-04-28 13:27:12 +00:00
Jonathan T. Looney	5d20f97461	to_flags is currently a 64-bit integer; however, we only use 7 bits. Furthermore, there is no reason this needs to be a 64-bit integer for the forseeable future. Also, there is an inconsistency between to_flags and the mask in tcp_addoptions(). Before r195654, to_flags was a u_long and the mask in tcp_addoptions() was a u_int. r195654 changed to_flags to be a u_int64_t but left the mask in tcp_addoptions() as a u_int, meaning that these variables will only be the same width on platforms with 64-bit integers. Convert both to_flags and the mask in tcp_addoptions() to be explicitly 32-bit variables. This may save a few cycles on 32-bit platforms, and avoids unnecessarily mixing types. Differential Revision: https://reviews.freebsd.org/D5584 Reviewed by: hiren MFC after: 2 weeks Sponsored by: Juniper Networks	2016-03-22 15:55:17 +00:00
Gleb Smirnoff	bf840a1707	Redo r294869. The array of counters for TCP states doesn't belong to struct tcpstat, because the structure can be zeroed out by netstat(1) -z, and of course running connection counts shouldn't be touched. Place running connection counts into separate array, and provide separate read-only sysctl oid for it.	2016-03-15 00:15:10 +00:00
Gleb Smirnoff	2f06d2ab91	Comment fix: statistics are not read-only.	2016-03-14 18:06:59 +00:00
Gleb Smirnoff	57a78e3bae	Augment struct tcpstat with tcps_states[], which is used for book-keeping the amount of TCP connections by state. Provides a cheap way to get connection count without traversing the whole pcb list. Sponsored by: Netflix	2016-01-27 00:45:46 +00:00
Gleb Smirnoff	d17d4c6b2a	Provide TCPSTAT_DEC() and TCPSTAT_FETCH() macros.	2016-01-27 00:20:07 +00:00
Gleb Smirnoff	0c39d38d21	Historically we have two fields in tcpcb to describe sender MSS: t_maxopd, and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned up after T/TCP removal. After all permutations over the years the result is that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if timestamps are in action, or is equal to t_maxopd otherwise. That's a very rough estimate of MSS reduced by options length. Throughout the code it was used in places, where preciseness was not important, like cwnd or ssthresh calculations. With this change: - t_maxopd goes away. - t_maxseg now stores MSS not adjusted by options. - new function tcp_maxseg() is provided, that calculates MSS reduced by options length. The functions gives a better estimate, since it takes into account SACK state as well. Reviewed by: jtl Differential Revision: https://reviews.freebsd.org/D3593	2016-01-07 00:14:42 +00:00
Patrick Kelsey	281a0fd4f9	Implementation of server-side TCP Fast Open (TFO) [RFC7413]. TFO is disabled by default in the kernel build. See the top comment in sys/netinet/tcp_fastopen.c for implementation particulars. Reviewed by: gnn, jch, stas MFC after: 3 days Sponsored by: Verisign, Inc. Differential Revision: https://reviews.freebsd.org/D4350	2015-12-24 19:09:48 +00:00
Randall Stewart	55bceb1e2b	First cut of the modularization of our TCP stack. Still to do is to clean up the timer handling using the async-drain. Other optimizations may be coming to go with this. Whats here will allow differnet tcp implementations (one included). Reviewed by: jtl, hiren, transports Sponsored by: Netflix Inc. Differential Revision: D4055	2015-12-16 00:56:45 +00:00
Hiren Panchasara	4d16338223	Clean up unused bandwidth entry in the TCP hostcache. Submitted by: Jason Wolfe (j at nitrology dot com) Reviewed by: rrs, hiren Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D4154	2015-12-11 06:22:58 +00:00
Hiren Panchasara	b87170f210	r290122 added 4 bytes and removed 8 in struct sackhint. Add a pad entry of 4 bytes to restore the size. Spotted by: rrs Reviewed by: rrs X-MFC with: r290122 Sponsored by: Limelight Networks	2015-12-10 03:20:10 +00:00
Hiren Panchasara	021eaf7996	One of the ways to detect loss is to count duplicate acks coming back from the other end till it reaches predetermined threshold which is 3 for us right now. Once that happens, we trigger fast-retransmit to do loss recovery. Main problem with the current implementation is that we don't honor SACK information well to detect whether an incoming ack is a dupack or not. RFC6675 has latest recommendations for that. According to it, dupack is a segment that arrives carrying a SACK block that identifies previously unknown information between snd_una and snd_max even if it carries new data, changes the advertised window, or moves the cumulative acknowledgment point. With the prevalence of Selective ACK (SACK) these days, improper handling can lead to delayed loss recovery. With the fix, new behavior looks like following: 0) th_ack < snd_una --> ignore Old acks are ignored. 1) th_ack == snd_una, !sack_changed --> ignore Acks with SACK enabled but without any new SACK info in them are ignored. 2) th_ack == snd_una, window == old_window --> increment Increment on a good dupack. 3) th_ack == snd_una, window != old_window, sack_changed --> increment When SACK enabled, it's okay to have advertized window changed if the ack has new SACK info. 4) th_ack > snd_una --> reset to 0 Reset to 0 when left edge moves. 5) th_ack > snd_una, sack_changed --> increment Increment if left edge moves but there is new SACK info. Here, sack_changed is the indicator that incoming ack has previously unknown SACK info in it. Note: This fix is not fully compliant to RFC6675. That may require a few changes to current implementation in order to keep per-sackhole dupack counter and change to the way we mark/handle sack holes. PR: 203663 Reviewed by: jtl MFC after: 3 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D4225	2015-12-08 21:21:48 +00:00
Hiren Panchasara	12eeb81fc1	Calculate the correct amount of bytes that are in-flight for a connection as suggested by RFC 6675. Currently differnt places in the stack tries to guess this in suboptimal ways. The main problem is that current calculations don't take sacked bytes into account. Sacked bytes are the bytes receiver acked via SACK option. This is suboptimal because it assumes that network has more outstanding (unacked) bytes than the actual value and thus sends less data by setting congestion window lower than what's possible which in turn may cause slower recovery from losses. As an example, one of the current calculations looks something like this: snd_nxt - snd_fack + sackhint.sack_bytes_rexmit New proposal from RFC 6675 is: snd_max - snd_una - sackhint.sacked_bytes + sackhint.sack_bytes_rexmit which takes sacked bytes into account which is a new addition to the sackhint struct. Only thing we are missing from RFC 6675 is isLost() i.e. segment being considered lost and thus adjusting pipe based on that which makes this calculation a bit on conservative side. The approach is very simple. We already process each ack with sack info in tcp_sack_doack() and extract sack blocks/holes out of it. We'd now also track this new variable sacked_bytes which keeps track of total sacked bytes reported. One downside to this approach is that we may get incorrect count of sacked_bytes if the other end decides to drop sack info in the ack because of memory pressure or some other reasons. But in this (not very likely) case also the pipe calculation would be conservative which is okay as opposed to being aggressive in sending packets into the network. Next step is to use this more accurate pipe estimation to drive congestion window adjustments. In collaboration with: rrs Reviewed by: jason_eggnet dot com, rrs MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D3971	2015-10-28 22:57:51 +00:00
Hiren Panchasara	356c7958a4	Add sysctl tunable net.inet.tcp.initcwnd_segments to specify initial congestion window in number of segments on fly. It is set to 10 segments by default. Remove net.inet.tcp.experimental.initcwnd10 which is now redundant. Also remove the parent node net.inet.tcp.experimental as it's not needed anymore and also because it was not well thought out. Differential Revision: https://reviews.freebsd.org/D3858 In collaboration with: lstewart Reviewed by: gnn (prev version), rwatson, allanjude, wblock (man page) MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks	2015-10-27 09:43:05 +00:00
Hiren Panchasara	86a996e6bd	There are times when it would be really nice to have a record of the last few packets and/or state transitions from each TCP socket. That would help with narrowing down certain problems we see in the field that are hard to reproduce without understanding the history of how we got into a certain state. This change provides just that. It saves copies of the last N packets in a list in the tcpcb. When the tcpcb is destroyed, the list is freed. I thought this was likely to be more performance-friendly than saving copies of the tcpcb. Plus, with the packets, you should be able to reverse-engineer what happened to the tcpcb. To enable the feature, you will need to compile a kernel with the TCPPCAP option. Even then, the feature defaults to being deactivated. You can activate it by setting a positive value for the number of captured packets. You can do that on either a global basis or on a per-socket basis (via a setsockopt call). There is no way to get the packets out of the kernel other than using kmem or getting a coredump. I thought that would help some of the legal/privacy concerns regarding such a feature. However, it should be possible to add a future effort to export them in PCAP format. I tested this at low scale, and found that there were no mbuf leaks and the peak mbuf usage appeared to be unchanged with and without the feature. The main performance concern I can envision is the number of mbufs that would be used on systems with a large number of sockets. If you save five packets per direction per socket and have 3,000 sockets, that will consume at least 30,000 mbufs just to keep these packets. I tried to reduce the concerns associated with this by limiting the number of clusters (not mbufs) that could be used for this feature. Again, in my testing, that appears to work correctly. Differential Revision: D3100 Submitted by: Jonathan Looney <jlooney at juniper dot net> Reviewed by: gnn, hiren	2015-10-14 00:35:37 +00:00
Alexander V. Chernikov	1558cb2448	Eliminate nd6_nud_hint() and its TCP bindings. Initially function was introduced in r53541 (KAME initial commit) to "provide hints from upper layer protocols that indicate a connection is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability Confirmation). However, it was converted to do nothing (e.g. just return) in r122922 (tcp_hostcache implementation) back in 2003. Some defines were moved to tcp_var.h in r169541. Then, it was broken (for non-corner cases) by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So, right now this code is broken and has no "real" base users. Differential Revision: https://reviews.freebsd.org/D3699	2015-09-27 05:29:34 +00:00
Gleb Smirnoff	24067db8ca	Make tcp_mtudisc() static and void. No functional changes. Sponsored by: Nginx, Inc.	2015-09-04 12:02:12 +00:00
Hiren Panchasara	03041aaac8	Update snd_una description to make it more readable. Differential Revision: https://reviews.freebsd.org/D3179 Reviewed by: gnn Sponsored by: Limelight Networks	2015-07-30 19:24:49 +00:00
Patrick Kelsey	4741bfcb57	Revert r265338, r271089 and r271123 as those changes do not handle non-inline urgent data and introduce an mbuf exhaustion attack vector similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs. Address the issue described in FreeBSD-SA-15:15.tcp. Reviewed by: glebius Approved by: so Approved by: jmallett (mentor) Security: FreeBSD-SA-15:15.tcp Sponsored by: Norse Corp, Inc.	2015-07-29 17:59:13 +00:00
Julien Charbon	5571f9cf81	Fix an old and well-documented use-after-free race condition in TCP timers: - Add a reference from tcpcb to its inpcb - Defer tcpcb deletion until TCP timers have finished Differential Revision: https://reviews.freebsd.org/D2079 Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com> Reviewed by: imp, rrs, adrian, jhb, bz Approved by: jhb Sponsored by: Verisign, Inc.	2015-04-16 10:00:06 +00:00
Julien Charbon	71da715374	Re-introduce padding fields removed with r264321 to keep struct tcptw ABI unchanged. Suggested by: jhb Approved by: jhb (mentor) MFC after: 1 day X-MFC-With: r264321	2014-11-17 14:56:02 +00:00
Hans Petter Selasky	4952ad427a	Restore spares used in "struct tcpcb" and bump "__FreeBSD_version" to indicate need for kernel module re-compilation. Sponsored by: Mellanox Technologies	2014-11-03 13:01:58 +00:00
Julien Charbon	cea40c4888	Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and tcp_tw_2msl_scan(). This race condition drives unplanned timewait timeout cancellation. Also simplify implementation by holding inpcb reference and removing tcptw reference counting. Differential Revision: https://reviews.freebsd.org/D826 Submitted by: Marc De la Gueronniere <mdelagueronniere@verisign.com> Submitted by: jch Reviewed By: jhb (mentor), adrian, rwatson Sponsored by: Verisign, Inc. MFC after: 2 weeks X-MFC-With: r264321	2014-10-30 08:53:56 +00:00
Sean Bruno	f6f6703f27	Implement PLPMTUD blackhole detection (RFC 4821), inspired by code from xnu sources. If we encounter a network where ICMP is blocked the Needs Frag indicator may not propagate back to us. Attempt to downshift the mss once to a preconfigured value. Default this feature to off for now while we do not have a full PLPMTUD implementation in our stack. Adds the following new sysctl's for control: net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature net.inet.tcp.pmtud_blackhole_mss -- mss to try for ipv4 net.inet.tcp.v6pmtud_blackhole_mss -- mss to try for ipv6 Adds the following new sysctl's for monitoring: -- Number of times the code was activated to attempt a mss downshift net.inet.tcp.pmtud_blackhole_activated -- Number of times the blackhole mss was used in an attempt to downshift net.inet.tcp.pmtud_blackhole_min_activated -- Number of times that we failed to connect after we downshifted the mss net.inet.tcp.pmtud_blackhole_failed Phabricator: https://reviews.freebsd.org/D506 Reviewed by: rpaulo bz MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks	2014-10-07 21:50:28 +00:00
Alexander V. Chernikov	29c47f18da	* Split tcp_signature_compute() into 2 pieces: - tcp_get_sav() - SADB key lookup - tcp_signature_do_compute() - actual computation * Fix TCP signature case for listening socket: do not assume EVERY connection coming to socket with TCP_SIGNATURE set to be md5 signed regardless of SADB key existance for particular address. This fixes the case for routing software having _some_ BGP sessions secured by md5. * Simplify TCP_SIGNATURE handling in tcp_input() MFC after: 2 weeks	2014-09-27 07:04:12 +00:00
Hans Petter Selasky	9fd573c39d	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week	2014-09-22 08:27:27 +00:00
Kevin Lo	8f5a8818f5	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
Bjoern A. Zeeb	4fd2b4eb53	Make tcp_twrespond() file local private; this removes it from the public KPI; it is not used anywhere else and seems it never was. MFC after: 2 weeks	2014-05-24 14:01:18 +00:00
Bjoern A. Zeeb	255cd9fd58	Move the tcp_fields_to_host() and tcp_fields_to_net() (inline) functions to the tcp_var.h header file in order to avoid further duplication with upcoming commits. Reviewed by: np MFC after: 2 weeks	2014-05-23 20:15:01 +00:00
Gleb Smirnoff	b1a4156614	Provide compatibility #define after r265408. Suggested by: truckman	2014-05-17 12:33:27 +00:00
Mike Silbersack	f1395664e5	Remove the function tcp_twrecycleable; it has been #if 0'd for eight years. The original concept was to improve the corner case where you run out of ephemeral ports, but it was causing performance problems and the mechanism of limiting the number of time_wait sockets serves the same purpose in the end. Reviewed by: bz	2014-05-16 01:38:38 +00:00
Gleb Smirnoff	c669105d17	- Remove net.inet.tcp.reass.overflows sysctl. It counts exactly same events that tcpstat's tcps_rcvmemdrop counter counts. - Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its description in netstat(1) output. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-05-06 00:00:07 +00:00
Gleb Smirnoff	e407b67be4	The FreeBSD-SA-14:08.tcp was a lesson on not doing acrobatics with mixing on stack memory and UMA memory in one linked list. Thus, rewrite TCP reassembly code in terms of memory usage. The algorithm remains unchanged. We actually do not need extra memory to build a reassembly queue. Arriving mbufs are always packet header mbufs. So we got the length of data as pkthdr.len. We got m_nextpkt for linkage. And we need only one pointer to point at the tcphdr, use PH_loc for that. In tcpcb the t_segq fields becomes mbuf pointer. The t_segqlen field now counts not packets, but bytes in the queue. This gives us more precision when comparing to socket buffer limits. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-05-04 23:25:32 +00:00
John Baldwin	66eefb1eae	Currently, the TCP slow timer can starve TCP input processing while it walks the list of connections in TIME_WAIT closing expired connections due to contention on the global TCP pcbinfo lock. To remediate, introduce a new global lock to protect the list of connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when closing an expired connection. This limits the window of time when TCP input processing is stopped to the amount of time needed to close a single connection. Submitted by: Julien Charbon <jcharbon@verisign.com> Reviewed by: rwatson, rrs, adrian MFC after: 2 months	2014-04-10 18:15:35 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Bjoern A. Zeeb	a5f44cd7a1	Introduce spares in the TCP syncache and timewait structures so that fixed TCP_SIGNATURE handling can later be merged. This is derived from follow-up work to SVN r183001 posted to net@ on Sep 13 2008. Approved by: re (gjb)	2013-09-21 10:01:51 +00:00

1 2 3 4 5 ...

261 Commits