freebsd-nq

Author	SHA1	Message	Date
Maxim Konovalov	10fe523e99	o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build. PR: kern/112517 Submitted by: vd	2007-05-09 06:09:40 +00:00
Andre Oppermann	3529149e9a	Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of a decdicated sack_enable int for this bool. Change all users accordingly.	2007-05-06 15:56:31 +00:00
Andre Oppermann	0ca3f933eb	o Remove redundant tcp reassembly check in header prediction code o Rearrange code to make intent in TCPS_SYN_SENT case more clear o Assorted style cleanup o Comment clarification for tcp_dropwithreset()	2007-05-06 15:41:06 +00:00
Andre Oppermann	c5ad39b910	Reorder the TCP header prediction test to check for the most volatile values first to spend less time on a fallback to normal processing.	2007-05-06 15:23:51 +00:00
Andre Oppermann	679d9708b6	Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment and change it to a void function. We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late late segments arriving for such a connection is handled directly in the TW code.	2007-05-06 15:16:05 +00:00
Robert Watson	1cd6eadfbb	Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the pcbinfo lock will be released as well, not just the pcb lock.	2007-05-04 17:45:52 +00:00
Andre Oppermann	9fa198bead	o Fix INP lock leak in the minttl case o Remove indirection in the decision of unlocking inp o Further annotation of locking in tcp_input()	2007-04-23 19:41:47 +00:00
Andre Oppermann	df47e4377b	o Remove unncessary TOF_SIGLEN flag from struct tcpopt o Correctly set to->to_signature in tcp_dooptions() o Update comments	2007-04-20 15:28:01 +00:00
Andre Oppermann	7824d002c0	Add more KASSERT's.	2007-04-20 15:21:29 +00:00
Andre Oppermann	4d6e713043	Remove bogus check for accept queue length and associated failure handling from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then. Change return value of syncache_add() to void. No status communication is required.	2007-04-20 14:34:54 +00:00
Andre Oppermann	e207f80039	Simplifly syncache_expand() and clarify its semantics. Zero is returned when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input(). A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer. Reported by: kris (the panic)	2007-04-20 13:51:34 +00:00
Robert Watson	215c8d75b8	Remove unused variable tcbinfo_mtx.	2007-04-15 21:03:23 +00:00
Andre Oppermann	b8152ba793	Change the TCP timer system from using the callout system five times directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)	2007-04-11 09:45:16 +00:00
Andre Oppermann	995a77176f	Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some further INP_INFO_WLOCK_ASSERT() while there.	2007-04-04 18:30:16 +00:00
Andre Oppermann	0c38fd0a7a	Move last tcpcb initialization for the inbound connection case from tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.	2007-04-04 16:13:45 +00:00
Andre Oppermann	5dd9dfefd6	Retire unused TCP_SACK_DEBUG.	2007-04-04 14:44:15 +00:00
Andre Oppermann	b728e90260	In tcp_dooptions() skip over SACK options if it is a SYN segment.	2007-04-04 14:39:49 +00:00
Andre Oppermann	1929eae1cc	When blackholing do a 'dropunlock' in the new world order to prevent the INP_INFO_LOCK from leaking. Reported by: ache Found by: rwatson	2007-03-28 12:58:13 +00:00
Maxim Konovalov	14739780bd	o Use a define for a buffer size. Prodded by: db o Add missed vars for TCPDEBUG in tcp_do_segment(). Prodded by: tinderbox	2007-03-24 22:15:02 +00:00
Andre Oppermann	302ce8d690	Split tcp_input() into its two functional parts: o tcp_input() now handles TCP segment sanity checks and preparations including the INPCB lookup and syncache. o tcp_do_segment() handles all data and ACK processing and is IPv4/v6 agnostic. Change all KASSERT() messages to ("%s: ", __func__). The changes in this commit are primarily of mechanical nature and no functional changes besides the function split are made. Discussed with: rwatson	2007-03-23 20:16:50 +00:00
Andre Oppermann	4dfdffe9e2	Tidy up some code to conform better to surroundings and style(9), 0 = NULL and space/tab.	2007-03-23 19:11:22 +00:00
Andre Oppermann	fc30a25199	Bring SACK option handling in tcp_dooptions() in line with all other options and ajust users accordingly.	2007-03-23 18:33:21 +00:00
Andre Oppermann	ad3f9ab320	ANSIfy function declarations and remove register keywords for variables. Consistently apply style to all function declarations.	2007-03-21 19:37:55 +00:00
Andre Oppermann	b10fbdeafa	Tidy up IPFIREWALL_FORWARD sections and comments.	2007-03-21 18:56:03 +00:00
Andre Oppermann	794235b737	Update and clarify comments in first section of tcp_input().	2007-03-21 18:52:58 +00:00
Andre Oppermann	db33b3e6a7	Tidy up the ACCEPTCONN section of tcp_input(), ajust comments and remove old dead T/TCP code.	2007-03-21 18:49:43 +00:00
Andre Oppermann	574b696407	Tidy up tcp_log_in_vain and blackhole.	2007-03-21 18:36:49 +00:00
Andre Oppermann	85c497918c	Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it doesn't impede normal operation negatively and is only a few lines of code. It's close relatives blackhole and log_in_vain aren't options either.	2007-03-21 18:25:28 +00:00
Andre Oppermann	e406f5a1c9	Remove tcp_minmssoverload DoS detection logic. The problem it tried to protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.	2007-03-21 18:05:54 +00:00
Andre Oppermann	6489fe6553	Match up SYSCTL declaration style.	2007-03-19 19:00:51 +00:00
Andre Oppermann	02a1a64357	Consolidate insertion of TCP options into a segment from within tcp_output() and syncache_respond() into its own generic function tcp_addoptions(). tcp_addoptions() is alignment agnostic and does optimal packing in all cases. In struct tcpopt rename to_requested_s_scale to just to_wscale. Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled." Reviewed by: silby, mohans, julian Sponsored by: TCP/IP Optimization Fundraise 2005	2007-03-15 15:59:28 +00:00
Qing Li	95ad8418dc	This patch is provided to fix a couple of deployment issues observed in the field. In one situation, one end of the TCP connection sends a back-to-back RST packet, with delayed ack, the last_ack_sent variable has not been update yet. When tcp_insecure_rst is turned off, the code treats the RST as invalid because last_ack_sent instead of rcv_nxt is compared against th_seq. Apparently there is some kind of firewall that sits in between the two ends and that RST packet is the only RST packet received. With short lived HTTP connections, the symptom is a large accumulation of connections over a short period of time . The +/-(1) factor is to take care of implementations out there that generate RST packets with these types of sequence numbers. This behavior has also been observed in live environments. Reviewed by: silby, Mike Karels MFC after: 1 week	2007-03-07 23:21:59 +00:00
Mohan Srinivasan	4a32dc299f	In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss(). The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.	2007-02-28 20:48:00 +00:00
Mohan Srinivasan	7c72af8770	Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default. Reviewed by: gnn, silby.	2007-02-26 22:25:21 +00:00
Robert Watson	afdb42748d	Rename two identically named log_in_vain variables: tcp_input.c's static log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain. MFC after: 1 week	2007-02-20 10:20:03 +00:00
Andre Oppermann	6741ecf595	Auto sizing TCP socket buffers. Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around. With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions. FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size. New sysctls are: net.inet.tcp.sendbuf_auto=1 (enabled) net.inet.tcp.sendbuf_inc=8192 (8K, step size) net.inet.tcp.sendbuf_max=262144 (256K, growth limit) net.inet.tcp.recvbuf_auto=1 (enabled) net.inet.tcp.recvbuf_inc=16384 (16K, step size) net.inet.tcp.recvbuf_max=262144 (256K, growth limit) Tested by: many (on HEAD and RELENG_6) Approved by: re MFC after: 1 month	2007-02-01 18:32:13 +00:00
Bjoern A. Zeeb	1d54aa3ba9	MFp4: 92972, 98913 + one more change In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.	2006-12-12 12:17:58 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
John-Mark Gurney	e16fa5ca55	fix calculating to_tsecr... This prevents the rtt calculations from going all wonky...	2006-09-26 01:21:46 +00:00
Bruce M Simpson	f1edc3bde5	Always set the IP version in the TCP input path, to preserve the header field for possible later IPSEC SPD lookup, even when the kernel is built without 'options INET6'. PR: kern/57760 MFC after: 1 week Submitted by: Joachim Schueth	2006-09-23 16:26:31 +00:00
Andre Oppermann	bf6d304ab2	Rewrite of TCP syncookies to remove locking requirements and to enhance functionality: - Remove a rwlock aquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead. - Syncookie secrets are different for and stored per syncache buck row. Secrets expire after 16 seconds and are reseeded on-demand. - The computational overhead for syncookie generation and verification is one MD5 hash computation as before. - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1. This implementation extends the orginal idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled. The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK. A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c. Reviewed by: silby (slightly earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-13 13:08:27 +00:00
Ruslan Ermilov	751dea2935	Back when we had T/TCP support, we used to apply different timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that is has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!). Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).	2006-09-07 13:06:00 +00:00
Andre Oppermann	233dcce118	First step of TSO (TCP segmentation offload) support in our network stack. o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6 o add CSUM_TSO flag to mbuf pkthdr csum_flags field o add tso_segsz field to mbuf pkthdr o enhance ip_output() packet length check to allow for large TSO packets o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities o adjust all callers of tcp_maxmtu[46]() accordingly Discussed on: -current, -net Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-06 21:51:59 +00:00
Mohan Srinivasan	464469c713	Fixes an edge case bug in timewait handling where ticks rolling over causing the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry). Reviewed by: silby	2006-08-11 21:15:23 +00:00
Bjoern A. Zeeb	421d8aa603	Use INPLOOKUP_WILDCARD instead of just 1 more consistently. OKed by: rwatson (some weeks ago)	2006-06-29 10:49:49 +00:00
Andre Oppermann	8bfb19180d	Some cleanups and janitorial work to tcp_syncache: o don't assign remote/local host/port information manually between provided struct in_conninfo and struct syncache, bcopy() it instead o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture the purpose of this field o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons o fix IPSEC error case printf's to report correct function name o in syncache_socket() only transpose enhanced tcp options parameters to struct tcpcb when the inpcb doesn't has TF_NOOPT set o in syncache_respond() reorder stack variables o in syncache_respond() remove bogus KASSERT() No functional changes. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 16:14:19 +00:00
Andre Oppermann	f72167f4d1	Some cleanups and janitorial work to tcp_dooptions(): o redefine the parameter 'is_syn' to 'flags', add TO_SYN flag and adjust its usage accordingly o update the comments to the tcp_dooptions() invocation in tcp_input():after_listen to reflect reality o move the logic checking the echoed timestamp out of tcp_dooptions() to the only place that uses it next to the invocation described in the previous item o adjust parsing of TCPOPT_SACK_PERMITTED to use the same style as the others o add comments in to struct tcpopt.to_flags #defines No functional changes. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 15:35:25 +00:00
David Malone	5e1aa27995	When we receive an out-of-window SYN for an "ESTABLISHED" connection, ACK the SYN as required by RFC793, rather than ignoring it. NetBSD have had a similar change since 1999. PR: 93236 Submitted by: Grant Edwards <grante@visi.com> MFC after: 1 month	2006-06-19 12:33:52 +00:00
Andre Oppermann	351630c40d	Add locking to TCP syncache and drop the global tcpinfo lock as early as possible for the syncache_add() case. The syncache timer no longer aquires the tcpinfo lock and timeout/retransmit runs can happen in parallel with bucket granularity. On a P4 the additional locks cause a slight degression of 0.7% in tcp connections per second. When IP and TCP input are deserialized and can run in parallel this little overhead can be neglected. The syncookie handling still leaves room for improvement and its random salts may be moved to the syncache bucket head structures to remove the second lock operation currently required for it. However this would be a more involved change from the way syncookies work at the moment. Reviewed by: rwatson Tested by: rwatson, ps (earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-17 17:32:38 +00:00
Paul Saab	4f590175b7	Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.	2006-04-21 09:25:40 +00:00

1 2 3 4 5 ...

344 Commits