freebsd-dev

Author	SHA1	Message	Date
Robert Watson	cce83ffb5a	tcp_timewait() performs multiple non-atomic reads on the tcptw structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_2msl_rest(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it. In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_rest() is called, which requires that lock. In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock. Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock). In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified. In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified. In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified. MFC after: 2 weeks	2004-11-23 17:21:30 +00:00
Robert Watson	ca127a3e80	Remove "Unlocked read" annotations associated with previously unlocked use of socket buffer fields in the TCP input code. These references are now protected by use of the receive socket buffer lock. MFC after: 1 week	2004-11-22 13:16:27 +00:00
Robert Watson	d6915262af	Do some re-sorting of TCP pcbinfo locking and assertions: make sure to retain the pcbinfo lock until we're done using a pcb in the in-bound path, as the pcbinfo lock acts as a pseuo-reference to prevent the pcb from potentially being recycled. Clean up assertions and make sure to assert that the pcbinfo is locked at the head of code subsections where it is needed. Free the mbuf at the end of tcp_input after releasing any held locks to reduce the time the locks are held. MFC after: 3 weeks	2004-11-07 19:19:35 +00:00
Andre Oppermann	c94c54e4df	Remove RFC1644 T/TCP support from the TCP side of the network stack. A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplemention of the previous T/TCP functionality. Discussed on: -arch	2004-11-02 22:22:22 +00:00
Paul Saab	a55db2b6e6	- Estimate the amount of data in flight in sack recovery and use it to control the packets injected while in sack recovery (for both retransmissions and new data). - Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c. - Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number of sack retransmissions done upon initiation of sack recovery. Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>	2004-10-05 18:36:24 +00:00
Andre Oppermann	9b932e9e04	Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different. However there are many changes how ipfw is and its add-on's are handled: In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in\|out]() in the ipfw PFIL handler. IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output(). ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and the directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it. DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection. BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS. More detailed changes to the code: conf/files Add netinet/ip_fw_pfil.c. conf/options Add IPFIREWALL_FORWARD option. modules/ipfw/Makefile Add ip_fw_pfil.c. net/bridge.c Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well. netinet/ip_divert.c Removed divert_clone() function. It is no longer used. netinet/ip_dummynet.[ch] Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed. netinet/ip_fastfwd.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_fw.h Removed 'ro' and 'dst' from struct ip_fw_args. netinet/ip_fw2.c (Re)moved some global variables and the module handling. netinet/ip_fw_pfil.c New file containing the ipfw PFIL handlers and module initialization. netinet/ip_input.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. ip_forward() does not longer require the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set. netinet/ip_output.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_var.h Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.) netinet/raw_ip.c Directly check if ipfw and dummynet control pointers are active. netinet/tcp_input.c Rework the 'ipfw forward' to local code to work with the new way of forward tags. netinet/tcp_sack.c Remove include 'opt_ipfw.h' which is not needed here. sys/mbuf.h Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed. Approved by: re (scottl)	2004-08-17 22:05:54 +00:00
Robert Watson	a4f757cd5d	White space cleanup for netinet before branch: - Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>	2004-08-16 18:32:07 +00:00
Robert Watson	7cfc690440	After each label in tcp_input(), assert the inpcbinfo and inpcb lock state that we expect.	2004-07-12 19:28:07 +00:00
Jayanth Vijayaraghavan	a0445c2e2c	On receiving 3 duplicate acknowledgements, SACK recovery was not being entered correctly. Fix this problem by separating out the SACK and the newreno cases. Also, check if we are in FASTRECOVERY for the sack case and if so, turn off dupacks. Fix an issue where the congestion window was not being incremented by ssthresh. Thanks to Mohan Srinivasan for finding this problem.	2004-07-01 23:34:06 +00:00
Robert Watson	1e4d7da707	Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer: - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append(). - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/ receive flags and dropping data, and in uipc_rcvd() which adjusting back-pressure on the sockets. For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduce mutex costs.	2004-06-26 19:10:39 +00:00
Paul Saab	652178a12a	White space & spelling fixes Submitted by: Xin LI <delphij@frontfree.net>	2004-06-25 04:11:26 +00:00
Robert Watson	5905999b2f	Broaden scope of the socket buffer lock when processing an ACK so that the read and write of sb_cc are atomic. Call sbdrop_locked() instead of sbdrop() since we already hold the socket buffer lock.	2004-06-24 03:07:27 +00:00
Robert Watson	927c5cea3f	Protect so_oobmark with with SOCKBUF_LOCK(&so->so_rcv), and broaden locking in tcp_input() for TCP packets with urgent data pointers to hold the socket buffer lock across testing and updating oobmark from just protecting sb_state. Update socket locking annotations	2004-06-24 02:57:12 +00:00
Robert Watson	3f11a2f374	Introduce sbreserve_locked(), which asserts the socket buffer lock on the socket buffer having its limits adjusted. sbreserve() now acquires the lock before calling sbreserve_locked(). In soreserve(), acquire socket buffer locks across read-modify-writes of socket buffer fields, and calls into sbreserve/sbrelease; make sure to acquire in keeping with the socket buffer lock order. In tcp_mss(), acquire the socket buffer lock in the calling context so that we have atomic read-modify -write on buffer sizes.	2004-06-24 01:37:04 +00:00
Paul Saab	6d90faf3d8	Add support for TCP Selective Acknowledgements. The work for this originated on RELENG_4 and was ported to -CURRENT. The scoreboarding code was obtained from OpenBSD, and many of the remaining changes were inspired by OpenBSD, but not taken directly from there. You can enable/disable sack using net.inet.tcp.do_sack. You can also limit the number of sack holes that all senders can have in the scoreboard with net.inet.tcp.sackhole_limit. Reviewed by: gnn Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)	2004-06-23 21:04:37 +00:00
Robert Watson	1f82efb3b7	Assert the inpcb lock before letting MAC check whether we can deliver to the inpcb in tcp_input().	2004-06-20 20:17:29 +00:00
Bruce M Simpson	d420fcda27	Fix build for IPSEC && !INET6 PR: kern/66125 Submitted by: Cyrille Lefevre	2004-06-16 09:35:07 +00:00
Robert Watson	7721f5d760	Grab the socket buffer send or receive mutex when performing a read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.	2004-06-15 03:51:44 +00:00
Robert Watson	c0b99ffa02	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coallesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
Robert Watson	310e7ceb94	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
Darren Reed	2f3f1e6773	Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.	2004-05-02 15:10:17 +00:00
Darren Reed	7fbb130049	oops, I forgot this file in a prior commit (change was still sitting here, uncommitted): Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work ok mbuf's and tag's.	2004-05-02 15:07:37 +00:00
Mike Silbersack	80dd2a81fb	Tighten up reset handling in order to make reset attacks as difficult as possible while maintaining compatibility with the widest range of TCP stacks. The algorithm is as follows: --- For connections in the ESTABLISHED state, only resets with sequence numbers exactly matching last_ack_sent will cause a reset, all other segments will be silently dropped. For connections in all other states, a reset anywhere in the window will cause the connection to be reset. All other segments will be silently dropped. --- The necessity of accepting all in-window resets was discovered by jayanth and jlemon, both of whom have seen TCP stacks that will respond to FIN-ACK packets with resets not meeting the strict last_ack_sent check. Idea by: Darren Reed Reviewed by: truckman, jlemon, others(?)	2004-04-26 02:56:31 +00:00
Andre Oppermann	2d166c0202	Correct an edge case in tcp_mss() where the cached path MTU from tcp_hostcache would have overridden a (now) lower MTU of an interface or route that changed since first PMTU discovery. The bug would have caused TCP to redo the PMTU discovery when not strictly necessary. Make a comment about already pre-initialized default values more clear. Reviewed by: sam	2004-04-23 22:44:59 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Hajimu UMEMOTO	04d3a45241	fix -O0 compilation without INET6. Pointed out by: ru	2004-03-01 19:10:31 +00:00
Robert Watson	a7b6a14aee	Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These were needed by the MAC Framework until inpcbs gained labels. Submitted by: sam	2004-02-28 15:12:20 +00:00
Max Laier	ac9d7e2618	Re-remove MT_TAGs. The problems with dummynet have been fixed now. Tested by: -current, bms(mentor), me Approved by: bms(mentor), sam	2004-02-25 19:55:29 +00:00
Jeffrey Hsu	89c02376fc	Relax a KASSERT condition to allow for a valid corner case where the FIN on the last segment consumes an extra sequence number. Spurious panic reported by Mike Silbersack <silby@silby.com>.	2004-02-25 08:53:17 +00:00
Andre Oppermann	12e2e97051	Convert the tcp segment reassembly queue to UMA and limit the maximum amount of segments it will hold. The following tuneables and sysctls control the behaviour of the tcp segment reassembly queue: net.inet.tcp.reass.maxsegments (loader tuneable) specifies the maximum number of segments all tcp reassemly queues can hold (defaults to 1/16 of nmbclusters). net.inet.tcp.reass.maxqlen specifies the maximum number of segments any individual tcp session queue can hold (defaults to 48). net.inet.tcp.reass.cursegments (readonly) counts the number of segments currently in all reassembly queues. net.inet.tcp.reass.overflows (readonly) counts how often either the global or local queue limit has been reached. Tested by: bms, silby Reviewed by: bms, silby	2004-02-24 15:27:41 +00:00
Max Laier	36e8826ffb	Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is not working properly with the patch in place. Approved by: bms(mentor)	2004-02-18 00:04:52 +00:00
Hajimu UMEMOTO	da0f40995d	IPSEC and FAST_IPSEC have the same internal API now; so merge these (IPSEC has an extra ipsecstat) Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>	2004-02-17 14:02:37 +00:00
Max Laier	1094bdca51	This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag). This is (mostly) work from: sam Silence from: -arch Approved by: bms(mentor), sam, rwatson	2004-02-13 19:14:16 +00:00
Bruce M Simpson	265ed01285	Brucification. Submitted by: bde	2004-02-13 18:21:45 +00:00
Bruce M Simpson	a0194ef1ea	Remove an unnecessary initialization that crept in from the code which verifies TCP-MD5 digests. Noticed by: njl	2004-02-12 20:08:28 +00:00
Bruce M Simpson	1cfd4b5326	Initial import of RFC 2385 (TCP-MD5) digest support. This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC. For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence. Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity. Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem. This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment. Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request. Sponsored by: sentex.net	2004-02-11 04:26:04 +00:00
Hajimu UMEMOTO	f073c60f73	pass pcb rather than so. it is expected that per socket policy works again.	2004-02-03 18:20:55 +00:00
Jeffrey Hsu	61a36e3dfc	Merge from DragonFlyBSD rev 1.10: date: 2003/09/02 10:04:47; author: hsu; state: Exp; lines: +5 -6 Account for when Limited Transmit is not congestion window limited. Obtained from: DragonFlyBSD	2004-01-20 21:40:25 +00:00
Andre Oppermann	53369ac9bb	Limiters and sanity checks for TCP MSS (maximum segement size) resource exhaustion attacks. For network link optimization TCP can adjust its MSS and thus packet size according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to pretend an extremely low path MTU. The resource exhaustion works in two ways: o during tcp connection setup the advertized local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one. For example instead of the normal TCP payload size of 1448 it forces TCP payload size of 12 (MTU 64) and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limites of network components along the path. This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP). We mitigate it by enforcing a minimum MTU settable by sysctl net.inet.tcp.minmss defaulting to 256 octets. o the local host is reveiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may chose to do what is described in the first attack and send the data in packets with an TCP payload of at least one byte. For each packet the tcp_input() function will be entered, the packet is processed and a sowakeup() is signalled to the connected process. For example an attack with 2 Mbit/s gives 4716 packets per second and the same amount of sowakeup()s to the process (and context switches). This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW server where large POSTs can be made. We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' defaulting to 1000 this particular TCP connection is resetted and dropped. MITRE CVE: CAN-2004-0002 Reviewed by: sam (mentor) MFC after: 1 day	2004-01-08 17:40:07 +00:00
Andre Oppermann	dba7bc6a65	Enable the following TCP options by default to give it more exposure: rfc3042 Limited retransmit rfc3390 Increasing TCP's initial congestion Window inflight TCP inflight bandwidth limiting All my production server have it enabled and there have been no issues. I am confident about having them on by default and it gives us better overall TCP performance. Reviewed by: sam (mentor)	2004-01-06 23:29:46 +00:00
Andre Oppermann	943ae30252	Restructure a too broad ifdef which was disabling the setting of the tcp flightsize sysctl value for local networks in the !INET6 case. Approved by: re (scottl)	2003-11-25 20:58:59 +00:00
Andre Oppermann	97d8d152c2	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 20:07:39 +00:00
Robert Watson	a557af222b	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
Andre Oppermann	122aad88d5	dropwithreset is not needed in this case as tcp_drop() is already notifying the other side. Before we were sending two RST packets.	2003-11-12 19:38:01 +00:00
Sam Leffler	c29afad673	o correct locking problem: the inpcb must be held across tcp_respond o add assertions in tcp_respond to validate inpcb locking assumptions o use local variable instead of chasing pointers in tcp_respond Supported by: FreeBSD Foundation	2003-11-08 22:59:22 +00:00
Sam Leffler	395bb18680	speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)	2003-10-28 05:47:40 +00:00
Hajimu UMEMOTO	b339980338	enclose IPv6 part with ifdef INET6. Obtained from: KAME	2003-10-20 16:19:01 +00:00
Hajimu UMEMOTO	31b3783c8d	correct linkmtu handling. Obtained from: KAME	2003-10-20 15:27:48 +00:00
Hajimu UMEMOTO	31b1bfe1b0	- add dom_if{attach,detach} framework. - transition to use ifp->if_afdata. Obtained from: KAME	2003-10-17 15:46:31 +00:00
Hartmut Brandt	3c653157a5	A number of patches in the last years have created new return paths in tcp_input that leave the function before hitting the tcp_trace function call for the TCPDEBUG option. This has made TCPDEBUG mostly useless (and tools like ports/benchmarks/dbs not working). Add tcp_trace calls to the return paths that could be identified in this maze. This is a NOP unless you compile with TCPDEBUG.	2003-08-13 08:46:54 +00:00
Jeffrey Hsu	9d11646de7	Unify the "send high" and "recover" variables as specified in the lastest rev of the spec. Use an explicit flag for Fast Recovery. [1] Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2] Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com> Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2] Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]	2003-07-15 21:49:53 +00:00
Poul-Henning Kamp	e4d2978dd8	Add /* FALLTHROUGH */ Found by: FlexeLint	2003-05-31 19:07:22 +00:00
Robert Watson	430c635447	Correct a bug introduced with reduced TCP state handling; make sure that the MAC label on TCP responses during TIMEWAIT is properly set from either the socket (if available), or the mbuf that it's responding to. Unfortunately, this is made somewhat difficult by the TCP code, as tcp_twstart() calls tcp_twrespond() after discarding the socket but without a reference to the mbuf that causes the "response". Passing both the socket and the mbuf works arounds this--eventually it might be good to make sure the mbuf always gets passed in in "response" scenarios but working through this provided to complicate things too much. Approved by: re (scottl) Reviewed by: hsu Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-05-07 05:26:27 +00:00
David E. O'Brien	152385d122	Explicitly declare 'int' parameters.	2003-04-21 16:27:46 +00:00
Jeffrey Hsu	48d2549c3e	Observe conservation of packets when entering Fast Recovery while doing Limited Transmit. Only artificially inflate the congestion window by 1 segment instead of the usual 3 to take into account the 2 already sent by Limited Transmit. Approved in principle by: Mark Allman <mallman@grc.nasa.gov>, Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>	2003-04-01 21:16:46 +00:00
Jeffrey Hsu	7792ea2700	Greatly simplify the unlocking logic by holding the TCP protocol lock until after FIN_WAIT_2 processing. Helped with debugging: Doug Barton	2003-03-13 11:46:57 +00:00
Jeffrey Hsu	da3a8a1a4f	Add support for RFC 3390, which allows for a variable-sized initial congestion window.	2003-03-13 01:43:45 +00:00
Jeffrey Hsu	582a954b00	Implement the Limited Transmit algorithm (RFC 3042).	2003-03-12 20:27:28 +00:00
Jonathan Lemon	607b0b0cc9	Remove a panic(); if the zone allocator can't provide more timewait structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer. Sponsored by: DARPA, NAI Labs	2003-03-08 22:06:20 +00:00
Jonathan Lemon	272c5dfe93	In timewait state, if the incoming segment is a pure in-sequence ack that matches snd_max, then do not respond with an ack, just drop the segment. This fixes a problem where a simultaneous close results in an ack loop between two time-wait states. Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG> Sponsored by: DARPA, NAI Labs	2003-02-26 18:20:41 +00:00
Jonathan Lemon	ef6b48deb9	The TCP protocol lock may still be held if the reassembly queue dropped FIN. Detect this case and drop the lock accordingly. Sponsored by: DARPA, NAI Labs	2003-02-26 13:55:13 +00:00
Jeffrey Hsu	11a20fb8b6	tcp_twstart() need to be called with the TCP protocol lock held to avoid a race condition with the TCP timer routines.	2003-02-24 00:52:03 +00:00
Jeffrey Hsu	2fbef91887	Pass the right function to callout_reset() for a compressed TIME-WAIT control block.	2003-02-24 00:48:12 +00:00
Jonathan Lemon	f243998be5	Yesterday just wasn't my day. Remove testing delta that crept into the diff. Pointy hat provided by: sam	2003-02-23 15:40:36 +00:00
Jonathan Lemon	a14c749f04	Check to see if the TF_DELACK flag is set before returning from tcp_input(). This unbreaks delack handling, while still preserving correct T/TCP behavior Tested by: maxim Sponsored by: DARPA, NAI Labs	2003-02-22 21:54:57 +00:00
Jonathan Lemon	340c35de6a	Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs	2003-02-19 22:32:43 +00:00
Jonathan Lemon	414462252a	Correct comments.	2003-02-19 21:33:46 +00:00
Jonathan Lemon	3bfd6421c2	Clean up delayed acks and T/TCP interactions: - delay acks for T/TCP regardless of delack setting - fix bug where a single pass through tcp_input might not delay acks - use callout_active() instead of callout_pending() Sponsored by: DARPA, NAI Labs	2003-02-19 21:18:23 +00:00
Jeffrey Hsu	85e8b24343	The protocol lock is always held in the dropafterack case, so we don't need to check for it at runtime.	2003-02-13 22:14:22 +00:00
Crist J. Clark	39eb27a4a9	Add the TCP flags to the log message whenever log_in_vain is 1, not just when set to 2. PR: kern/43348 MFC after: 5 days	2003-02-02 22:06:56 +00:00
Jeffrey Hsu	cb942153c8	Fix NewReno. Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>	2003-01-13 11:01:20 +00:00
Matthew Dillon	07fd333df3	Remove the PAWS ack-on-ack debugging printf(). Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of order / reverse-time-indexed packet should be acknowledged as specified in RFC-793 page 69 then dropped. The original PAWS code in FreeBSD (1994) simply acknowledged the segment unconditionally, which is incorrect, and was fixed in 1.183 (2002). At the moment we do not do checks for SYN or FIN in addition to (tlen != 0), which may or may not be correct, but the worst that ought to happen should be a retry by the sender.	2002-12-30 19:31:04 +00:00
Jeffrey Hsu	540e8b7e31	Unravel a nested conditional. Remove an unneeded local variable.	2002-12-20 11:16:52 +00:00
Matthew Dillon	967adce8df	Fix syntax in last commit.	2002-12-17 00:24:48 +00:00
Matthew Dillon	1ab4789dc2	Bruce forwarded this tidbit from an analysis Van Jacobson did on an apparent ack-on-ack problem with FreeBSD. Prof. Jacobson noticed a case in our TCP stack which would acknowledge a received ack-only packet, which is not legal in TCP. Submitted by: Van Jacobson <van@packetdesign.com>, bmah@packetdesign.com (Bruce A. Mah) MFC after: 7 days	2002-12-14 07:31:51 +00:00
Sam Leffler	6f0d017cf4	a better solution to building FAST_IPSEC w/o INET6 Submitted by: Jeffrey Hsu <hsu@FreeBSD.org>	2002-11-10 17:17:32 +00:00
Sam Leffler	58fcadfc0f	fixup FAST_IPSEC build w/o INET6	2002-11-08 23:33:59 +00:00
Jeff Roberson	1645d0903e	- Consistently update snd_wl1, snd_wl2, and rcv_up in the header prediction code. Previously, 2GB worth of header predicted data could leave these variables too far out of sequence which would cause problems after receiving a packet that did not match the header prediction. Submitted by: Bill Baumann <bbaumann@isilon.com> Sponsored by: Isilon Systems, Inc. Reviewed by: hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com	2002-10-31 23:24:13 +00:00
Jeffrey Hsu	30613f5610	Don't need to check if SO_OOBINLINE is defined. Don't need to protect isipv6 conditional with INET6. Fix leading indentation in 2 lines.	2002-10-30 08:32:19 +00:00
Sam Leffler	b9234fafa0	Tie new "Fast IPsec" code into the build. This involves the usual configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks	2002-10-16 02:25:05 +00:00
Sam Leffler	5d84645305	Replace aux mbufs with packet tags: o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month	2002-10-16 01:54:46 +00:00
Matthew Dillon	a84db8f49e	Guido found another bug. There is a situation with timestamped TCP packets where FreeBSD will send DATA+FIN and A W2K box will ack just the DATA portion. If this occurs after FreeBSD has done a (NewReno) fast-retransmit and is recovering it (dupacks > threshold) it triggers a case in tcp_newreno_partial_ack() (tcp_newreno() in stable) where tcp_output() is called with the expectation that the retransmit timer will be reloaded. But tcp_output() falls through and returns without doing anything, causing the persist timer to be loaded instead. This causes the connection to hang until W2K gives up. This occurs because in the case where only the FIN must be acked, the 'len' calculation in tcp_output() will be 0, a lot of checks will be skipped, and the FIN check will also be skipped because it is designed to handle FIN retransmits, not forced transmits from tcp_newreno(). The solution is to simply set TF_ACKNOW before calling tcp_output() to absolute guarentee that it will run the send code and reset the retransmit timer. TF_ACKNOW is already used for this purpose in other cases. For some unknown reason this patch also seems to greatly reduce the number of duplicate acks received when Guido runs his tests over a lossy network. It is quite possible that there are other tcp_newreno{_partial_ack()} cases which were not generating the expected output which this patch also fixes. X-MFC after: Will be MFC'd after the freeze is over	2002-09-30 18:55:45 +00:00
Mike Silbersack	c1c36a2c68	Fix issue where shutdown(socket, SHUT_RD) was effectively ignored for TCP sockets. NetBSD PR: 18185 Submitted by: Sean Boudreau <seanb@qnx.com> MFC after: 3 days	2002-09-22 02:54:07 +00:00
Matthew Dillon	fa55172bc0	Guido reported an interesting bug where an FTP connection between a Windows 2000 box and a FreeBSD box could stall. The problem turned out to be a timestamp reply bug in the W2K TCP stack. FreeBSD sends a timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK causing FreeBSD to calculate an insane SRTT and RTT, resulting in a maximal retransmit timeout (60 seconds). If there is any packet loss on the connection for the first six or so packets the retransmit case may be hit (the window will still be too small for fast-retransmit), causing a 60+ second pause. The W2K box gives up and closes the connection. This commit works around the W2K bug. 15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8] 15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF) Bug reported by: Guido van Rooij <guido@gvr.org>	2002-09-17 22:21:37 +00:00
Philippe Charnier	93b0017f88	Replace various spelling with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
Juli Mallett	ded7008a07	Enclose IPv6 addresses in brackets when they are displayed printable with a TCP/UDP port seperated by a colon. This is for the log_in_vain facility. Pointed out by: Edward J. M. Brocklesby Reviewed by: ume MFC after: 2 weeks	2002-08-19 19:47:13 +00:00
Matthew Dillon	1fcc99b5de	Implement TCP bandwidth delay product window limiting, similar to (but not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'. net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off) net.inet.tcp.inflight_debug debugging (defaults to 1=on) net.inet.tcp.inflight_min minimum window limit net.inet.tcp.inflight_max maximum window limit MFC after: 1 week	2002-08-17 18:26:02 +00:00
Jeffrey Hsu	c068736a61	Cosmetic-only changes for readability. Reviewed by: (early form passed by) bde Approved by: itojun (from core@kame.net)	2002-08-17 02:05:25 +00:00
Robert Watson	fb95b5d3c3	Rename mac_check_socket_receive() to mac_check_socket_deliver() so that we can use the names _receive() and _send() for the receive() and send() checks. Rename related constants, policy implementations, etc. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 18:51:27 +00:00
Jeffrey Hsu	b5addd8564	Reset dupack count in header prediction. Follow-on to rev 1.39. Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon	2002-08-15 17:13:18 +00:00
Robert Watson	c488362e1a	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 19:06:49 +00:00
Ruslan Ermilov	88c39af35f	Don't shrink socket buffers in tcp_mss(), application might have already configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows. PR: kern/11966 MFC after: 1 week	2002-07-22 22:31:09 +00:00
Matthew Dillon	d65bf08af3	Add the tcps_sndrexmitbad statistic, keep track of late acks that caused unnecessary retransmissions.	2002-07-19 18:29:38 +00:00
Jeffrey Hsu	6fd22caf91	Avoid unlocking the inp twice if badport_bandlim() returns -1. Reported by: jlemon	2002-06-24 22:25:00 +00:00
Jeffrey Hsu	f14e4cfe33	Style bug: fix 4 space indentations that should have been tabs. Submitted by: jlemon	2002-06-24 16:47:02 +00:00
Luigi Rizzo	410bb1bfe2	Move two global variables to automatic variables within the only function where they are used (they are used with TCPDEBUG only).	2002-06-23 21:22:56 +00:00
Luigi Rizzo	2b25acc158	Remove (almost all) global variables that were used to hold packet forwarding state ("annotations") during ip processing. The code is considerably cleaner now. The variables removed by this change are: ip_divert_cookie used by divert sockets ip_fw_fwd_addr used for transparent ip redirection last_pkt used by dynamic pipes in dummynet Removal of the first two has been done by carrying the annotations into volatile structs prepended to the mbuf chains, and adding appropriate code to add/remove annotations in the routines which make use of them, i.e. ip_input(), ip_output(), tcp_input(), bdg_forward(), ether_demux(), ether_output_frame(), div_output(). On passing, remove a bug in divert handling of fragmented packet. Now it is the fragment at offset 0 which sets the divert status of the whole packet, whereas formerly it was the last incoming fragment to decide. Removal of last_pkt required a change in the interface of ip_fw_chk() and dummynet_io(). On passing, use the same mechanism for dummynet annotations and for divert/forward annotations. option IPFIREWALL_FORWARD is effectively useless, the code to implement it is very small and is now in by default to avoid the obfuscation of conditionally compiled code. NOTES: * there is at least one global variable left, sro_fwd, in ip_output(). I am not sure if/how this can be removed. * I have deliberately avoided gratuitous style changes in this commit to avoid cluttering the diffs. Minor stule cleanup will likely be necessary * this commit only focused on the IP layer. I am sure there is a number of global variables used in the TCP and maybe UDP stack. * despite the number of files touched, there are absolutely no API's or data structures changed by this commit (except the interfaces of ip_fw_chk() and dummynet_io(), which are internal anyways), so an MFC is quite safe and unintrusive (and desirable, given the improved readability of the code). MFC after: 10 days	2002-06-22 11:51:02 +00:00
Seigo Tanimura	03e4918190	Remove so*_locked(), which were backed out by mistake.	2002-06-18 07:42:02 +00:00
Jeffrey Hsu	f76fcf6d4c	Lock up inpcb. Submitted by: Jennifer Yang <yangjihui@yahoo.com>	2002-06-10 20:05:46 +00:00
Seigo Tanimura	4cc20ab1f0	Back out my lats commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu	2002-05-31 11:52:35 +00:00
Seigo Tanimura	243917fe3b	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred	2002-05-20 05:41:09 +00:00
Alfred Perlstein	f132072368	Redo the sigio locking. Turn the sigio sx into a mutex. Sigio lock is really only needed to protect interrupts from dereferencing the sigio pointer in an object when the sigio itself is being destroyed. In order to do this in the most unintrusive manner change pgsigio's sigio * argument into a **, that way we can lock internally to the function.	2002-05-01 20:44:46 +00:00
Seigo Tanimura	960ed29c4b	Revert the change of #includes in sys/filedesc.h and sys/socketvar.h. Requested by: bde Since locking sigio_lock is usually followed by calling pgsigio(), move the declaration of sigio_lock and the definitions of SIGIO_*() to sys/signalvar.h. While I am here, sort include files alphabetically, where possible.	2002-04-30 01:54:54 +00:00
Seigo Tanimura	d48d4b2501	Add a global sx sigio_lock to protect the pointer to the sigio object of a socket. This avoids lock order reversal caused by locking a process in pgsigio(). sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now require sigio_lock to be locked. Provide sowwakeup_locked(), soisconnected_locked(), and so on in case where we have to modify a socket and wake up a process atomically.	2002-04-27 08:24:29 +00:00
SUZUKI Shinsuke	88ff5695c1	just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD. (based on freebsd4-snap-20020128) Reviewed by: ume MFC after: 1 week	2002-04-19 04:46:24 +00:00
Mike Silbersack	898568d8ab	Remove some ISN generation code which has been unused since the syncache went in. MFC after: 3 days	2002-04-10 22:12:01 +00:00
Bruce Evans	c1cd65bae8	Fixed some style bugs in the removal of __P(()). Continuation lines were not outdented to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting.	2002-03-24 10:19:10 +00:00
Alfred Perlstein	4d77a549fe	Remove __P.	2002-03-19 21:25:46 +00:00
Crist J. Clark	93ec91ba6d	Change the wording of the inline comments from the previous commit. Objection from: ru	2002-02-27 13:52:06 +00:00
Crist J. Clark	2ca2159f22	The TCP code did not do sufficient checks on whether incoming packets were destined for a broadcast IP address. All TCP packets with a broadcast destination must be ignored. The system only ignored packets that were _link-layer_ broadcasts or multicast. We need to check the IP address too since it is quite possible for a broadcast IP address to come in with a unicast link-layer address. Note that the check existed prior to CSRG revision 7.35, but was removed. This commit effectively backs out that nine-year-old change. PR: misc/35022	2002-02-25 08:29:21 +00:00
Mike Barcroft	fd8e4ebc8c	o Move NTOHL() and associated macros into <sys/param.h>. These are deprecated in favor of the POSIX-defined lowercase variants. o Change all occurrences of NTOHL() and associated marcros in the source tree to use the lowercase function variants. o Add missing license bits to sparc64's <machine/endian.h>. Approved by: jake o Clean up <machine/endian.h> files. o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>. o Remove prototypes for non-existent bswapXX() functions. o Include <machine/endian.h> in <arpa/inet.h> to define the POSIX-required ntohl() family of functions. o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>, and <sys/param.h>. o Prepend underscores to the ntohl() family to help deal with complexities associated with having MD (asm and inline) versions, and having to prevent exposure of these functions in other headers that happen to make use of endian-specific defines. o Create weak aliases to the canonical function name to help deal with third-party software forgetting to include an appropriate header. o Remove some now unneeded pollution from <sys/types.h>. o Add missing <arpa/inet.h> includes in userland. Tested on: alpha, i386 Reviewed by: bde, jake, tmm	2002-02-18 20:35:27 +00:00
Robert Watson	e6658b129e	o Spelling fix in comment: tcp_ouput -> tcp_output	2002-01-04 17:21:27 +00:00
Jonathan Lemon	0ef3206bf5	Fix up tabs in comments.	2001-12-13 04:02:09 +00:00
Matthew Dillon	262c1c1a4e	Fix a bug with transmitter restart after receiving a 0 window. The receiver was not sending an immediate ack with delayed acks turned on when the input buffer is drained, preventing the transmitter from restarting immediately. Propogate the TCP_NODELAY option to accept()ed sockets. (Helps tbench and is a good idea anyway). Some cleanup. Identify additonal issues in comments. MFC after: 1 day	2001-12-02 08:49:29 +00:00
Jonathan Lemon	be2ac88c59	Introduce a syncache, which enables FreeBSD to withstand a SYN flood DoS in an improved fashion over the existing code. Reviewed by: silby (in a previous iteration) Sponsored by: DARPA, NAI Labs	2001-11-22 04:50:44 +00:00
Jonathan Lemon	d00fd2011d	Move initialization of snd_recover into tcp_sendseqinit().	2001-11-21 18:45:51 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Julian Elischer	f0ffb944d2	Patches from Keiichi SHIMA <keiichi@iij.ad.jp> to make ip use the standard protosw structure again. Obtained from: Well, KAME I guess.	2001-09-03 20:03:55 +00:00
Jayanth Vijayaraghavan	e7e2b80184	when newreno is turned on, if dupacks = 1 or dupacks = 2 and new data is acknowledged, reset the dupacks to 0. The problem was spotted when a connection had its send buffer full because the congestion window was only 1 MSS and was not being incremented because dupacks was not reset to 0. Obtained from: Yahoo!	2001-08-29 23:54:13 +00:00
Dima Dorfman	745bab7f84	Correct a typo in a comment: FIN_WAIT2 -> FIN_WAIT_2 PR: 29970 Submitted by: Joseph Mallett <jmallett@xMach.org>	2001-08-23 22:34:29 +00:00
Mike Silbersack	b0e3ad758b	Much delayed but now present: RFC 1948 style sequence numbers In order to ensure security and functionality, RFC 1948 style initial sequence number generation has been implemented. Barring any major crypographic breakthroughs, this algorithm should be unbreakable. In addition, the problems with TIME_WAIT recycling which affect our currently used algorithm are not present. Reviewed by: jesper	2001-08-22 00:58:16 +00:00
Mike Silbersack	2d610a5028	Temporary feature: Runtime tuneable tcp initial sequence number generation scheme. Users may now select between the currently used OpenBSD algorithm and the older random positive increment method. While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT handling; this is causing trouble for an increasing number of folks. To switch between generation schemes, one sets the sysctl net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments, 1 = the OpenBSD algorithm. 1 is still the default. Once a secure _and_ compatible algorithm is implemented, this sysctl will be removed. Reviewed by: jlemon Tested by: numerous subscribers of -net	2001-07-08 02:20:47 +00:00
Ruslan Ermilov	c73d99b567	Add netstat(1) knob to reset net.inet.{ip\|icmp\|tcp\|udp\|igmp}.stats. For example, ``netstat -s -p ip -z'' will show and reset IP stats. PR: bin/17338	2001-06-23 17:17:59 +00:00
Mike Silbersack	08517d530e	Eliminate the allocation of a tcp template structure for each connection. The information contained in a tcptemp can be reconstructed from a tcpcb when needed. Previously, tcp templates required the allocation of one mbuf per connection. On large systems, this change should free up a large number of mbufs. Reviewed by: bmilekic, jlemon, ru MFC after: 2 weeks	2001-06-23 03:21:46 +00:00
Hajimu UMEMOTO	3384154590	Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks	2001-06-11 12:39:29 +00:00
Jesper Skriver	65f28919b3	Silby's take one on increasing FreeBSD's resistance to SYN floods: One way we can reduce the amount of traffic we send in response to a SYN flood is to eliminate the RST we send when removing a connection from the listen queue. Since we are being flooded, we can assume that the majority of connections in the queue are bogus. Our RST is unwanted by these hosts, just as our SYN-ACK was. Genuine connection attempts will result in hosts responding to our SYN-ACK with an ACK packet. We will automatically return a RST response to their ACK when it gets to us if the connection has been dropped, so the early RST doesn't serve the genuine class of connections much. In summary, we can reduce the number of packets we send by a factor of two without any loss in functionality by ensuring that RST packets are not sent when dropping a connection from the listen queue. Submitted by: Mike Silbersack <silby@silby.com> Reviewed by: jesper MFC after: 2 weeks	2001-06-06 19:41:51 +00:00
Jesper Skriver	e4b6428171	Inline TCP_REASS() in the single location where it's used, just as OpenBSD and NetBSD has done. No functional difference. MFC after: 2 weeks	2001-05-29 19:54:45 +00:00
Jesper Skriver	853be1226e	properly delay acks in half-closed TCP connections PR: 24962 Submitted by: Tony Finch <dot@dotat.at> MFC after: 2 weeks	2001-05-29 19:51:45 +00:00
Jesper Skriver	d1745f454d	Say goodbye to TCP_COMPAT_42 Reviewed by: wollman Requested by: wollman	2001-04-20 11:58:56 +00:00
Kris Kennaway	f0a04f3f51	Randomize the TCP initial sequence numbers more thoroughly. Obtained from: OpenBSD Reviewed by: jesper, peter, -developers	2001-04-17 18:08:01 +00:00
Dag-Erling Smørgrav	c59319bf1a	Axe TCP_RESTRICT_RST. It was never a particularly good idea except for a few very specific scenarios, and now that we have had net.inet.tcp.blackhole for quite some time there is really no reason to use it any more. (last of three commits)	2001-03-19 22:09:00 +00:00
Jonathan Lemon	d8c85a260f	Do not delay a new ack if there already is a delayed ack pending on the connection, but send it immediately. Prior to this change, it was possible to delay a delayed-ack for multiple times, resulting in degraded TCP behavior in certain corner cases.	2001-02-25 15:17:24 +00:00
Bosko Milekic	a57815efd2	Clean up RST ratelimiting. Previously, ratelimiting occured before tests were performed to determine if the received packet should be reset. This created erroneous ratelimiting and false alarms in some cases. The code has now been reorganized so that the checks for validity come before the call to badport_bandlim. Additionally, a few changes in the symbolic names of the bandlim types have been made, as well as a clarification of exactly which type each RST case falls under. Submitted by: Mike Silbersack <silby@silby.com>	2001-02-11 07:39:51 +00:00
Garrett Wollman	a589a70ee1	Correct a comment.	2001-01-24 16:25:36 +00:00
Bosko Milekic	09f81a46a5	Change the following: 1. ICMP ECHO and TSTAMP replies are now rate limited. 2. RSTs generated due to packets sent to open and unopen ports are now limited by seperate counters. 3. Each rate limiting queue now has its own description, as follows: Limiting icmp unreach response from 439 to 200 packets per second Limiting closed port RST response from 283 to 200 packets per second Limiting open port RST response from 18724 to 200 packets per second Limiting icmp ping response from 211 to 200 packets per second Limiting icmp tstamp response from 394 to 200 packets per second Submitted by: Mike Silbersack <silby@silby.com>	2000-12-15 21:45:49 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Jonathan Lemon	8735719e43	tp->snd_recover is part of the New Reno recovery algorithm, and should only be checked if the system is currently performing New Reno style fast recovery. However, this value was being checked regardless of the NR state, with the end result being that the congestion window was never opened. Change the logic to check t_dupack instead; the only code path that allows it to be nonzero at this point is NewReno, so if it is nonzero, we are in fast recovery mode and should not touch the congestion window. Tested by: phk	2000-11-04 15:59:39 +00:00
Jayanth Vijayaraghavan	e7f3269307	When a connection is being dropped due to a listen queue overflow, delete the cloned route that is associated with the connection. This does not exhaust the routing table memory when the system is under a SYN flood attack. The route entry is not deleted if there is any prior information cached in it. Reviewed by: Peter Wemm,asmodai	2000-07-21 23:26:37 +00:00
Jun-ichiro itojun Hagino	b474779f46	be more cautious about tcp option length field. drop bogus ones earlier. not sure if there is a real threat or not, but it seems that there's possibility for overrun/underrun (like non-NOP option with optlen > cnt).	2000-07-09 13:01:59 +00:00
Jun-ichiro itojun Hagino	686cdd19b1	sync with kame tree as of july00. tons of bug fixes/improvements. API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)	2000-07-04 16:35:15 +00:00
Dan Moschuk	4f14ee00f2	sysctl'ize ICMP_BANDLIM and ICMP_BANDLIM_SUPPRESS_OUTPUT. Suggested by: des/nbm	2000-05-22 16:12:28 +00:00
Jayanth Vijayaraghavan	d841727499	snd_cwnd was updated twice in the tcp_newreno function.	2000-05-18 21:21:42 +00:00
Jayanth Vijayaraghavan	75c6e0e253	Sigh, fix a rookie patch merge error. Also-missed-by: peter	2000-05-17 06:55:00 +00:00
Jayanth Vijayaraghavan	6b2a5f92ba	snd_una was being updated incorrectly, this resulted in the newreno code retransmitting data from the wrong offset. As a footnote, the newreno code was partially derived from NetBSD and Tom Henderson <tomh@cs.berkeley.edu>	2000-05-16 03:13:59 +00:00
Jonathan Lemon	46f5848237	Implement TCP NewReno, as documented in RFC 2582. This allows better recovery for multiple packet losses in a single window. The algorithm can be toggled via the sysctl net.inet.tcp.newreno, which defaults to "on". Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>	2000-05-06 03:31:09 +00:00
Munechika SUMIKAWA	5e0ab69d23	ND6_HINT() should not be called unless the connection status is ESTABLISHED. Obtained from: KAME Project	2000-04-17 20:27:02 +00:00
Yoshinobu Inoue	fdaf052eb3	Support per socket based IPv4 mapped IPv6 addr enable/disable control. Submitted by: ume	2000-04-01 22:35:47 +00:00
Jonathan Lemon	db4f9cc703	Add support for offloading IP/TCP/UDP checksums to NIC hardware which supports them.	2000-03-27 19:14:27 +00:00
Yoshinobu Inoue	4739b8076f	IPv6 6to4 support. Now most big problem of IPv6 is getting IPv6 address assignment. 6to4 solve the problem. 6to4 addr is defined like below, 2002: 4byte v4 addr : 2byte SLA ID : 8byte interface ID The most important point of the address format is that an IPv4 addr is embeded in it. So any user who has IPv4 addr can get IPv6 address block with 2byte subnet space. Also, the IPv4 addr is used for semi-automatic IPv6 over IPv4 tunneling. With 6to4, getting IPv6 addr become dramatically easy. The attached patch enable 6to4 extension, and confirmed to work, between "Richard Seaman, Jr." <dick@tar.com> and me. Approved by: jkh Reviewed by: itojun	2000-03-11 11:17:24 +00:00
Warner Losh	173c0f9f5c	Mitigate the stream.c attacks o Drop all broadcast and multicast source addresses in tcp_input. o Enable ICMP_BANDLIM in GENERIC. o Change default to 200/s from 100/s. This will still stop the attack, but is conservative enough to do this close to code freeze. This is not the optimal patch for the problem, but is likely the least intrusive patch that can be made for this. Obtained from: Don Lewis and Matt Dillon. Reviewed by: freebsd-security	2000-01-28 06:13:09 +00:00
Yoshinobu Inoue	69a3468578	Avoid m_len and m_pkthdr.len inconsistency when changing m_len for an mbuf whose M_PKTHDR is set. PR: related to kern/15175 Reviewed by: archie	2000-01-25 01:26:47 +00:00
Yoshinobu Inoue	3a2a9f7976	Fixed the problem that IPsec connection hangs when bigger data is sent. -opt_ipsec.h was missing on some tcp files (sorry for basic mistake) -made buildable as above fix -also added some missing IPv4 mapped IPv6 addr consideration into ipsec4_getpolicybysock	2000-01-15 14:56:38 +00:00
Yoshinobu Inoue	8972cdb14e	add a comment for some possible? IPv4 option processing.	2000-01-13 05:21:05 +00:00
Yoshinobu Inoue	fb59c426ff	tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project	2000-01-09 19:17:30 +00:00
Yoshinobu Inoue	6a800098cc	IPSEC support in the kernel. pr_input() routines prototype is also changed to support IPSEC and IPV6 chained protocol headers. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project	1999-12-22 19:13:38 +00:00
Jonathan Lemon	c0f7fd5575	Use SEQ_* macros for comparing sequence space numbers. Reviewed by: truckman	1999-12-14 15:43:56 +00:00
Jonathan Lemon	1a244a616d	According to RFC 793, a reset should be honored if the sequence number is within the receive window. Follow this behavior, instead of only allowing resets at last_ack_sent. Pointed out by: jayanth@yahoo-inc.com	1999-12-11 04:05:52 +00:00
Yoshinobu Inoue	cfa1ca9dfa	udp IPv6 support, IPv6/IPv4 tunneling support in kernel, packet divert at kernel for IPv6/IPv4 translater daemon This includes queue related patch submitted by jburkhol@home.com. Submitted by: queue related patch from jburkhol@home.com Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project	1999-12-07 17:39:16 +00:00
Brian Feldman	ecf723083f	Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total usage limit.	1999-10-09 20:42:17 +00:00
Dag-Erling Smørgrav	f861330504	Fix some more disordering, as well as the description string for the net.inet.tcp.drop_synfin sysctl, which for some mysterious reason said "Drop TCP packets with FIN+ACK set" (instead of "...with SYN+FIN set")	1999-09-14 16:14:05 +00:00
Dag-Erling Smørgrav	e46cd3d4d2	Add the net.inet.tcp.restrict_rst and net.inet.tcp.drop_synfin sysctl variables, conditional on the TCP_RESTRICT_RST and TCP_DROP_SYNFIN kernel options, respectively. See the comments in LINT for details.	1999-09-12 17:22:08 +00:00
Jonathan Lemon	9b8b58e033	Restructure TCP timeout handling: - eliminate the fast/slow timeout lists for TCP and instead use a callout entry for each timer. - increase the TCP timer granularity to HZ - implement "bad retransmit" recovery, as presented in "On Estimating End-to-End Network Path Properties", by Allman and Paxson. Submitted by: jlemon, wollmann	1999-08-30 21:17:07 +00:00
David E. O'Brien	5a8c77a83c	Remove extra indenting of `break' statements introducted in rev 1.89, plus wrap some long lines from that revision. While here, wrap some other long lines.	1999-08-29 21:59:03 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Geoff Rehmet	828b7f4069	Fix breakage if blackhole=1 and tiflags & TH_SYN, plus style(9) fixes Submitted by: Jonathon Lemon	1999-08-19 05:22:12 +00:00
Geoff Rehmet	2e4e1b4c31	Slight tweak to tcp.blackhole to add optional behaviour to drop any segment arriving at a closed port. tcp.blackhole=1 - only drop SYN without RST tcp.blackhole=2 - drop everything without RST tcp.blackhole=0 - always send RST - default behaviour This confuses nmap -sF or -sX or -sN quite badly.	1999-08-18 15:40:05 +00:00
Geoff Rehmet	16f7f31f04	Add net.inet.tcp.blackhole and net.inet.udp.blackhole sysctl knobs. With these knobs on, refused connection attempts are dropped without sending a RST, or Port unreachable in the UDP case. In the TCP case, sending of RST is inhibited iff the incoming segment was a SYN. Docs and rc.conf settings to follow.	1999-08-17 12:17:53 +00:00
Jonathan M. Bresler	e9bd3a37e8	fix comment re: RST received in TIME_WAIT to match the code.	1999-07-18 14:42:48 +00:00
Peter Wemm	dfd5dee1b0	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.	1999-05-06 18:13:11 +00:00
Bill Fumerola	3d177f465a	Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)	1999-05-03 23:57:32 +00:00
Bill Fenner	51b7b33769	Use snd_nxt, not rcv_nxt, when calculating the ISS during TIME_WAIT. This was missed in the 4.4-Lite2 merge. Noticed by: Mohan Parthasarathy <Mohan.Parthasarathy@eng.Sun.COM> and jayanth@loc201.tandem.com (vijayaraghavan_jayanth) on the tcp-impl mailing list.	1999-02-06 00:47:45 +00:00
Matthew Dillon	831a80b0d5	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-27 22:42:27 +00:00
Matthew Dillon	51508de112	Reviewed by: freebsd-current Add ICMP_BANDLIM option and 'net.inet.icmp.icmplim' sysctl. If option is specified in kernel config, icmplim defaults to 100 pps. Setting it to 0 will disable the feature. This feature limits ICMP error responses for packets sent to bad tcp or udp ports, which does a lot to help the machine handle network D.O.S. attacks. The kernel will report packet rates that exceed the limit at a rate of one kernel printf per second. There is one issue in regards to the 'tail end' of an attack... the kernel will not output the last report until some unrelated and valid icmp error packet is return at some point after the attack is over. This is a minor reporting issue only.	1998-12-03 20:23:21 +00:00
Garrett Wollman	80ab7c0ed8	Fix RST validation. PR: 7892 Submitted by: Don.Lewis@tsc.tdk.com	1998-09-11 16:04:03 +00:00
Doug Rabson	6effc71332	Re-implement tcp and ip fragment reassembly to not store pointers in the ip header which can't work on alpha since pointers are too big. Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>	1998-08-24 07:47:39 +00:00
Julian Elischer	f9e354df42	Support for IPFW based transparent forwarding. Any packet that can be matched by a ipfw rule can be redirected transparently to another port or machine. Redirection to another port mostly makes sense with tcp, where a session can be set up between a proxy and an unsuspecting client. Redirection to another machine requires that the other machine also be expecting to receive the forwarded packets, as their headers will not have been modified. /sbin/ipfw must be recompiled!!! Reviewed by: Peter Wemm <peter@freebsd.org> Submitted by: Chrisy Luke <chrisy@flix.net>	1998-07-06 03:20:19 +00:00
Peter Wemm	04a3fd1276	Let the sowwakeup macro decide when to call sowakeup rather than have tcp "know" about it. A pending upcall would be missed, eg: used by NFS. Obtained from: NetBSD	1998-05-31 18:42:49 +00:00
Guido van Rooij	068373b683	Grumble...It seems I'm suffering from some mental disease. Do it correct now.	1998-05-18 17:11:24 +00:00
Guido van Rooij	0bce271a1f	Add some parenthesis for clarity and fix a bug Pointed out by: Garrett Wollmand	1998-05-18 17:07:58 +00:00
Guido van Rooij	11ad455083	Refuse accellerated opens on listening sockets that have not set the TCP_NOPUSH socket option. This disables TAO for those services that do not know about T/TCP. Reviewed by: Garrett Wollman Submitted by: Peter Wemm	1998-05-04 17:59:52 +00:00
David Greenman	84e33c9eea	At the request of Garrett, changed sysctl: net.inet.tcp.delack_enabled -> net.inet.tcp.delayed_ack	1998-04-24 10:08:57 +00:00
Dag-Erling Smørgrav	dc73342347	Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.	1998-04-17 22:37:19 +00:00
Poul-Henning Kamp	8e5db87cdb	Remove the last traces of TUBA. Inspired by: PR kern/3317	1998-04-06 06:52:47 +00:00
Bill Fenner	75daa6a53f	Remove the check for SYN in SYN_RECEIVED state; it breaks simultaneous connect. This check was added as part of the defense against the "land" attack, to prevent attacks which guess the ISS from going into ESTABLISHED. The "src == dst" check will still prevent the single-homed case of the "land" attack, and guessing ISS's should be hard anyway. Submitted by: David Borman <dab@bsdi.com>	1998-03-20 00:43:29 +00:00
David Greenman	f498eeeead	Changes to support the addition of a new sysctl variable: net.inet.tcp.delack_enabled Which defaults to 1 and can be set to 0 to disable TCP delayed-ack processing (i.e. all acks are immediate).	1998-02-26 05:25:39 +00:00
David Greenman	c3229e05a3	Improved connection establishment performance by doing local port lookups via a hashed port list. In the new scheme, in_pcblookup() goes away and is replaced by a new routine, in_pcblookup_local() for doing the local port check. Note that this implementation is space inefficient in that the PCB struct is now too large to fit into 128 bytes. I might deal with this in the future by using the new zone allocator, but I wanted these changes to be extensively tested in their current form first. Also: 1) Fixed off-by-one errors in the port lookup loops in in_pcbbind(). 2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash() to do the initialial hash insertion. 3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability. 4) Added a new routine, in_pcbremlists() to remove the PCB from the various hash lists. 5) Added/deleted comments where appropriate. 6) Removed unnecessary splnet() locking. In general, the PCB functions should be called at splnet()...there are unfortunately a few exceptions, however. 7) Reorganized a few structs for better cache line behavior. 8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in the future, however. These changes have been tested on wcarchive for more than a month. In tests done here, connection establishment overhead is reduced by more than 50 times, thus getting rid of one of the major networking scalability problems. Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult. WARNING: Anything that knows about inpcb and tcpcb structs will have to be recompiled; at the very least, this includes netstat(1).	1998-01-27 09:15:13 +00:00
Bill Fenner	764d8cef56	A more complete fix for the "land" attack, removing the "quick fix" from rev 1.66. This fix contains both belt and suspenders. Belt: ignore packets where src == dst and srcport == dstport in TCPS_LISTEN. These packets can only legitimately occur when connecting a socket to itself, which doesn't go through TCPS_LISTEN (it goes CLOSED->SYN_SENT->SYN_RCVD-> ESTABLISHED). This prevents the "standard" "land" attack, although doesn't prevent the multi-homed variation. Suspenders: send a RST in response to a SYN/ACK in SYN_RECEIVED state. The only packets we should get in SYN_RECEIVED are 1. A retransmitted SYN, or 2. An ack of our SYN/ACK. The "land" attack depends on us accepting our own SYN/ACK as an ACK; in SYN_RECEIVED state; this should prevent all "land" attacks. We also move up the sequence number check for the ACK in SYN_RECEIVED. This neither helps nor hurts with respect to the "land" attack, but puts more of the validation checking in one spot. PR: kern/5103	1998-01-21 02:05:59 +00:00
Bruce Evans	592071e854	Don't use ANSI string concatenation to misformat a string.	1997-12-19 23:46:21 +00:00
Garrett Wollman	76d3eadb53	Add Matt Dillon's quick fix hack for the self-connect DoS. PR: 5103	1997-11-20 20:04:49 +00:00
Poul-Henning Kamp	4a11ca4e29	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused	1997-11-07 08:53:44 +00:00
Bruce Evans	55b211e3af	Removed unused #includes.	1997-10-28 15:59:26 +00:00
David Greenman	4281faf253	Killed the SYN_RECEIVED addition from rev 1.52. It results in legitimate RST's being ignored, keeping a connection around until it times out, and thus has the opposite effect of what was intended (which is to make the system more robust to DoS attacks).	1997-10-02 02:10:40 +00:00
Bill Fenner	026650e576	Don't consider a SYN/ACK with CC but no CCECHO a proper T/TCP handshake. Reviewed by: Rich Stevens <rstevens@kohala.com>	1997-09-30 16:38:09 +00:00
Joerg Wunsch	0cc12cc57e	Make TCPDEBUG a new-style option.	1997-09-16 18:36:06 +00:00
Garrett Wollman	57bf258e3d	Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.	1997-08-16 19:16:27 +00:00
John Polstra	66e39adc7c	Fix a bug (apparently very old) that can cause a TCP connection to be dropped when it has an unusual traffic pattern. For full details as well as a test case that demonstrates the failure, see the referenced PR. Under certain circumstances involving the persist state, it is possible for the receive side's tp->rcv_nxt to advance beyond its tp->rcv_adv. This causes (tp->rcv_adv - tp->rcv_nxt) to become negative. However, in the code affected by this fix, that difference was interpreted as an unsigned number by max(). Since it was negative, it was taken as a huge unsigned number. The effect was to cause the receiver to believe that its receive window had negative size, thereby rejecting all received segments including ACKs. As the test case shows, this led to fruitless retransmissions and eventually to a dropped connection. Even connections using the loopback interface could be dropped. The fix substitutes the signed imax() for the unsigned max() function. PR: closes kern/3998 Reviewed by: davidg, fenner, wollman	1997-07-01 05:42:16 +00:00
Garrett Wollman	a29f300e80	The long-awaited mega-massive-network-code- cleanup. Part I. This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in. 2) Certain protocol entry points are modified to take a process structure, so they they can easily tell whether or not it is possible to sleep, and also to access credentials. 3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed. 4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family. 5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem. As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.	1997-04-27 20:01:29 +00:00
Peter Wemm	6875d25465	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
Jordan K. Hubbard	1130b656e5	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.	1997-01-14 07:20:47 +00:00
Bill Fenner	39172c9401	Re-enable the TCP SYN-attack protection code. I was the one who didn't understand the socket state flag. 2.2 candidate.	1996-11-10 07:37:24 +00:00

... 2 3 4 5 6 ...

404 Commits