Commit Graph

69 Commits

Author SHA1 Message Date
dg
7262ff6e58 Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.

Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
   to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
   hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
   be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
   the future, however.

These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.

Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.

WARNING: Anything that knows about inpcb and tcpcb structs will have to be
         recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
fenner
606a03ebe5 A more complete fix for the "land" attack, removing the "quick fix" from
rev 1.66.  This fix contains both belt and suspenders.

Belt: ignore packets where src == dst and srcport == dstport in TCPS_LISTEN.
 These packets can only legitimately occur when connecting a socket to itself,
 which doesn't go through TCPS_LISTEN (it goes CLOSED->SYN_SENT->SYN_RCVD->
 ESTABLISHED).  This prevents the "standard" "land" attack, although doesn't
 prevent the multi-homed variation.

Suspenders: send a RST in response to a SYN/ACK in SYN_RECEIVED state.
 The only packets we should get in SYN_RECEIVED are
 1. A retransmitted SYN, or
 2. An ack of our SYN/ACK.
 The "land" attack depends on us accepting our own SYN/ACK as an ACK;
 in SYN_RECEIVED state; this should prevent all "land" attacks.

We also move up the sequence number check for the ACK in SYN_RECEIVED.
 This neither helps nor hurts with respect to the "land" attack, but
 puts more of the validation checking in one spot.

PR:             kern/5103
1998-01-21 02:05:59 +00:00
bde
75c4ef96e7 Don't use ANSI string concatenation to misformat a string. 1997-12-19 23:46:21 +00:00
wollman
390341dca5 Add Matt Dillon's quick fix hack for the self-connect DoS.
PR:		5103
1997-11-20 20:04:49 +00:00
phk
4d26888936 Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by:	-Wunused
1997-11-07 08:53:44 +00:00
bde
fb826377ff Removed unused #includes. 1997-10-28 15:59:26 +00:00
dg
295181cc83 Killed the SYN_RECEIVED addition from rev 1.52. It results in legitimate
RST's being ignored, keeping a connection around until it times out, and
thus has the opposite effect of what was intended (which is to make the
system more robust to DoS attacks).
1997-10-02 02:10:40 +00:00
fenner
e71cc90452 Don't consider a SYN/ACK with CC but no CCECHO a proper T/TCP
handshake.

Reviewed by:	Rich Stevens <rstevens@kohala.com>
1997-09-30 16:38:09 +00:00
joerg
c65e27777e Make TCPDEBUG a new-style option. 1997-09-16 18:36:06 +00:00
wollman
4542c1cf5d Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
1997-08-16 19:16:27 +00:00
jdp
3f044120cd Fix a bug (apparently very old) that can cause a TCP connection to
be dropped when it has an unusual traffic pattern.  For full details
as well as a test case that demonstrates the failure, see the
referenced PR.

Under certain circumstances involving the persist state, it is
possible for the receive side's tp->rcv_nxt to advance beyond its
tp->rcv_adv.  This causes (tp->rcv_adv - tp->rcv_nxt) to become
negative.  However, in the code affected by this fix, that difference
was interpreted as an unsigned number by max().  Since it was
negative, it was taken as a huge unsigned number.  The effect was
to cause the receiver to believe that its receive window had negative
size, thereby rejecting all received segments including ACKs.  As
the test case shows, this led to fruitless retransmissions and
eventually to a dropped connection.  Even connections using the
loopback interface could be dropped.  The fix substitutes the signed
imax() for the unsigned max() function.

PR:		closes kern/3998
Reviewed by:	davidg, fenner, wollman
1997-07-01 05:42:16 +00:00
wollman
6afbf203bd The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call.  Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken.  I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
1997-04-27 20:01:29 +00:00
peter
94b6d72794 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
jkh
808a36ef65 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
fenner
3227402145 Re-enable the TCP SYN-attack protection code. I was the one who didn't
understand the socket state flag.

2.2 candidate.
1996-11-10 07:37:24 +00:00
pst
430faa57f9 Fix two bugs I accidently put into the syn code at the last minute
(yes I had tested the hell out of this).

I've also temporarily disabled the code so that it behaves as it previously
did (tail drop's the syns) pending discussion with fenner about some socket
state flags that I don't fully understand.

Submitted by:	fenner
1996-10-11 19:26:42 +00:00
dg
00503a161c Improved in_pcblookuphash() to support wildcarding, and changed relavent
callers of it to take advantage of this. This reduces new connection
request overhead in the face of a large number of PCBs in the system.
Thanks to David Filo <filo@yahoo.com> for suggesting this and providing
a sample implementation (which wasn't used, but showed that it could be
done).

Reviewed by:	wollman
1996-10-07 19:06:12 +00:00
pst
b51353f335 Increase robustness of FreeBSD against high-rate connection attempt
denial of service attacks.

Reviewed by:	bde,wollman,olah
Inspired by:	vjs@sgi.com
1996-10-07 04:32:42 +00:00
pst
32aae00362 I don't understand, I committed this fix (move a counter and fixed a typo)
this evening.

I think I'm going insane.
1996-09-21 06:39:20 +00:00
ache
4b550cd4a8 Syntax error: so_incom -> so_incomp 1996-09-21 06:30:06 +00:00
pst
dd5375bf7c If the incomplete listen queue for a given socket is full,
drop the oldest entry in the queue.

There was a fair bit of discussion as to whether or not the
proper action is to drop a random entry in the queue.  It's
my conclusion that a random drop is better than a head drop,
however profiling this section of code (done by John Capo)
shows that a head-drop results in a significant performance
increase.

There are scenarios where a random drop is more appropriate.
If I find one in reality, I'll add the random drop code under
a conditional.

Obtained from: discussions and code done by Vernon Schryver (vjs@sgi.com).
1996-09-20 21:25:18 +00:00
pst
460ca264a3 Make the misnamed tcp initial keepalive timer value (which is really the
time, in seconds, that state for non-established TCP sessions stays about)
a sysctl modifyable variable.

[part 1 of two commits, I just realized I can't play with the indices as
 I was typing this commit message.]
1996-09-13 23:51:44 +00:00
pst
3499a65964 Receipt of two SYN's are sufficient to set the t_timer[TCPT_KEEP]
to "keepidle".  this should not occur unless the connection has
been established via the 3-way handshake which requires an ACK

Submitted by:	jmb
Obtained from:	problem discussed in Stevens vol. 3
1996-09-13 18:47:03 +00:00
fenner
1284248a56 Back out my stupid braino; I was thinking strlen and not sizeof. 1996-05-02 05:54:14 +00:00
fenner
2163f66f9a Size temp var correctly; buf[4*sizeof "123"] is not long enough
to store "192.252.119.189\0".
1996-05-02 05:31:13 +00:00
ache
8a5de28c05 inet_ntoa buffer was evaluated twice in log_in_vain, fix it.
Thanx to: jdp
1996-04-27 18:19:12 +00:00
wollman
c9ab94c878 Delete #ifdef notdef blocks containing old method of srtt calculation.
Requested by: davidg
1996-04-26 18:32:58 +00:00
pst
67931eee29 Logging UDP and TCP connection attempts should not be enabled by default.
It's trivial to create a denial of service attack on a box so enabled.

These messages, if enabled at all, must be rate-limited. (!)
1996-04-09 07:01:53 +00:00
phk
1eff72b85f Log TCP syn packets for ports we don't listen on.
Controlled by: sysctl net.inet.tcp.log_in_vain: 1

Log UDP syn packets for ports we don't listen on.
Controlled by: sysctl net.inet.udp.log_in_vain: 1

Suggested by:	Warren Toomey <wkt@cs.adfa.oz.au>
1996-04-04 10:46:44 +00:00
wollman
444648d459 Slight modification of RTO floor calculation. 1996-03-25 20:13:21 +00:00
wollman
acfe4c4467 A number of performance-reducing flaws fixed based on comments
from Larry Peterson &co. at Arizona:

- Header prediction for ACKs did not exclude Fast Retransmit/Recovery.
- srtt calculation tended to get ``stuck'' and could never decrease
  when below 8.  It still can't, but the scaling factors are adjusted
  so that this artifact does not cause as bad an effect on the RTO
  value as it used to.

The paper also points out the incr/8 error that has been long since fixed,
and the problems with ACKing frequency resulting from the use of options
which I suspect to be fixed already as well (as part of the T/TCP work).

Obtained from:	Brakmo & Peterson, ``Performance Problems in BSD4.4 TCP''
1996-03-22 18:09:21 +00:00
dg
3f0638f73b Move or add #include <queue.h> in preparation for upcoming struct socket
changes.
1996-03-11 15:13:58 +00:00
guido
89b4ca893f Add a counter for the number of times the listen queue was overflowed to
the tcpstat structure. (netstat -s)
Reviewed by:	wollman
Obtained from: Steves, TCP/IP Ill. vol.3, page 189
1996-02-26 21:47:13 +00:00
dg
41aff73dfb Fixed bug in Path MTU Discovery that caused the system to have to re-
discover the Path MTU for each connection if the connecting host didn't
offer an initial MSS.

Submitted by:	davidg & olah
1996-02-22 11:46:39 +00:00
olah
09077f7acf Fix a bug related to the interworking of T/TCP and window scaling:
when a connection enters the ESTBLS state using T/TCP, then window
scaling wasn't properly handled.  The fix is twofold.

1) When the 3WHS completes, make sure that we update our window
scaling state variables.

2) When setting the `virtual advertized window', then make sure
that we do not try to offer a window that is larger than the maximum
window without scaling (TCP_MAXWIN).

Reviewed by:	davidg
Reported by:	Jerry Chen <chen@Ipsilon.COM>
1996-01-31 08:22:24 +00:00
phk
9cb413a93c Another mega commit to staticize things. 1995-12-14 09:55:16 +00:00
phk
db2c71245d New style sysctl & staticize alot of stuff. 1995-11-14 20:34:56 +00:00
phk
c1dbd5b377 Start adding new style sysctl here too. 1995-11-09 20:23:09 +00:00
olah
d4e1ca409e Cosmetic changes to processing of segments in the SYN_SENT state:
- remove a redundant condition;
- complete all validity checks on segment before calling
  soisconnected(so).

Reviewed by:	Richard Stevens, davidg, wollman
1995-11-03 22:31:54 +00:00
wollman
7c65eebe94 Routes can be asymmetric. Always offer to /accept/ an MSS of up to the
capacity of the link, even if the route's MTU indicates that we cannot
send that much in their direction.  (This might actually make it possible
to test Path MTU discovery in a useful variety of cases.)
1995-10-13 16:00:25 +00:00
wollman
3fc43db861 Finish 4.4-Lite-2 merge: randomize TCP initial sequence numbers
to make ISS-guessing spoofing attacks harder.
1995-10-03 16:54:17 +00:00
olah
fd35d46e41 Remove a redundant `if' from tcp_reass().
Correct a typo in a comment (SEND_SYN -> NEEDSYN).

Reviewed by:	David Greenman
1995-07-31 10:24:22 +00:00
wollman
ae6523c0e5 tcp_input.c - keep track of how many times a route contained a cached rtt
or ssthresh that we were able to use

tcp_var.h - declare tcpstat entries for above; declare tcp_{send,recv}space

in_rmx.c - fill in the MTU and pipe sizes with the defaults TCP would have
	used anyway in the absence of values here
1995-07-10 15:39:16 +00:00
wollman
35b757bd67 Keep track of the number of samples through the srtt filter so that we
know better when to cache values in the route, rather than relying on a
heuristic involving sequence numbers that broke when tcp_sendspace
was increased to 16k.
1995-06-29 18:11:24 +00:00
rgrimes
c86f0c7a71 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
dg
fc19afab24 #ifdef'd my Nagel/ACK hack with "TCP_ACK_HACK", disabled by default. I'm
currently considering reducing the TCP fasttimo to 100ms to help improve
things, but this would be done as a seperate step at some point in the
future.
This was done because it was causing some sometimes serious performance
problems with T/TCP.
1995-05-11 01:41:06 +00:00
olah
e994f8f005 Fix a misspelled constant in tcp_input.c.
On Tue, 09 May 1995 04:35:27 PDT, Richard Stevens wrote:
> In tcp_dooptions() under the case TCPOPT_CC there is an assignment
>
>       to->to_flag |= TCPOPT_CC;
>
> that should be
>
>       to->to_flag |= TOF_CC;
>
> I haven't thought through the ramifications of what's been happening ...
>
>       Rich Stevens

Submitted by:	rstevens@noao.edu (Richard Stevens)
1995-05-09 12:32:06 +00:00
dg
b8a73effc2 Changed in_pcblookuphash() to not automatically call in_pcblookup() if
the lookup fails. Updated callers to deal with this. Call in_pcblookuphash
instead of in_pcblookup() in in_pcbconnect; this improves performance of
UDP output by about 17% in the standard case.
1995-05-03 07:16:53 +00:00
dg
30e9776583 Further satisfy my paranoia by making sure that the ACKNOW is only
set when ti_len is non-zero.
1995-04-10 17:37:46 +00:00
dg
95eb1b8365 Fixed bug I introduced with my Nagel hack which caused tcp_input and
tcp_output to loop endlessly. This was freefall's problem during the past
day.
1995-04-10 17:16:10 +00:00