170 Commits

Author SHA1 Message Date
charnier
7dd9d47059 Replace various spelling with FALLTHROUGH which is lint()able 2002-08-25 13:23:09 +00:00
jmallett
af73db292e Enclose IPv6 addresses in brackets when they are displayed printable with a
TCP/UDP port seperated by a colon.  This is for the log_in_vain facility.

Pointed out by:	Edward J. M. Brocklesby
Reviewed by:	ume
MFC after:	2 weeks
2002-08-19 19:47:13 +00:00
dillon
bb806c49bf Implement TCP bandwidth delay product window limiting, similar to (but
not meant to duplicate) TCP/Vegas.  Add four sysctls and default the
implementation to 'off'.

net.inet.tcp.inflight_enable	enable algorithm (defaults to 0=off)
net.inet.tcp.inflight_debug	debugging (defaults to 1=on)
net.inet.tcp.inflight_min	minimum window limit
net.inet.tcp.inflight_max	maximum window limit

MFC after:	1 week
2002-08-17 18:26:02 +00:00
hsu
2c79764ced Cosmetic-only changes for readability.
Reviewed by:	(early form passed by) bde
Approved by:	itojun (from core@kame.net)
2002-08-17 02:05:25 +00:00
rwatson
aa8060c29e Rename mac_check_socket_receive() to mac_check_socket_deliver() so that
we can use the names _receive() and _send() for the receive() and send()
checks.  Rename related constants, policy implementations, etc.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 18:51:27 +00:00
hsu
deb0142560 Reset dupack count in header prediction.
Follow-on to rev 1.39.

Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon
2002-08-15 17:13:18 +00:00
rwatson
a034d0cd3c Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the TCP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check socket and
mbuf labels before permitting delivery to a socket.  Assign labels
to newly accepted connections when the syncache/cookie code has done
its business.  Also set peer labels as convenient.  Currently,
MAC policies cannot influence the PCB matching algorithm, so cannot
implement polyinstantiation.  Note that there is at least one case
where a PCB is not available due to the TCP packet not being associated
with any socket, so we don't label in that case, but need to handle
it in a special manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 19:06:49 +00:00
ru
cb7f1cefec Don't shrink socket buffers in tcp_mss(), application might have already
configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows.

PR:		kern/11966
MFC after:	1 week
2002-07-22 22:31:09 +00:00
dillon
62fef107e4 Add the tcps_sndrexmitbad statistic, keep track of late acks that caused
unnecessary retransmissions.
2002-07-19 18:29:38 +00:00
hsu
deb4c976c7 Avoid unlocking the inp twice if badport_bandlim() returns -1.
Reported by:	jlemon
2002-06-24 22:25:00 +00:00
hsu
f13c72f301 Style bug: fix 4 space indentations that should have been tabs.
Submitted by:	jlemon
2002-06-24 16:47:02 +00:00
luigi
ebcf841898 Move two global variables to automatic variables within the
only function where they are used (they are used with TCPDEBUG only).
2002-06-23 21:22:56 +00:00
luigi
5259888148 Remove (almost all) global variables that were used to hold
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.

The variables removed by this change are:

        ip_divert_cookie        used by divert sockets
        ip_fw_fwd_addr          used for transparent ip redirection
        last_pkt                used by dynamic pipes in dummynet

Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().

On passing, remove a bug in divert handling of fragmented packet.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly it was the last incoming fragment
to decide.

Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). On passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.

option IPFIREWALL_FORWARD is effectively useless, the code to
implement it is very small and is now in by default to avoid the
obfuscation of conditionally compiled code.

NOTES:
 * there is at least one global variable left, sro_fwd, in ip_output().
   I am not sure if/how this can be removed.

 * I have deliberately avoided gratuitous style changes in this commit
   to avoid cluttering the diffs. Minor stule cleanup will likely be
   necessary

 * this commit only focused on the IP layer. I am sure there is a
   number of global variables used in the TCP and maybe UDP stack.

 * despite the number of files touched, there are absolutely no API's
   or data structures changed by this commit (except the interfaces of
   ip_fw_chk() and dummynet_io(), which are internal anyways), so
   an MFC is quite safe and unintrusive (and desirable, given the
   improved readability of the code).

MFC after: 10 days
2002-06-22 11:51:02 +00:00
tanimura
cb3347e926 Remove so*_locked(), which were backed out by mistake. 2002-06-18 07:42:02 +00:00
hsu
cd25d4648f Lock up inpcb.
Submitted by:	Jennifer Yang <yangjihui@yahoo.com>
2002-06-10 20:05:46 +00:00
tanimura
e6fa9b9e92 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
tanimura
92d8381dd5 Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
alfred
798c53d495 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
tanimura
89ec521d91 Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by:	bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.
2002-04-30 01:54:54 +00:00
tanimura
dbb4756491 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
suz
553226e8e1 just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by:	ume
MFC after:	1 week
2002-04-19 04:46:24 +00:00
silby
c7389be7ba Remove some ISN generation code which has been unused since the
syncache went in.

MFC after:	3 days
2002-04-10 22:12:01 +00:00
bde
867fc1ed1c Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting.
2002-03-24 10:19:10 +00:00
alfred
357e37e023 Remove __P. 2002-03-19 21:25:46 +00:00
cjc
822f4e8381 Change the wording of the inline comments from the previous commit.
Objection from:	ru
2002-02-27 13:52:06 +00:00
cjc
8b28692f71 The TCP code did not do sufficient checks on whether incoming packets
were destined for a broadcast IP address. All TCP packets with a
broadcast destination must be ignored. The system only ignored packets
that were _link-layer_ broadcasts or multicast. We need to check the
IP address too since it is quite possible for a broadcast IP address
to come in with a unicast link-layer address.

Note that the check existed prior to CSRG revision 7.35, but was
removed. This commit effectively backs out that nine-year-old change.

PR:		misc/35022
2002-02-25 08:29:21 +00:00
mike
bcee06d42c o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
  source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
  Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
  POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
  and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
  complexities associated with having MD (asm and inline) versions, and
  having to prevent exposure of these functions in other headers that
  happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
  third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on:	alpha, i386
Reviewed by:	bde, jake, tmm
2002-02-18 20:35:27 +00:00
rwatson
46f317e07b o Spelling fix in comment: tcp_ouput -> tcp_output 2002-01-04 17:21:27 +00:00
jlemon
ec4b51f883 Fix up tabs in comments. 2001-12-13 04:02:09 +00:00
dillon
f97547e246 Fix a bug with transmitter restart after receiving a 0 window. The
receiver was not sending an immediate ack with delayed acks turned on
when the input buffer is drained, preventing the transmitter from
restarting immediately.

Propogate the TCP_NODELAY option to accept()ed sockets.  (Helps tbench and
is a good idea anyway).

Some cleanup.  Identify additonal issues in comments.

MFC after:	1 day
2001-12-02 08:49:29 +00:00
jlemon
a3c1c9fdb4 Introduce a syncache, which enables FreeBSD to withstand a SYN flood
DoS in an improved fashion over the existing code.

Reviewed by: silby  (in a previous iteration)
Sponsored by: DARPA, NAI Labs
2001-11-22 04:50:44 +00:00
jlemon
c41580e9ad Move initialization of snd_recover into tcp_sendseqinit(). 2001-11-21 18:45:51 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
julian
071f86f9f1 Patches from Keiichi SHIMA <keiichi@iij.ad.jp>
to make ip use the standard protosw structure again.

Obtained from: Well, KAME I guess.
2001-09-03 20:03:55 +00:00
jayanth
77d67fb568 when newreno is turned on, if dupacks = 1 or dupacks = 2 and
new data is acknowledged, reset the dupacks to 0.
The problem was spotted when a connection had its send buffer full
because the congestion window was only 1 MSS and was not being incremented
because dupacks was not reset to 0.

Obtained from:		Yahoo!
2001-08-29 23:54:13 +00:00
dd
6ea3a08d37 Correct a typo in a comment: FIN_WAIT2 -> FIN_WAIT_2
PR:		29970
Submitted by:	Joseph Mallett <jmallett@xMach.org>
2001-08-23 22:34:29 +00:00
silby
58e247fcc4 Much delayed but now present: RFC 1948 style sequence numbers
In order to ensure security and functionality, RFC 1948 style
initial sequence number generation has been implemented.  Barring
any major crypographic breakthroughs, this algorithm should be
unbreakable.  In addition, the problems with TIME_WAIT recycling
which affect our currently used algorithm are not present.

Reviewed by: jesper
2001-08-22 00:58:16 +00:00
silby
2be73222cb Temporary feature: Runtime tuneable tcp initial sequence number
generation scheme.  Users may now select between the currently used
OpenBSD algorithm and the older random positive increment method.

While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT
handling; this is causing trouble for an increasing number of folks.

To switch between generation schemes, one sets the sysctl
net.inet.tcp.tcp_seq_genscheme.  0 = random positive increments,
1 = the OpenBSD algorithm.  1 is still the default.

Once a secure _and_ compatible algorithm is implemented, this sysctl
will be removed.

Reviewed by: jlemon
Tested by: numerous subscribers of -net
2001-07-08 02:20:47 +00:00
ru
f8e11dde26 Add netstat(1) knob to reset net.inet.{ip|icmp|tcp|udp|igmp}.stats.
For example, ``netstat -s -p ip -z'' will show and reset IP stats.

PR:		bin/17338
2001-06-23 17:17:59 +00:00
silby
f41767543e Eliminate the allocation of a tcp template structure for each
connection.  The information contained in a tcptemp can be
reconstructed from a tcpcb when needed.

Previously, tcp templates required the allocation of one
mbuf per connection.  On large systems, this change should
free up a large number of mbufs.

Reviewed by:	bmilekic, jlemon, ru
MFC after: 2 weeks
2001-06-23 03:21:46 +00:00
ume
832f8d2249 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
jesper
9d59cfc3ee Silby's take one on increasing FreeBSD's resistance to SYN floods:
One way we can reduce the amount of traffic we send in response to a SYN
flood is to eliminate the RST we send when removing a connection from
the listen queue.  Since we are being flooded, we can assume that the
majority of connections in the queue are bogus.  Our RST is unwanted
by these hosts, just as our SYN-ACK was.  Genuine connection attempts
will result in hosts responding to our SYN-ACK with an ACK packet.  We
will automatically return a RST response to their ACK when it gets to us
if the connection has been dropped, so the early RST doesn't serve the
genuine class of connections much.  In summary, we can reduce the number
of packets we send by a factor of two without any loss in functionality
by ensuring that RST packets are not sent when dropping a connection
from the listen queue.

Submitted by:	Mike Silbersack <silby@silby.com>
Reviewed by:	jesper
MFC after:	2 weeks
2001-06-06 19:41:51 +00:00
jesper
aa7ec52010 Inline TCP_REASS() in the single location where it's used,
just as OpenBSD and NetBSD has done.

No functional difference.

MFC after:	2 weeks
2001-05-29 19:54:45 +00:00
jesper
02dca88184 properly delay acks in half-closed TCP connections
PR:	24962
Submitted by:	Tony Finch <dot@dotat.at>
MFC after:	2 weeks
2001-05-29 19:51:45 +00:00
jesper
a1fab55459 Say goodbye to TCP_COMPAT_42
Reviewed by:	wollman
Requested by:	wollman
2001-04-20 11:58:56 +00:00
kris
0c55f2e6da Randomize the TCP initial sequence numbers more thoroughly.
Obtained from:	OpenBSD
Reviewed by:	jesper, peter, -developers
2001-04-17 18:08:01 +00:00
des
9dc769bc1b Axe TCP_RESTRICT_RST. It was never a particularly good idea except for a few
very specific scenarios, and now that we have had net.inet.tcp.blackhole for
quite some time there is really no reason to use it any more.

(last of three commits)
2001-03-19 22:09:00 +00:00
jlemon
3b8f8e9938 Do not delay a new ack if there already is a delayed ack pending on the
connection, but send it immediately.  Prior to this change, it was possible
to delay a delayed-ack for multiple times, resulting in degraded TCP
behavior in certain corner cases.
2001-02-25 15:17:24 +00:00
bmilekic
0f9088da56 Clean up RST ratelimiting. Previously, ratelimiting occured before tests
were performed to determine if the received packet should be reset. This
created erroneous ratelimiting and false alarms in some cases. The code
has now been reorganized so that the checks for validity come before
the call to badport_bandlim. Additionally, a few changes in the symbolic
names of the bandlim types have been made, as well as a clarification of
exactly which type each RST case falls under.

Submitted by: Mike Silbersack <silby@silby.com>
2001-02-11 07:39:51 +00:00
wollman
08d0e8d96f Correct a comment. 2001-01-24 16:25:36 +00:00