Commit Graph

1816 Commits

Author SHA1 Message Date
Robert Watson
5e7ce4785f Modify the mac_init_ipq() MAC Framework entry point to accept an
additional flags argument to indicate blocking disposition, and
pass in M_NOWAIT from the IP reassembly code to indicate that
blocking is not OK when labeling a new IP fragment reassembly
queue.  This should eliminate some of the WITNESS warnings that
have started popping up since fine-grained IP stack locking
started going in; if memory allocation fails, the creation of
the fragment queue will be aborted.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-26 15:12:03 +00:00
Maxime Henrion
511e01e2d6 Try to make the MBUF_FRAG_TEST code work better.
- Don't try to fragment the packet if it's smaller than mbuf_frag_size.
- Preserve the size of the mbuf chain which is modified by m_split().
- Check that m_split() didn't return NULL.
- Make it so we don't end up with two M_PKTHDR mbuf in the chain.
- Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole
  chain and not just the first mbuf.
- Fix a nearby style bug and rework the logic of the loops so that it's
  more clear.

This is still not quite right, because we're clearly abusing m_split() to
do something it was not designed for, but at least it works now.  We
should probably move this code into a m_fragment() function when it's
correct.
2003-03-25 23:49:14 +00:00
Mike Silbersack
9d9edc5693 Add the MBUF_FRAG_TEST option. When compiled in, this option
allows you to tell ip_output to fragment all outgoing packets
into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes.
This is an excellent way to test if network drivers can properly
handle long mbuf chains being passed to them.

net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation)
so that you can at least boot before your network driver dies. :)
2003-03-25 05:45:05 +00:00
Maxime Henrion
aecfcdb824 Use __packed instead of __attribute__((__packed__)). 2003-03-22 00:25:14 +00:00
Matthew N. Dodd
57842a38fd Add a sysctl node allowing the specification of an address mask to use
when replying to ICMP Address Mask Request packets.
2003-03-21 15:43:06 +00:00
Matthew N. Dodd
21150298bb Add comments regarding the ICMP timestamp fields. 2003-03-21 15:28:10 +00:00
Crist J. Clark
010dabb047 Add a 'verrevpath' option that verifies the interface that a packet
comes in on is the same interface that we would route out of to get to
the packet's source address. Essentially automates an anti-spoofing
check using the information in the routing table.

Experimental. The usage and rule format for the feature may still be
subject to change.
2003-03-15 01:13:00 +00:00
Jeffrey Hsu
7792ea2700 Greatly simplify the unlocking logic by holding the TCP protocol lock until
after FIN_WAIT_2 processing.

Helped with debugging:	Doug Barton
2003-03-13 11:46:57 +00:00
Jeffrey Hsu
da3a8a1a4f Add support for RFC 3390, which allows for a variable-sized
initial congestion window.
2003-03-13 01:43:45 +00:00
Jeffrey Hsu
582a954b00 Implement the Limited Transmit algorithm (RFC 3042). 2003-03-12 20:27:28 +00:00
Sam Leffler
4a692a1fc2 correct two more flag misuses; m_tag* use malloc flags 2003-03-12 14:45:22 +00:00
Jonathan Lemon
a3b6edc353 Remove check for t_state == TCPS_TIME_WAIT and introduce the tw structure.
Sponsored by: DARPA, NAI Labs
2003-03-08 22:07:52 +00:00
Jonathan Lemon
607b0b0cc9 Remove a panic(); if the zone allocator can't provide more timewait
structures, reuse the oldest one.  Also move the expiry timer from
a per-structure callout to the tcp slow timer.

Sponsored by: DARPA, NAI Labs
2003-03-08 22:06:20 +00:00
Peter Wemm
3c6b084e96 Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.
2003-03-05 19:24:24 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Jonathan Lemon
272c5dfe93 In timewait state, if the incoming segment is a pure in-sequence ack
that matches snd_max, then do not respond with an ack, just drop the
segment.  This fixes a problem where a simultaneous close results in
an ack loop between two time-wait states.

Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG>
Sponsored by: DARPA, NAI Labs
2003-02-26 18:20:41 +00:00
Jonathan Lemon
ef6b48deb9 The TCP protocol lock may still be held if the reassembly queue dropped FIN.
Detect this case and drop the lock accordingly.

Sponsored by: DARPA, NAI Labs
2003-02-26 13:55:13 +00:00
Mike Silbersack
a75a485d62 Fix a condition so that ip reassembly queues are emptied immediately
when maxfragpackets is dropped to 0.

Noticed by:	bmah
2003-02-26 07:28:35 +00:00
Robert Watson
9327ee33bf When generating a TCP response to a connection, not only test if the
tcpcb is NULL, but also its connected inpcb, since we now allow
elements of a TCP connection to hang around after other state, such
as the socket, has been recycled.

Tested by:	dcs
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-02-25 14:08:41 +00:00
Maxim Konovalov
b36f5b3735 style(9): join lines. 2003-02-25 11:53:11 +00:00
Maxim Konovalov
99e8617d24 Ip reassembly queue structure has ipq_nfrags now. Count a number of
dropped ip fragments precisely.

Reviewed by:	silby
2003-02-25 11:49:01 +00:00
Jeffrey Hsu
edf02ff15d Hold the TCP protocol lock while modifying the connection hash table. 2003-02-25 01:32:03 +00:00
Mike Silbersack
af9c7d06d5 Fix a comment which didn't match the new cookie behavior.
Submitted by:	Scott Renfro <scott@renfro.org>
MFC after:	1 day
2003-02-24 03:15:48 +00:00
Jeffrey Hsu
11a20fb8b6 tcp_twstart() need to be called with the TCP protocol lock held to avoid
a race condition with the TCP timer routines.
2003-02-24 00:52:03 +00:00
Jeffrey Hsu
2fbef91887 Pass the right function to callout_reset() for a compressed
TIME-WAIT control block.
2003-02-24 00:48:12 +00:00
Mike Silbersack
a432399c56 Improve the security and performance of syncookies:
Security improvements:
- Increase the size of each syncookie secret from 32 to 128 bits
  in order to make brute force attacks on the secrets much more
  difficult.
- Always return the lowest order dword from the MD5 hash; this
  allows us to expose 2 more bits of the cookie and makes ACK
  floods which seek to guess the cookie value more difficult.

Performance improvements:
- Increase the lifetime of each syncookie from 4 seconds to 16
  seconds.  This increases the usefulness of syncookies during
  an attack.
- From Yahoo!: Reduce the number of calls to MD5Update; this
  results in a ~17% increase in cookie generation time here.

Reviewed by:	hsu, jayanth, jlemon, nectar
MFC After:	15 seconds
2003-02-23 19:04:23 +00:00
Jonathan Lemon
f243998be5 Yesterday just wasn't my day. Remove testing delta that crept into the diff.
Pointy hat provided by: sam
2003-02-23 15:40:36 +00:00
Sam Leffler
14dd6717f8 Add a new config option IPSEC_FILTERGIF to control whether or not
packets coming out of a GIF tunnel are re-processed by ipfw, et. al.
By default they are not reprocessed.  With the option they are.

This reverts 1.214.  Prior to that change packets were not re-processed.
After they were which caused problems because packets do not have
distinguishing characteristics (like a special network if) that allows
them to be filtered specially.

This is really a stopgap measure designed for immediate MFC so that
4.8 has consistent handling to what was in 4.7.

PR:		48159
Reviewed by:	Guido van Rooij <guido@gvr.org>
MFC after:	1 day
2003-02-23 00:47:06 +00:00
Jonathan Lemon
a14c749f04 Check to see if the TF_DELACK flag is set before returning from
tcp_input().  This unbreaks delack handling, while still preserving
correct T/TCP behavior

Tested by: maxim
Sponsored by: DARPA, NAI Labs
2003-02-22 21:54:57 +00:00
Mike Silbersack
375386e284 Add the ability to limit the number of IP fragments allowed per packet,
and enable it by default, with a limit of 16.

At the same time, tweak maxfragpackets downward so that in the worst
possible case, IP reassembly can use only 1/2 of all mbuf clusters.

MFC after: 	3 days
Reviewed by:	hsu
Liked by:	bmah
2003-02-22 06:41:47 +00:00
Poul-Henning Kamp
d25ecb917b - m = m_gethdr(M_NOWAIT, MT_HEADER);
+       m = m_gethdr(M_DONTWAIT, MT_HEADER);

'nuff said.
2003-02-21 23:17:12 +00:00
Crist J. Clark
b0d226932e The ancient and outdated concept of "privileged ports" in UNIX-type
OSes has probably caused more problems than it ever solved. Allow the
user to retire the old behavior by specifying their own privileged
range with,

  net.inet.ip.portrange.reservedhigh  default = IPPORT_RESERVED - 1
  net.inet.ip.portrange.reservedlo    default = 0

Now you can run that webserver without ever needing root at all. Or
just imagine, an ftpd that can really drop privileges, rather than
just set the euid, and still do PORT data transfers from 20/tcp.

Two edge cases to note,

  # sysctl net.inet.ip.portrange.reservedhigh=0

Opens all ports to everyone, and,

  # sysctl net.inet.ip.portrange.reservedhigh=65535

Locks all network activity to root only (which could actually have
been achieved before with ipfw(8), but is somewhat more
complicated).

For those who stick to the old religion that 0-1023 belong to root and
root alone, don't touch the knobs (or even lock them by raising
securelevel(8)), and nothing changes.
2003-02-21 05:28:27 +00:00
Jonathan Lemon
8608c4c1f9 Remove unused variables in the IPSEC case.
Submitted by:  Lars Eggert <larse@ISI.EDU>
2003-02-20 18:22:21 +00:00
Jonathan Lemon
ffae8c5a7e Unbreak non-IPV6 compilation.
Caught by: phk
Sponsored by: DARPA, NAI Labs
2003-02-19 23:43:04 +00:00
Jonathan Lemon
340c35de6a Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block.  Allow the socket and tcpcb structures to be freed
earlier than inpcb.  Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs
2003-02-19 22:32:43 +00:00
Jonathan Lemon
7990938421 Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the
routine does not require a tcpcb to operate.  Since we no longer keep
template mbufs around, move pseudo checksum out of this routine, and
merge it with the length update.

Sponsored by: DARPA, NAI Labs
2003-02-19 22:18:06 +00:00
Jonathan Lemon
414462252a Correct comments. 2003-02-19 21:33:46 +00:00
Jonathan Lemon
3bfd6421c2 Clean up delayed acks and T/TCP interactions:
- delay acks for T/TCP regardless of delack setting
   - fix bug where a single pass through tcp_input might not delay acks
   - use callout_active() instead of callout_pending()

Sponsored by: DARPA, NAI Labs
2003-02-19 21:18:23 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Maxim Konovalov
b52d5ea3d2 o Fix ipfw uid rules: socheckuid() returns 0 when uid matches a socket
cr_uid.

Note: we do not have socheckuid() in RELENG_4, ip_fw2.c uses its
own macro for a similar purpose that is why ipfw2 in RELENG_4 processes
uid rules correctly. I will MFC the diff for code consistency.

Reported by:	Oleg Baranov <ol@csa.ru>
Reviewed by:	luigi
MFC after:	1 month
2003-02-17 13:39:57 +00:00
Jeffrey Hsu
4b40c56c28 Take advantage of pre-existing lock-free synchronization and type stable memory
to avoid acquiring SMP locks during expensive copyout process.
2003-02-15 02:37:57 +00:00
Jeffrey Hsu
85e8b24343 The protocol lock is always held in the dropafterack case, so we don't
need to check for it at runtime.
2003-02-13 22:14:22 +00:00
Jeffrey Hsu
3dc7ebf9ff in_pcbnotifyall() requires an exclusive protocol lock for notify functions
which modify the connection list, namely, tcp_notify().
2003-02-12 23:55:07 +00:00
Jeffrey Hsu
6d45d64a8f Properly document that syncache timer processing requires an
exclusive TCP protocol lock.
2003-02-12 00:42:12 +00:00
Seigo Tanimura
cd6c2a8874 s/IPSSEC/IPSEC/ 2003-02-11 10:51:56 +00:00
Jeffrey Hsu
24652ff6e1 Get cosmetic changes out of the way before I add routing table SMP locks. 2003-02-10 22:01:34 +00:00
Orion Hodson
022695f82a Avoid multiply for preemptive arp calculation since it hits every
ethernet packet sent.

Prompted by: Jeffrey Hsu <hsu@FreeBSD.org>
2003-02-08 15:05:15 +00:00
Orion Hodson
73224fb019 MFS 1.64.2.22: Re-enable non pre-emptive ARP requests.
Submitted by: "Diomidis Spinellis" <dds@aueb.gr>
PR:           kern/46116
2003-02-04 05:28:08 +00:00
Crist J. Clark
39eb27a4a9 Add the TCP flags to the log message whenever log_in_vain is 1, not
just when set to 2.

PR:		kern/43348
MFC after:	5 days
2003-02-02 22:06:56 +00:00
Mike Silbersack
ecf44c01f4 Move a comment and optimize the frag timeout code a slight bit.
Submitted by:	maxim
MFC with:	The previous two revisions
2003-02-01 05:59:51 +00:00
Sam Leffler
9359ad861e FAST_IPSEC bandaid: act like KAME and ignore ENOENT error codes from
ipsec4_process_packet; they happen when a packet is dropped because
an SA acquire is initiated

Submitted by:	Doug Ambrisko <ambrisko@verniernetworks.com>
2003-01-30 05:45:45 +00:00
Sam Leffler
28a34902c4 remove the restriction on build a kernel with FAST_IPSEC and INET6;
you still don't want to use the two together, but it's ok to have
them in the same kernel (the problem that initiated this bandaid
has long since been fixed)
2003-01-30 05:43:08 +00:00
Mike Silbersack
d4d5315c23 Fix a bug with syncookies; previously, the syncache's MSS size was not
initialized until after a syncookie was generated.  As a result,
all connections resulting from a returned cookie would end up using
a MSS of ~512 bytes.  Now larger packets will be used where possible.

MFC after:	5 days
2003-01-29 03:49:49 +00:00
Poul-Henning Kamp
4ee6e70ef3 Check bounds for index before dereferencing memory past end of array.
Found by:	FlexeLint
2003-01-28 22:44:12 +00:00
Jeffrey Hsu
93f798891a Avoid lock order reversal by expanding the scope of the
AF_INET radix tree lock to cover the ARP data structures.
2003-01-28 20:22:19 +00:00
Mike Silbersack
ac64c8668b A few fixes to rev 1.221
- Honor the previous behavior of maxfragpackets = 0 or -1
- Take a better stab at fragment statistics
- Move / correct a comment

Suggested by:	maxim@
MFC after:	7 days
2003-01-28 03:39:39 +00:00
Mike Silbersack
402062e80c Merge the best parts of maxfragpackets and maxnipq together. (Both
functions implemented approximately the same limits on fragment memory
usage, but in different fashions.)

End user visible changes:
- Fragment reassembly queues are freed in a FIFO manner when maxfragpackets
  has been reached, rather than all reassembly stopping.

MFC after: 	5 days
2003-01-26 01:44:05 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Maxim Konovalov
2adf7582da De-anonymity a couple of messages I missed in a previous sweep.
Move one of them under DEB macro.

Noticed by:	Wiktor Niesiobedzki <w@evip.pl>
2003-01-20 13:03:34 +00:00
Maxim Konovalov
8ec22a9363 If the first action is O_LOG adjust a pointer to the real one, unbreaks
skipto + log rules.

Reported by:	Wiktor Niesiobedzki <w@evip.pl>
MFC after:	1 week
2003-01-20 11:58:34 +00:00
Jeffrey Hsu
314e5a3daf Optimize away call to bzero() in the common case by directly checking
if a connection has any cached TAO information.
2003-01-18 19:03:26 +00:00
Jeffrey Hsu
f5c5746047 Fix long-standing bug predating FreeBSD where calling connect() twice
on a raw ip socket will crash the system with a null-dereference.
2003-01-18 01:10:55 +00:00
Jeffrey Hsu
c996428c32 SMP locking for ARP. 2003-01-17 07:59:35 +00:00
Matthew Dillon
fe41ca530c Introduce the ability to flag a sysctl for operation at secure level 2 or 3
in addition to secure level 1.  The mask supports up to a secure level of 8
but only add defines through CTLFLAG_SECURE3 for now.

As per the missif in the log entry for 1.11 of ip_fw2.c which added the
secure flag to the IPFW sysctl's in the first place, change the secure
level requirement from 1 to 3 now that we have support for it.

Reviewed by:	imp
With Design Suggestions by:	imp
2003-01-14 19:35:33 +00:00
Jeffrey Hsu
cb942153c8 Fix NewReno.
Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
2003-01-13 11:01:20 +00:00
Thomas Moestl
a9a7a91220 Clear the target hardware address field when generating an ARP request.
Reviewed by:	nectar
MFC after:	1 week
2003-01-10 00:04:53 +00:00
Jeffrey Hsu
b21bf9a59b Validate inp before de-referencing it.
Submitted by:	pb
2003-01-05 07:56:24 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Sam Leffler
9967cafc49 Correct mbuf packet header propagation. Previously, packet headers
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a  "copy" operation.  This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain.  This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.

These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block.  This
introduces an incompatibility with openbsd which we may want to revisit.

Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them.  We may want to support this;
for now we watch for it with an assert.

Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.

Supported by:	Vernier Networks
Reviewed by:	Robert Watson <rwatson@FreeBSD.org>
2002-12-30 20:22:40 +00:00
Matthew Dillon
07fd333df3 Remove the PAWS ack-on-ack debugging printf().
Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of
order / reverse-time-indexed packet should be acknowledged as specified
in RFC-793 page 69 then dropped.  The original PAWS code in FreeBSD (1994)
simply acknowledged the segment unconditionally, which is incorrect, and
was fixed in 1.183 (2002).  At the moment we do not do checks for SYN or FIN
in addition to (tlen != 0), which may or may not be correct, but the
worst that ought to happen should be a retry by the sender.
2002-12-30 19:31:04 +00:00
Sam Leffler
069f35d328 correct style bogons 2002-12-30 18:45:31 +00:00
Ian Dowse
ed1a13b18f Bridged packets are supplied to the firewall with their IP header
in network byte order, but icmp_error() expects the IP header to
be in host order and the code here did not perform the necessary
swapping for the bridged case. This bug causes an "icmp_error: bad
length" panic when certain length IP packets (e.g. ip_len == 0x100)
are rejected by the firewall with an ICMP response.

MFC after:	3 days
2002-12-27 17:43:25 +00:00
Jeffrey Hsu
abe239cfe2 Validate inp to prevent an use after free. 2002-12-24 21:00:31 +00:00
Maxim Konovalov
f4ef616f98 o De-anonymity dummynet(4) and ipfw(4) messages, prepend them
by 'dummynet: ' and 'ipfw: ' prefixes.

PR:		kern/41609
2002-12-24 13:45:24 +00:00
Jeffrey Hsu
956b0b653c SMP locking for radix nodes. 2002-12-24 03:03:39 +00:00
Pierre Beyssac
1ba7727b9e Remove forgotten INP_UNLOCK(inp) in my previous commit.
Reported by: hsu
2002-12-22 13:04:08 +00:00
Pierre Beyssac
87cd4001b5 In syncache_timer(), don't attempt to lock the inpcb structure
associated with the syncache entry: in case tcp_close() has been
called on the corresponding listening socket, the lock has been
destroyed as a side effect of in_pcbdetach(), causing a panic when
we attempt to lock on it.

Reviewed by:	hsu
2002-12-21 19:59:47 +00:00
Sam Leffler
00f21882a0 replace the special-purpose rate-limiting code with the general facility
just added; this tries to maintain the same behaviour vis a vis printing
the rate-limiting messages but need tweaking
2002-12-21 00:08:20 +00:00
Jeffrey Hsu
9a39fc9d73 Eliminate a goto.
Fix some line breaks.
2002-12-20 11:24:02 +00:00
Jeffrey Hsu
540e8b7e31 Unravel a nested conditional.
Remove an unneeded local variable.
2002-12-20 11:16:52 +00:00
Jeffrey Hsu
f320a1bfd2 Expand scope of TCP protocol lock to cover syncache data structures. 2002-12-20 00:24:19 +00:00
Bosko Milekic
86fea6be59 o Untangle the confusion with the malloc flags {M_WAITOK, M_NOWAIT} and
the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}.
o Fix a bpf_compat issue where malloc() was defined to just call
  bpf_alloc() and pass the 'canwait' flag(s) along.  It's been changed
  to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT
  flag (and only one of those two).

Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)
2002-12-19 22:58:27 +00:00
Jeffrey Hsu
19fc74fb60 Lock up ifaddr reference counts. 2002-12-18 11:46:59 +00:00
Poul-Henning Kamp
11aee0b4b0 Remove unused and incorrectly maintained variable "in_interfaces" 2002-12-17 19:30:04 +00:00
Matthew Dillon
967adce8df Fix syntax in last commit. 2002-12-17 00:24:48 +00:00
Maxim Konovalov
616fa7460c o Trim EOL whitespaces.
MFC after:	1 week
2002-12-15 10:24:36 +00:00
Maxim Konovalov
21ef23ab3f o s/if_name[16]/if_name[IFNAMSIZ]/
Reviewed by:	luigi
MFC after:	1 week
2002-12-15 10:23:02 +00:00
Maxim Konovalov
2713a5bebb o M_DONTWAIT is mbuf(9) flag: malloc(M_DONTWAIT) -> malloc(M_NOWAIT).
The bug does not affect anything because M_NOWAIT == M_DONTWAIT.

Reviewed by:	luigi
MFC after:	1 week
2002-12-15 10:21:30 +00:00
Maxim Konovalov
83b75b7621 o Fix byte order logging issue: sa.sin_port is already in host byte order.
PR:		kern/45964
Submitted by:	Sascha Blank <sblank@tiscali.de>
Reviewed by:	luigi
MFC after:	1 week
2002-12-15 09:44:02 +00:00
Matthew Dillon
d7ff8ef62a Change tcp.inflight_min from 1024 to a production default of 6144. Create
a sysctl for the stabilization value for the bandwidth delay product (inflight)
algorithm and document it.

MFC after:	3 days
2002-12-14 21:00:17 +00:00
Matthew Dillon
1ab4789dc2 Bruce forwarded this tidbit from an analysis Van Jacobson did on an
apparent ack-on-ack problem with FreeBSD.  Prof. Jacobson noticed a
case in our TCP stack which would acknowledge a received ack-only packet,
which is not legal in TCP.

Submitted by:	 Van Jacobson <van@packetdesign.com>,
		bmah@packetdesign.com (Bruce A. Mah)
MFC after:	7 days
2002-12-14 07:31:51 +00:00
Maxim Sobolev
16199bf2d3 MFS: recognize gre packets used in the WCCP protocol.
Approved by:	re
2002-12-07 14:22:05 +00:00
Luigi Rizzo
97850a5dd9 Move fw_one_pass from ip_fw2.c to ip_input.c so that neither
bridge.c nor if_ethersubr.c depend on IPFIREWALL.
Restore the use of fw_one_pass in if_ethersubr.c

ipfw.8 will be updated with a separate commit.

Approved by: re
2002-11-20 19:07:27 +00:00
Luigi Rizzo
032dcc7680 Back out some style changes. They are not urgent,
I will put them back in after 5.0 is out.

Requested by: sam
Approved by: re
2002-11-20 19:00:54 +00:00
Luigi Rizzo
b375c9ec2c Back out the ip_fragment() code -- it is not urgent to have it in now,
I will put it back in in a better form after 5.0 is out.

Requested by: sam, rwatson, luigi (on second thought)
Approved by: re
2002-11-20 18:56:25 +00:00
Mike Silbersack
df285b3d1d Add a sysctl to control the generation of source quench packets,
and set it to 0 by default.

Partially obtained from:	NetBSD
Suggested by:	David Gilbert
MFC after:	5 days
2002-11-19 17:06:06 +00:00
Luigi Rizzo
9b77fbf0a2 Fix function headers and remove 'register' variable declarations. 2002-11-17 17:04:19 +00:00
Luigi Rizzo
3e372e140c Move the ip_fragment code from ip_output() to a separate function,
so that it can be reused elsewhere (there is a number of places
where it can be useful). This also trims some 200 lines from
the body of ip_output(), which helps readability a bit.

(This change was discussed a few weeks ago on the mailing lists,
Julian agreed, silence from others. It is not a functional change,
so i expect it to be ok to commit it now but i am happy to back it
out if there are objections).

While at it, fix some function headers and replace m_copy() with
m_copypacket() where applicable.

MFC after: 1 week
2002-11-17 16:30:44 +00:00
Luigi Rizzo
20fab86349 Minor documentation changes and indentation fix.
Replace m_copy() with m_copypacket() where applicable.

While at it, fix some function headers and remove 'register' from
variable declarations.
2002-11-17 16:13:08 +00:00
Luigi Rizzo
4e8fe3210d Cleanup some of the comments, and reformat long lines.
Replace m_copy() with m_copypacket() where applicable.

Replace "if (a.s_addr ...)" with "if (a.s_addr != INADDR_ANY ...)"
to make it clear what the code means.

While at it, fix some function headers and remove 'register' from
variable declarations.

MFC after: 3 days
2002-11-17 16:02:17 +00:00
Luigi Rizzo
bbb4330b61 Massive cleanup of the ip_mroute code.
No functional changes, but:

  + the mrouting module now should behave the same as the compiled-in
    version (it did not before, some of the rsvp code was not loaded
    properly);
  + netinet/ip_mroute.c is now truly optional;
  + removed some redundant/unused code;
  + changed many instances of '0' to NULL and INADDR_ANY as appropriate;
  + removed several static variables to make the code more SMP-friendly;
  + fixed some minor bugs in the mrouting code (mostly, incorrect return
    values from functions).

This commit is also a prerequisite to the addition of support for PIM,
which i would like to put in before DP2 (it does not change any of
the existing APIs, anyways).

Note, in the process we found out that some device drivers fail to
properly handle changes in IFF_ALLMULTI, leading to interesting
behaviour when a multicast router is started. This bug is not
corrected by this commit, and will be fixed with a separate commit.

Detailed changes:
--------------------
netinet/ip_mroute.c     all the above.
conf/files              make ip_mroute.c optional
net/route.c             fix mrt_ioctl hook
netinet/ip_input.c      fix ip_mforward hook, move rsvp_input() here
                        together with other rsvp code, and a couple
                        of indentation fixes.
netinet/ip_output.c     fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h        rsvp function hooks
netinet/raw_ip.c        hooks for mrouting and rsvp functions, plus
                        interface cleanup.
netinet/ip_mroute.h     remove an unused and optional field from a struct

Most of the code is from Pavlin Radoslavov and the XORP project

Reviewed by: sam
MFC after: 1 week
2002-11-15 22:53:53 +00:00
Sam Leffler
eec3a0b17f track changes to not strip the Ethernet header from input packets
Reviewed by:	many
Approved by:	re
2002-11-14 23:46:04 +00:00
Sam Leffler
ccb2acfe1b track bpf changes
Reviewed by:	many
Approved by:	re
2002-11-14 23:45:13 +00:00
Maxim Konovalov
8ef1565d2b Due to a memory alignment sizeof(struct ipfw_flow_id) is bigger than
ipfw_flow_id structure actual size and bcmp(3) may fail to compare
them properly. Compare members of these structures instead.

PR:		kern/44078
Submitted by:	Oleg Bulyzhin <oleg@rinet.ru>
Reviewed by:	luigi
MFC after:	2 weeks
2002-11-13 11:31:44 +00:00
Jeffrey Hsu
e1e1b6e892 Turn off duplicate lock checking for inp locks because udp_input()
intentionally locks two inp records simultaneously.
2002-11-12 20:44:38 +00:00
Sam Leffler
6f0d017cf4 a better solution to building FAST_IPSEC w/o INET6
Submitted by:	Jeffrey Hsu <hsu@FreeBSD.org>
2002-11-10 17:17:32 +00:00
Alfred Perlstein
29f194457c Fix instances of macros with improperly parenthasized arguments.
Verified by: md5
2002-11-09 12:55:07 +00:00
Sam Leffler
9c0a8ace11 temporarily disallow FAST_IPSEC and INET6 to avoid potential panics;
will correct this before 5.0 release
2002-11-08 23:50:32 +00:00
Sam Leffler
e8539d32f0 FAST_IPSEC fixups:
o fix #ifdef typo
o must use "bounce functions" when dispatched from the protosw table

don't know how this stuff was missed in my testing; must've committed
the wrong bits

Pointy hat:	sam
Submitted by:	"Doug Ambrisko" <ambrisko@verniernetworks.com>
2002-11-08 23:37:50 +00:00
Sam Leffler
58fcadfc0f fixup FAST_IPSEC build w/o INET6 2002-11-08 23:33:59 +00:00
Sam Leffler
ab94ca3cec correct fast ipsec logic: compare destination ip address against the
contents of the SA, not the SP

Submitted by:	"Doug Ambrisko" <ambrisko@verniernetworks.com>
2002-11-08 23:11:02 +00:00
John Baldwin
2d4e26522d Cast a ptrdiff_t to an int to printf. 2002-11-08 14:52:26 +00:00
Jeff Roberson
1645d0903e - Consistently update snd_wl1, snd_wl2, and rcv_up in the header
prediction code.  Previously, 2GB worth of header predicted data
   could leave these variables too far out of sequence which would cause
   problems after receiving a packet that did not match the header
   prediction.

Submitted by:	Bill Baumann <bbaumann@isilon.com>
Sponsored by:	Isilon Systems, Inc.
Reviewed by:	hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com
2002-10-31 23:24:13 +00:00
Jeffrey Hsu
30613f5610 Don't need to check if SO_OOBINLINE is defined.
Don't need to protect isipv6 conditional with INET6.
Fix leading indentation in 2 lines.
2002-10-30 08:32:19 +00:00
Bill Fenner
4d3ffc9841 Renumber IPPROTO_DIVERT out of the range of valid IP protocol numbers.
This allows socket() to return an error when the kernel is not built
with IPDIVERT, and doesn't prevent future applications from using the
"borrowed" IP protocol number.  The sysctl net.inet.raw.olddiverterror
controls whether opening a socket with the "borrowed" IP protocol
fails with an accompanying kernel printf; this code should last only a
couple of releases.

Approved by:	re
2002-10-29 16:46:13 +00:00
Maxim Konovalov
a98d88ad3e Lower a priority of "session drop" messages.
Requested by:	Eugene Grosbein <eugen@kuzbass.ru>
MFC after:	3 days
2002-10-29 08:53:14 +00:00
Maxime Henrion
d28e8b3a0d Oops, forgot to commit this file. This is part of the fix
for ipfw2 panics on sparc64.
2002-10-24 22:32:13 +00:00
Maxime Henrion
7c697970f4 Fix ipfw2 panics on 64-bit platforms.
Quoting luigi:

In order to make the userland code fully 64-bit clean it may
be necessary to commit other changes that may or may not cause
a minor change in the ABI.

Reviewed by:	luigi
2002-10-24 18:04:44 +00:00
Luigi Rizzo
18f13da2be src and dst address were erroneously swapped in SRC_SET and DST_SET
commands.  Use the correct one. Also affects ipfw2 in -stable.
2002-10-24 18:01:53 +00:00
Maxime Henrion
56e77afa59 Fix kernel build on sparc64 in the IPDIVERT case. 2002-10-24 09:58:50 +00:00
Ian Dowse
efac726eeb Unbreak the automatic remapping of an INADDR_ANY destination address
to the primary local IP address when doing a TCP connect(). The
tcp_connect() code was relying on in_pcbconnect (actually in_pcbladdr)
modifying the passed-in sockaddr, and I failed to notice this in
the recent change that added in_pcbconnect_setup(). As a result,
tcp_connect() was ending up using the unmodified sockaddr address
instead of the munged version.

There are two cases to handle: if in_pcbconnect_setup() succeeds,
then the PCB has already been updated with the correct destination
address as we pass it pointers to inp_faddr and inp_fport directly.
If in_pcbconnect_setup() fails due to an existing but dead connection,
then copy the destination address from the old connection.
2002-10-24 02:02:34 +00:00
Maxim Konovalov
ba3a9d459c Kill EOL spaces.
Approved by:	luigi
MFC after:	1 week
2002-10-23 10:07:55 +00:00
Maxim Konovalov
6b6874b20c Use syslog for messages about dropped sessions, do not flood a console.
Suggested by:	Eugene Grosbein <eugen@kuzbass.ru>
Approved by:	luigi
MFC after:	1 week
2002-10-23 10:05:19 +00:00
SUZUKI Shinsuke
2754d95d85 fixed a kernel crash by "ifconfig stf0 inet 1.2.3.4"
MFC after:	1 week
2002-10-22 22:50:38 +00:00
Ian Dowse
c557ae16ce Implement a new IP_SENDSRCADDR ancillary message type that permits
a server process bound to a wildcard UDP socket to select the IP
address from which outgoing packets are sent on a per-datagram
basis. When combined with IP_RECVDSTADDR, such a server process can
guarantee to reply to an incoming request using the same source IP
address as the destination IP address of the request, without having
to open one socket per server IP address.

Discussed on:	-net
Approved by:	re
2002-10-21 20:40:02 +00:00
Ian Dowse
90162a4e87 Remove the "temporary connection" hack in udp_output(). In order
to send datagrams from an unconnected socket, we used to first block
input, then connect the socket to the sendmsg/sendto destination,
send the datagram, and finally disconnect the socket and unblock
input.

We now use in_pcbconnect_setup() to check if a connect() would have
succeeded, but we never record the connection in the PCB (local
anonymous port allocation is still recorded, though). The result
from in_pcbconnect_setup() authorises the sending of the datagram
and selects the local address and port to use, so we just construct
the header and call ip_output().

Discussed on:	-net
Approved by:	re
2002-10-21 20:10:05 +00:00
Ian Dowse
5200e00e72 Replace in_pcbladdr() with a more generic inner subroutine for
in_pcbconnect() called in_pcbconnect_setup(). This version performs
all of the functions of in_pcbconnect() except for the final
committing of changes to the PCB. In the case of an EADDRINUSE error
it can also provide to the caller the PCB of the duplicate connection,
avoiding an extra in_pcblookup_hash() lookup in tcp_connect().

This change will allow the "temporary connect" hack in udp_output()
to be removed and is part of the preparation for adding the
IP_SENDSRCADDR control message.

Discussed on:	-net
Approved by:	re
2002-10-21 13:55:50 +00:00
Poul-Henning Kamp
53be11f680 Fix two instances of variant struct definitions in sys/netinet:
Remove the never completed _IP_VHL version, it has not caught on
anywhere and it would make us incompatible with other BSD netstacks
to retain this version.

Add a CTASSERT protecting sizeof(struct ip) == 20.

Don't let the size of struct ipq depend on the IPDIVERT option.

This is a functional no-op commit.

Approved by:	re
2002-10-20 22:52:07 +00:00
Robert Watson
c740509854 When a packet is multicast encapsulated, give labeled policies the
opportunity to preserve the label.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-20 21:59:00 +00:00
Ian Dowse
4b932371f4 Split out most of the logic from in_pcbbind() into a new function
called in_pcbbind_setup() that does everything except commit the
changes to the PCB. There should be no functional change here, but
in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR
control message implementation to check or allocate the source
address and port.

Discussed on:	-net
Approved by:	re
2002-10-20 21:44:31 +00:00
Maxime Henrion
d7f4d27a7a Several malloc() calls were passing the M_DONTWAIT flag
which is an mbuf allocation flag.  Use the correct
M_NOWAIT malloc() flag.  Fortunately, both were defined
to 1, so this commit is a no-op.
2002-10-19 11:31:50 +00:00
Hajimu UMEMOTO
b6e2845324 last arg of in6?_gif_output() is not used any more.
Obtained from:	KAME
MFC after:	3 weeks
2002-10-17 17:47:55 +00:00
Alfred Perlstein
dde2897f82 de-__P(). 2002-10-16 22:27:27 +00:00
Hajimu UMEMOTO
ab94625826 use encapcheck.
Obtained from:	KAME
MFC after:	3 weeks
2002-10-16 20:16:49 +00:00
Hajimu UMEMOTO
9426aedf7f - after gif_set_tunnel(), psrc/pdst may be null. set IFF_RUNNING accordingly.
- set IFF_UP on SIOCSIFADDR.  be consistent with others.
- set if_addrlen explicitly (just in case)
- multi destination mode is long gone.
- missing break statement
- add gif_set_tunnel(), so that we can set tunnel address from within the
  kernel at ease.
- encap_attach/detach dynamically on ioctls
- move encap_attach() to dedicated function in in*_gif.c

Obtained from:	KAME
MFC after:	3 weeks
2002-10-16 19:49:37 +00:00
Matthew Dillon
abac41a659 Fix oops in my last commit, I was calculating a new length but then not
using it.  (The code is already correct in -stable).

Found by: silby
2002-10-16 19:16:33 +00:00
Guido van Rooij
2f591ab8fe Get rid of checking for ip sec history. It is true that packets are not
supposed to be checked by the firewall rules twice. However, because the
various ipsec handlers never call ip_input(), this never happens anyway.

This fixes the situation where a gif tunnel is encrypted with IPsec. In
such a case, after IPsec processing, the unencrypted contents from the
GIF tunnel are fed back to the ipintrq and subsequently handeld by
ip_input(). Yet, since there still is IPSec history attached, the
packets coming out from the gif device are never fed into the filtering
code.
This fix was sent to Itojun, and he pointed towartds
    http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction.
This patch actually implements what is stated there (specifically:
Packet came from tunnel devices (gif(4) and ipip(4)) will still
go through ipf(4). You may need to identify these packets by
using interface name directive in ipf.conf(5).

Reviewed by:	rwatson
MFC after:	3 weeks
2002-10-16 09:01:48 +00:00
Sam Leffler
9b65723081 correct PCB locking in broadcast/multicast case that was exposed by change
to use udp_append

Reviewed by:	hsu
2002-10-16 02:33:28 +00:00
Sam Leffler
b9234fafa0 Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas.  Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by:	KAME, rwatson
Approved by:	silence
Supported by:	Vernier Networks
2002-10-16 02:25:05 +00:00
Sam Leffler
5d84645305 Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
Sean Chittenden
927a76bb5e Increase the max dummynet hash size from 1024 to 65536. Default is still
1024.

Silence on:	-net, -ipfw 4weeks+
Reviewed by:	dd
Approved by:	knu (mentor)
MFC after:	3 weeks
2002-10-12 07:45:23 +00:00
Matthew Dillon
c8d50f2414 turn off debugging by default if bandwidth delay product limiting is
turned on (it is already off in -stable).
2002-10-10 21:41:30 +00:00
Matthew Dillon
28257b5ccc Update various comments mainly related to retransmit/FIN that I
documented while working on a previous bug.

Fix a PERSIST bug.  Properly account for a FIN sent during a PERSIST.

MFC after:	7 days
2002-10-10 19:21:50 +00:00
Maxim Konovalov
a5428e3a9a Fix IPOPT_TS processing: do not overwrite IP address by timestamp.
PR:		misc/42121
Submitted by:	Praveen Khurjekar <praveen@codito.com>
Reviewed by:	silence on -net
MFC after:	1 month
2002-10-10 12:03:36 +00:00
Maxim Sobolev
748bb23dcc Since bpf is no longer an optional component, remove associated ifdef's.
Submitted by:	don't quite remember - the name of the sender disappeared
		with the rest of my inbox. :(
2002-10-02 09:38:17 +00:00
Mike Barcroft
c0ec31f93e Include <sys/cdefs.h> so the visibility conditionals are available.
(This should have been included with the previous revision.)
2002-10-02 04:22:34 +00:00
Mike Barcroft
0cd4a9031e Use visibility conditionals. Only TCP_NODELAY ends up being defined
in the standards case.
2002-10-02 04:19:47 +00:00
Matthew Dillon
a84db8f49e Guido found another bug. There is a situation with
timestamped TCP packets where FreeBSD will send DATA+FIN and
A W2K box will ack just the DATA portion.  If this occurs
after FreeBSD has done a (NewReno) fast-retransmit and is
recovering it (dupacks > threshold) it triggers a case in
tcp_newreno_partial_ack() (tcp_newreno() in stable) where
tcp_output() is called with the expectation that the retransmit
timer will be reloaded.  But tcp_output() falls through and
returns without doing anything, causing the persist timer to be
loaded instead.  This causes the connection to hang until W2K gives up.
This occurs because in the case where only the FIN must be acked, the
'len' calculation in tcp_output() will be 0, a lot of checks will be
skipped, and the FIN check will also be skipped because it is designed
to handle FIN retransmits, not forced transmits from tcp_newreno().

The solution is to simply set TF_ACKNOW before calling tcp_output()
to absolute guarentee that it will run the send code and reset the
retransmit timer.  TF_ACKNOW is already used for this purpose in other
cases.

For some unknown reason this patch also seems to greatly reduce
the number of duplicate acks received when Guido runs his tests over
a lossy network.  It is quite possible that there are other
tcp_newreno{_partial_ack()} cases which were not generating the expected
output which this patch also fixes.

X-MFC after:	Will be MFC'd after the freeze is over
2002-09-30 18:55:45 +00:00
Poul-Henning Kamp
37c841831f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
Peter Wemm
224af215a6 Zap now-unused SHLIB_MINOR 2002-09-28 00:25:32 +00:00
Maxim Konovalov
cb7641e85b Slightly rearrange a code in rev. 1.164:
o Move len initialization closer to place of its first usage.
o Compare len with 0 to improve readability.
o Explicitly zero out phlen in ip_insertoptions() in failure case.

Suggested by:   jhb
Reviewed by:    jhb
MFC after:      2 weeks
2002-09-23 08:56:24 +00:00
Alfred Perlstein
ebc82cbbf0 s/__attribute__((__packed__))/__packed/g 2002-09-23 06:25:08 +00:00
Mike Silbersack
c1c36a2c68 Fix issue where shutdown(socket, SHUT_RD) was effectively
ignored for TCP sockets.

NetBSD PR:	18185
Submitted by:	Sean Boudreau <seanb@qnx.com>
MFC after:	3 days
2002-09-22 02:54:07 +00:00
Poul-Henning Kamp
a5554bf05b Use m_fixhdr() rather than roll our own. 2002-09-18 19:43:01 +00:00
Matthew Dillon
fa55172bc0 Guido reported an interesting bug where an FTP connection between a
Windows 2000 box and a FreeBSD box could stall.  The problem turned out
to be a timestamp reply bug in the W2K TCP stack.  FreeBSD sends a
timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK
causing FreeBSD to calculate an insane SRTT and RTT, resulting in
a maximal retransmit timeout (60 seconds).  If there is any packet
loss on the connection for the first six or so packets the retransmit
case may be hit (the window will still be too small for fast-retransmit),
causing a 60+ second pause.  The W2K box gives up and closes the
connection.

This commit works around the W2K bug.

15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8]
15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)

Bug reported by: Guido van Rooij <guido@gvr.org>
2002-09-17 22:21:37 +00:00
Maxim Sobolev
563a9b6ecb Remove __RCSID().
Submitted by:	bde
2002-09-17 11:31:41 +00:00
Maxim Konovalov
1cf4349926 Explicitly clear M_FRAG flag on a mbuf with the last fragment to unbreak
ip fragments reassembling for loopback interface.

Discussed with:	bde, jlemon
Reviewed by:	silence on -net
MFC after:	2 weeks
2002-09-17 11:20:02 +00:00
Maxim Konovalov
e079ba8d93 In rare cases when there is no room for ip options ip_insertoptions()
can fail and corrupt a header length. Initialize len and check what
ip_insertoptions() returns.

Reviewed by:	archie, silence on -net
MFC after:	5 days
2002-09-17 11:13:04 +00:00
Jennifer Yang
4a03a8a8c7 Tempary fix for inet6. The final fix is to change in6_pcbnotify to take pcbinfo instead
of pcbhead. It is on the way.
2002-09-17 03:19:43 +00:00
Maxim Sobolev
2b82e3b367 Remove superfluous break. 2002-09-10 09:18:33 +00:00
Maxim Sobolev
565bb857d0 Since from now on encap_input() also catches IPPROTO_MOBILE and IPPROTO_GRE
packets in addition to IPPROTO_IPV4 and IPPROTO_IPV6, explicitly specify
IPPROTO_IPV4 or IPPROTO_IPV6 instead of -1 when calling encap_attach().

MFC after:	28 days
		(along with other if_gre changes)
2002-09-09 09:36:47 +00:00
Maxim Sobolev
c23d234cce Reduce namespace pollution by staticizing everything, which doesn't need to
be visible from outside of the module.
2002-09-06 18:16:03 +00:00
Maxim Sobolev
8e96e13e6a Add a new gre(4) driver, which could be used to create GRE (RFC1701)
and MOBILE (RFC2004) IP tunnels.

Obrained from:  NetBSD
2002-09-06 17:12:50 +00:00
Bruce Evans
40545cf5fc Fixed namespace pollution in uma changes:
- use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't
  a prerequisite.
- don't include <sys/uma.h>.
Namespace pollution makes "opaque" types like uma_zone_t perfectly
non-opaque.  Such types should never be used (see style(9)).

Fixed subsequently grwon dependencies of this header on its own pollution:
- include <sys/_mutex.h> and its prerequisite <sys/_lock.h> instead of
  depending on namespace pollution 2 layers deep in <sys/uma.h>.
2002-09-05 19:48:52 +00:00
Bruce Evans
c74af4fac1 Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending
on namespace pollution 4 layers deep in <netinet/in_pcb.h>.

Removed unused includes.  Sorted includes.
2002-09-05 15:33:30 +00:00
Maxim Sobolev
386fefa3a0 Add in_hosteq() and in_nullhost() macros to make life of developers
porting NetBSD code a little bit easier.

Obtained from:	NetBSD
2002-09-04 09:55:50 +00:00
Darren Reed
1851791868 some ipfilter files that accidently got imported here 2002-08-29 13:27:26 +00:00
Darren Reed
070700595d This commit was generated by cvs2svn to compensate for changes in r102514,
which included commits to RCS files with non-trunk default branches.
2002-08-28 13:26:01 +00:00
Philippe Charnier
93b0017f88 Replace various spelling with FALLTHROUGH which is lint()able 2002-08-25 13:23:09 +00:00
Crist J. Clark
784d7650f7 Lock the sysctl(8) knobs that turn ip{,6}fw(8) firewalling and
firewall logging on and off when at elevated securelevel(8). It would
be nice to be able to only lock these at securelevel >= 3, like rules
are, but there is no such functionality at present. I don't see reason
to be adding features to securelevel(8) with MAC being merged into 5.0.

PR:		kern/39396
Reviewed by:	luigi
MFC after:	1 week
2002-08-25 03:50:29 +00:00
Matthew Dillon
4f1e1f32b6 Correct bug in t_bw_rtttime rollover, #undef USERTT 2002-08-24 17:22:44 +00:00
Archie Cobbs
4a6a94d8d8 Replace (ab)uses of "NULL" where "0" is really meant. 2002-08-22 21:24:01 +00:00
Mike Barcroft
abbd890233 o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
  macros, which are only MD because of gratuitous differences between
  architectures.
o Change all headers to make use of this.  This mainly involves
  changing:
    #ifdef _BSD_FOO_T_
    typedef	_BSD_FOO_T_	foo_t;
    #undef _BSD_FOO_T_
    #endif
  to:
    #ifndef _FOO_T_DECLARED
    typedef	__foo_t	foo_t;
    #define	_FOO_T_DECLARED
    #endif

Concept by:	bde
Reviewed by:	jake, obrien
2002-08-21 16:20:02 +00:00
Don Lewis
26ef6ac4df Create new functions in_sockaddr(), in6_sockaddr(), and
in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in*
structure and initialize it with the address and port information passed
as arguments.  Use calls to these new functions to replace code that is
replicated multiple times in in_setsockaddr(), in_setpeeraddr(),
in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and
in6_mapped_peeraddr().  Inline COMMON_END in tcp_usr_accept() so that
we can call in_sockaddr() with temporary copies of the address and port
after the PCB is unlocked.

Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC()
inside in6_mapped_peeraddr() while the PCB is locked) by changing
the implementation of tcp6_usr_accept() to match tcp_usr_accept().

Reviewed by:	suz
2002-08-21 11:57:12 +00:00
Juli Mallett
ded7008a07 Enclose IPv6 addresses in brackets when they are displayed printable with a
TCP/UDP port seperated by a colon.  This is for the log_in_vain facility.

Pointed out by:	Edward J. M. Brocklesby
Reviewed by:	ume
MFC after:	2 weeks
2002-08-19 19:47:13 +00:00
Luigi Rizzo
306fe283a1 Raise limit for port lists to 30 entries/ranges.
Remove a duplicate "logging" message, and identify the firewall
as ipfw2 in the boot message.
2002-08-19 04:45:01 +00:00
Matthew Dillon
1fcc99b5de Implement TCP bandwidth delay product window limiting, similar to (but
not meant to duplicate) TCP/Vegas.  Add four sysctls and default the
implementation to 'off'.

net.inet.tcp.inflight_enable	enable algorithm (defaults to 0=off)
net.inet.tcp.inflight_debug	debugging (defaults to 1=on)
net.inet.tcp.inflight_min	minimum window limit
net.inet.tcp.inflight_max	maximum window limit

MFC after:	1 week
2002-08-17 18:26:02 +00:00
Jeffrey Hsu
c068736a61 Cosmetic-only changes for readability.
Reviewed by:	(early form passed by) bde
Approved by:	itojun (from core@kame.net)
2002-08-17 02:05:25 +00:00
Luigi Rizzo
99e5e64504 sys/netinet/ip_fw2.c:
Implement the M_SKIP_FIREWALL bit in m_flags to avoid loops
    for firewall-generated packets (the constant has to go in sys/mbuf.h).

    Better comments on keepalive generation, and enforce dyn_rst_lifetime
    and dyn_fin_lifetime to be less than dyn_keepalive_period.

    Enforce limits (up to 64k) on the number of dynamic buckets, and
    retry allocation with smaller sizes.

    Raise default number of dynamic rules to 4096.

    Improved handling of set of rules -- now you can atomically
    enable/disable multiple sets, move rules from one set to another,
    and swap sets.

sbin/ipfw/ipfw2.c:

    userland support for "noerror" pipe attribute.

    userland support for sets of rules.

    minor improvements on rule parsing and printing.

sbin/ipfw/ipfw.8:

    more documentation on ipfw2 extensions, differences from ipfw1
    (so we can use the same manpage for both), stateful rules,
    and some additional examples.
    Feedback and more examples needed here.
2002-08-16 10:31:47 +00:00
Alfred Perlstein
e88894d39a make the strings for tcptimers, tanames and prurequests const to silence
warnings.
2002-08-16 09:07:59 +00:00
Robert Watson
365433d9b8 Code formatting sync to trustedbsd_mac: don't perform an assignment
in an if clause.

PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
2002-08-15 22:04:31 +00:00
Robert Watson
fb95b5d3c3 Rename mac_check_socket_receive() to mac_check_socket_deliver() so that
we can use the names _receive() and _send() for the receive() and send()
checks.  Rename related constants, policy implementations, etc.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 18:51:27 +00:00
Jeffrey Hsu
b5addd8564 Reset dupack count in header prediction.
Follow-on to rev 1.39.

Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon
2002-08-15 17:13:18 +00:00
Luigi Rizzo
4bbf3b8b3a Kernel support for a dummynet option:
When a pipe or queue has the "noerror" attribute, do not report
drops to the caller (ip_output() and friends).
(2 lines to implement it, 2 lines to document it.)

This will let you simulate losses on the sender side as if they
happened in the middle of the network, i.e. with no explicit feedback
to the sender.

manpage and ipfw2.c changes to follow shortly, together with other
ipfw2 changes.

Requested by: silby
MFC after: 3 days
2002-08-15 16:53:43 +00:00
Robert Watson
ecd3e8ff5a It's now sufficient to rely on a nested include of _label.h to make sure
all structures in ip_var.h are defined, so remove include of mac.h.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:34:45 +00:00
Robert Watson
9daf40feaa Perform a nested include of _label.h if #ifdef _KERNEL. This will
satisfy consumers of ip_var.h that need a complete definition of
struct ipq and don't include mac.h.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:34:02 +00:00
Robert Watson
3b6aad64bf Add mac.h -- raw_ip.c was depending on nested inclusion of mac.h which
is no longer present.

Pointed out by:	bmilekic
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:27:46 +00:00
Poul-Henning Kamp
ae89fdaba7 remove spurious printf 2002-08-13 19:13:23 +00:00
Jennifer Yang
3d6ade3a03 Assert that the inpcb lock is held when calling tcp_output().
Approved by:	hsu
2002-08-12 03:22:46 +00:00
Luigi Rizzo
43405724ec One bugfix and one new feature.
The bugfix (ipfw2.c) makes the handling of port numbers with
a dash in the name, e.g. ftp-data, consistent with old ipfw:
use \\ before the - to consider it as part of the name and not
a range separator.

The new feature (all this description will go in the manpage):

each rule now belongs to one of 32 different sets, which can
be optionally specified in the following form:

	ipfw add 100 set 23 allow ip from any to any

If "set N" is not specified, the rule belongs to set 0.

Individual sets can be disabled, enabled, and deleted with the commands:

	ipfw disable set N
	ipfw enable set N
	ipfw delete set N

Enabling/disabling of a set is atomic. Rules belonging to a disabled
set are skipped during packet matching, and they are not listed
unless you use the '-S' flag in the show/list commands.
Note that dynamic rules, once created, are always active until
they expire or their parent rule is deleted.
Set 31 is reserved for the default rule and cannot be disabled.

All sets are enabled by default. The enable/disable status of the sets
can be shown with the command

	ipfw show sets

Hopefully, this feature will make life easier to those who want to
have atomic ruleset addition/deletion/tests. Examples:

To add a set of rules atomically:

	ipfw disable set 18
	ipfw add ... set 18 ...		# repeat as needed
	ipfw enable set 18

To delete a set of rules atomically

	ipfw disable set 18
	ipfw delete set 18
	ipfw enable set 18

To test a ruleset and disable it and regain control if something
goes wrong:

	ipfw disable set 18
	ipfw add ... set 18 ...         # repeat as needed
	ipfw enable set 18 ; echo "done "; sleep 30 && ipfw disable set 18

    here if everything goes well, you press control-C before
    the "sleep" terminates, and your ruleset will be left
    active. Otherwise, e.g. if you cannot access your box,
    the ruleset will be disabled after the sleep terminates.

I think there is only one more thing that one might want, namely
a command to assign all rules in set X to set Y, so one can
test a ruleset using the above mechanisms, and once it is
considered acceptable, make it part of an existing ruleset.
2002-08-10 04:37:32 +00:00
Mike Silbersack
a9ce5e05b5 Handle PMTU discovery in syn-ack packets slightly differently;
rely on syncache flags instead of directly accessing the route
entry.

MFC after:	3 days
2002-08-05 22:34:15 +00:00
Luigi Rizzo
1cbd978e96 bugfix: move check for udp_blackhole before the one for icmp_bandlim.
MFC after: 3 days
2002-08-04 20:50:13 +00:00
Luigi Rizzo
ea779ff36c Fix handling of packets which matched an "ipfw fwd" rule on the input side. 2002-08-03 14:59:45 +00:00
Robert Watson
e316463a86 When preserving the IP header in extra mbuf in the IP forwarding
case, also preserve the MAC label.  Note that this mbuf allocation
is fairly non-optimal, but not my fault.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-02 20:45:27 +00:00
Robert Watson
09a555cbf9 Work to fix LINT build.
Reported by:	phk
2002-08-02 18:08:14 +00:00
Robert Watson
bdb3fa1832 Introduce support for Mandatory Access Control and extensible
kernel access control.

Add MAC support for the UDP protocol.  Invoke appropriate MAC entry
points to label packets that are generated by local UDP sockets,
and to authorize delivery of mbufs to local sockets both in the
multicast/broadcast case and the unicast case.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 21:37:34 +00:00
Robert Watson
d00e44fb4a Document the undocumented assumption that at least one of the PCB
pointer and incoming mbuf pointer will be non-NULL in tcp_respond().
This is relied on by the MAC code for correctness, as well as
existing code.

Obtained from:	TrustedBSD PRoject
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:54:43 +00:00
Robert Watson
0070e096d7 Introduce support for Mandatory Access Control and extensible
kernel access control.

Add support for labeling most out-going ICMP messages using an
appropriate MAC entry point.  Currently, we do not explicitly
label packet reflect (timestamp, echo request) ICMP events,
implicitly using the originating packet label since the mbuf is
reused.  This will be made explicit at some point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:53:04 +00:00
Robert Watson
c488362e1a Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the TCP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check socket and
mbuf labels before permitting delivery to a socket.  Assign labels
to newly accepted connections when the syncache/cookie code has done
its business.  Also set peer labels as convenient.  Currently,
MAC policies cannot influence the PCB matching algorithm, so cannot
implement polyinstantiation.  Note that there is at least one case
where a PCB is not available due to the TCP packet not being associated
with any socket, so we don't label in that case, but need to handle
it in a special manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 19:06:49 +00:00
Robert Watson
4ea889c666 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the raw IP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check the
socket and mbuf labels before permitting delivery to a socket,
permitting MAC policies to selectively allow delivery of raw IP mbufs
to various raw IP sockets that may be open.  Restructure the policy
checking code to compose IPsec and MAC results in a more readable
manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 18:30:34 +00:00
Robert Watson
4ed84624a2 Introduce support for Mandatory Access Control and extensible
kernel access control.

When fragmenting an IP datagram, invoke an appropriate MAC entry
point so that MAC labels may be copied (...) to the individual
IP fragment mbufs by MAC policies.

When IP options are inserted into an IP datagram when leaving a
host, preserve the label if we need to reallocate the mbuf for
alignment or size reasons.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:21:01 +00:00
Robert Watson
36b0360b37 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the code managing IP fragment reassembly queues (struct ipq)
to invoke appropriate MAC entry points to maintain a MAC label on
each queue.  Permit MAC policies to associate information with a queue
based on the mbuf that caused it to be created, update that information
based on further mbufs accepted by the queue, influence the decision
making process by which mbufs are accepted to the queue, and set the
label of the mbuf holding the reassembled datagram following reassembly
completetion.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:17:51 +00:00
Robert Watson
0ec4b12334 Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating an IGMP message, invoke a MAC entry point to permit
the MAC framework to label its mbuf appropriately for the target
interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:46:56 +00:00
Robert Watson
19527d3e22 Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating an ARP query, invoke a MAC entry point to permit the
MAC framework to label its mbuf appropriately for the interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:45:16 +00:00
Robert Watson
d3990b06e1 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the MAC framework to label mbuf created using divert sockets.
These labels may later be used for access control on delivery to
another socket, or to an interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI LAbs
2002-07-31 16:42:47 +00:00
Robert Watson
549e4c9e4e Introduce support for Mandatory Access Control and extensible
kernel access control.

Label IP fragment reassembly queues, permitting security features to
be maintained on those objects.  ipq_label will be used to manage
the reassembly of fragments into IP datagrams using security
properties.  This permits policies to deny the reassembly of fragments,
as well as influence the resulting label of a datagram following
reassembly.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 23:09:20 +00:00
Maxim Konovalov
d46a53126c Use a common way to release locks before exit.
Reviewed by:	hsu
2002-07-29 09:01:39 +00:00
Don Lewis
5c38b6dbce Wire the sysctl output buffer before grabbing any locks to prevent
SYSCTL_OUT() from blocking while locks are held.  This should
only be done when it would be inconvenient to make a temporary copy of
the data and defer calling SYSCTL_OUT() until after the locks are
released.
2002-07-28 19:59:31 +00:00
Hajimu UMEMOTO
66ef17c4b6 make setsockopt(IPV6_V6ONLY, 0) actuall work for tcp6.
MFC after:	1 week
2002-07-25 18:10:04 +00:00
Hajimu UMEMOTO
eccb7001ee cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,
ip6_mapped_addr_on is unified into ip6_v6only.

MFC after:	1 week
2002-07-25 17:40:45 +00:00
Luigi Rizzo
be1826c354 Only log things net.inet.ip.fw.verbose is set 2002-07-24 02:41:19 +00:00
Ruslan Ermilov
61a875d706 Don't forget to recalculate the IP checksum of the original
IP datagram embedded into ICMP error message.

Spotted by:	tcpdump 3.7.1 (-vvv)
MFC after:	3 days
2002-07-23 00:16:19 +00:00
Ruslan Ermilov
88c39af35f Don't shrink socket buffers in tcp_mss(), application might have already
configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows.

PR:		kern/11966
MFC after:	1 week
2002-07-22 22:31:09 +00:00
Hajimu UMEMOTO
854d3b19a2 do not refer to IN6P_BINDV6ONLY anymore.
Obtained from:	KAME
MFC after:	1 week
2002-07-22 15:51:02 +00:00
John Polstra
8ea8a6804b Fix overflows in intermediate calculations in sysctl_msec_to_ticks().
At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle
to be reported as negative.

MFC after:	3 days
2002-07-20 23:48:59 +00:00
Robert Watson
69dac2ea47 Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel
data structures pick up security and synchronization primitives, it
becomes increasingly desirable not to arbitrarily export them via
include files to userland, as the userland applications pick up new
#include dependencies.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-20 22:46:20 +00:00
Matthew Dillon
d65bf08af3 Add the tcps_sndrexmitbad statistic, keep track of late acks that caused
unnecessary retransmissions.
2002-07-19 18:29:38 +00:00
Matthew Dillon
701bec5a38 Introduce two new sysctl's:
net.inet.tcp.rexmit_min (default 3 ticks equiv)

    This sysctl is the retransmit timer RTO minimum,
    specified in milliseconds.  This value is
    designed for algorithmic stability only.

net.inet.tcp.rexmit_slop (default 200ms)

    This sysctl is the retransmit timer RTO slop
    which is added to every retransmit timeout and
    is designed to handle protocol stack overheads
    and delayed ack issues.

Note that the *original* code applied a 1-second
RTO minimum but never applied real slop to the RTO
calculation, so any RTO calculation over one second
would have no slop and thus not account for
protocol stack overheads (TCP timestamps are not
a measure of protocol turnaround!).  Essentially,
the original code made the RTO calculation almost
completely irrelevant.

Please note that the 200ms slop is debateable.
This commit is not meant to be a line in the sand,
and if the community winds up deciding that increasing
it is the correct solution then it's easy to do.
Note that larger values will destroy performance
on lossy networks while smaller values may result in
a greater number of unnecessary retransmits.
2002-07-18 19:06:12 +00:00
Luigi Rizzo
90780c4b05 Move IPFW2 definition before including ip_fw.h
Make indentation of new parts consistent with the style used for this file.
2002-07-18 05:18:41 +00:00
Matthew Dillon
22fd54d461 I don't know how the minimum retransmit timeout managed to get set to
one second but it badly breaks throughput on networks with minor packet
loss.

Complaints by: at least two people tracked down to this.
MFC after:	3 days
2002-07-17 23:32:03 +00:00
Luigi Rizzo
318aa87b59 Fix a panic when doing "ipfw add pipe 1 log ..."
Also synchronize ip_dummynet.c with the version in RELENG_4 to
ease MFC's.
2002-07-17 07:21:42 +00:00
Luigi Rizzo
a8c102a2ec Implement keepalives for dynamic rules, so they will not expire
just because you leave your session idle.

Also, put in a fix for 64-bit architectures (to be revised).

In detail:

ip_fw.h

  * Reorder fields in struct ip_fw to avoid alignment problems on
    64-bit machines. This only masks the problem, I am still not
    sure whether I am doing something wrong in the code or there
    is a problem elsewhere (e.g. different aligmnent of structures
    between userland and kernel because of pragmas etc.)

  * added fields in dyn_rule to store ack numbers, so we can
    generate keepalives when the dynamic rule is about to expire

ip_fw2.c

  * use a local function, send_pkt(), to generate TCP RST for Reset rules;

  * save about 250 bytes by cleaning up the various snprintf()
    in ipfw_log() ...

  * ... and use twice as many bytes to implement keepalives
    (this seems to be working, but i have not tested it extensively).

Keepalives are generated once every 5 seconds for the last 20 seconds
of the lifetime of a dynamic rule for an established TCP flow.  The
packets are sent to both sides, so if at least one of the endpoints
is responding, the timeout is refreshed and the rule will not expire.

You can disable this feature with

        sysctl net.inet.ip.fw.dyn_keepalive=0

(the default is 1, to have them enabled).

MFC after: 1 day

(just kidding... I will supply an updated version of ipfw2 for
RELENG_4 tomorrow).
2002-07-14 23:47:18 +00:00
Luigi Rizzo
3956b02345 Avoid dereferencing a null pointer in ro_rt.
This was always broken in HEAD (the offending statement was introduced
in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev.
1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30.

So I am also restoring these two lines in RELENG_4 now.
We might need another few things from 1.99.2.30.
2002-07-12 22:08:47 +00:00
Don Lewis
2d20c83f93 Back out the previous change, since it looks like locking udbinfo provides
sufficient protection.
2002-07-12 09:55:48 +00:00
Don Lewis
bb1dd7a45a Lock inp while we're accessing it. 2002-07-12 08:05:22 +00:00
Don Lewis
0e1eebb846 Defer calling SYSCTL_OUT() until after the locks have been released. 2002-07-11 23:18:43 +00:00
Don Lewis
142b2bd644 Reduce the nesting level of a code block that doesn't need to be in
an else clause.
2002-07-11 23:13:31 +00:00
Luigi Rizzo
c7ea683135 Change one variable to make it easier to switch between ipfw and ipfw2 2002-07-09 06:53:38 +00:00
Luigi Rizzo
b3063f064c Fix a bug caused by dereferencing an invalid pointer when
no punch_fw was used.
Fix another couple of bugs which prevented rules from being
installed properly.

On passing, use IPFW2 instead of NEW_IPFW to compile the new code,
and slightly simplify the instruction generation code.
2002-07-08 22:57:35 +00:00
Luigi Rizzo
d63b346ab1 No functional changes, but:
Following Darren's suggestion, make Dijkstra happy and rewrite the
ipfw_chk() main loop removing a lot of goto's and using instead a
variable to store match status.

Add a lot of comments to explain what instructions are supposed to
do and how -- this should ease auditing of the code and make people
more confident with it.

In terms of code size: the entire file takes about 12700 bytes of text,
about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!)
for ipfw_log().
2002-07-08 22:46:01 +00:00
Luigi Rizzo
7d4d3e9051 Remove one unused command name. 2002-07-08 22:39:19 +00:00
Luigi Rizzo
5185195169 Forgot to update one field name in one of the latest commits. 2002-07-08 22:37:55 +00:00
Luigi Rizzo
5e43aef891 Implement the last 2-3 missing instructions for ipfw,
now it should support all the instructions of the old ipfw.

Fix some bugs in the user interface, /sbin/ipfw.

Please check this code against your rulesets, so i can fix the
remaining bugs (if any, i think they will be mostly in /sbin/ipfw).

Once we have done a bit of testing, this code is ready to be MFC'ed,
together with a bunch of other changes (glue to ipfw, and also the
removal of some global variables) which have been in -current for
a couple of weeks now.

MFC after: 7 days
2002-07-05 22:43:06 +00:00
Brian Somers
27cc91fbf8 Remove trailing whitespace 2002-07-01 11:19:40 +00:00
Jesper Skriver
eb538bfd64 Extend the effect of the sysctl net.inet.tcp.icmp_may_rst
so that, if we recieve a ICMP "time to live exceeded in transit",
(type 11, code 0) for a TCP connection on SYN-SENT state, close
the connection.

MFC after:	2 weeks
2002-06-30 20:07:21 +00:00
Jonathan Lemon
0080a004d7 One possible code path for syncache_respond() is:
syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B)

Which winds up deleting a different entry from the syncache.  Handle
this by not utilizing the next entry in the timer chain until after
syncache_respond() completes.  The case of A == B should not be possible.

Problem found by: Don Bowman <don@sandvine.com>
2002-06-28 19:12:38 +00:00
Doug Rabson
24f8fd9fd1 Fix warning.
Reviewed by: luigi
2002-06-28 08:36:26 +00:00
Luigi Rizzo
9758b77ff1 The new ipfw code.
This code makes use of variable-size kernel representation of rules
(exactly the same concept of BPF instructions, as used in the BSDI's
firewall), which makes firewall operation a lot faster, and the
code more readable and easier to extend and debug.

The interface with the rest of the system is unchanged, as witnessed
by this commit. The only extra kernel files that I am touching
are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In
userland I only had to touch those programs which manipulate the
internal representation of firewall rules).

The code is almost entirely new (and I believe I have written the
vast majority of those sections which were taken from the former
ip_fw.c), so rather than modifying the old ip_fw.c I decided to
create a new file, sys/netinet/ip_fw2.c .  Same for the user
interface, which is in sbin/ipfw/ipfw2.c (it still compiles to
/sbin/ipfw).  The old files are still there, and will be removed
in due time.

I have not renamed the header file because it would have required
touching a one-line change to a number of kernel files.

In terms of user interface, the new "ipfw" is supposed to accepts
the old syntax for ipfw rules (and produce the same output with
"ipfw show". Only a couple of the old options (out of some 30 of
them) has not been implemented, but they will be soon.

On the other hand, the new code has some very powerful extensions.
First, you can put "or" connectives between match fields (and soon
also between options), and write things like

ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any

This should make rulesets slightly more compact (and lines longer!),
by condensing 2 or more of the old rules into single ones.

Also, as an example of how easy the rules can be extended, I have
implemented an 'address set' match pattern, where you can specify
an IP address in a format like this:

        10.20.30.0/26{18,44,33,22,9}

which will match the set of hosts listed in braces belonging to the
subnet 10.20.30.0/26 . The match is done using a bitmap, so it is
essentially a constant time operation requiring a handful of CPU
instructions (and a very small amount of memmory -- for a full /24
subnet, the instruction only consumes 40 bytes).

Again, in this commit I have focused on functionality and tried
to minimize changes to the other parts of the system. Some performance
improvement can be achieved with minor changes to the interface of
ip_fw_chk_t. This will be done later when this code is settled.

The code is meant to compile unmodified on RELENG_4 (once the
PACKET_TAG_* changes have been merged), for this reason
you will see #ifdef __FreeBSD_version in a couple of places.
This should minimize errors when (hopefully soon) it will be time
to do the MFC.
2002-06-27 23:02:18 +00:00
Maxime Henrion
7627c6cbcc Warning fixes for 64 bits platforms. With this last fix,
I can build a GENERIC sparc64 kernel with -Werror.

Reviewed by:	luigi
2002-06-27 11:02:06 +00:00
Luigi Rizzo
713a6ea063 Just a comment on some additional consistency checks that could
be added here.
2002-06-26 21:00:53 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Jeffrey Hsu
6fd22caf91 Avoid unlocking the inp twice if badport_bandlim() returns -1.
Reported by:	jlemon
2002-06-24 22:25:00 +00:00
Jeffrey Hsu
f14e4cfe33 Style bug: fix 4 space indentations that should have been tabs.
Submitted by:	jlemon
2002-06-24 16:47:02 +00:00
Luigi Rizzo
f10e85d797 Slightly restructure the #ifdef INET6 sections to make the code
more readable.

Remove the six "register" attributes from variables tcp_output(), the
compiler surely knows well how to allocate them.
2002-06-23 21:25:36 +00:00
Luigi Rizzo
410bb1bfe2 Move two global variables to automatic variables within the
only function where they are used (they are used with TCPDEBUG only).
2002-06-23 21:22:56 +00:00
Luigi Rizzo
4d2e36928d Move some global variables in more appropriate places.
Add XXX comments to mark places which need to be taken care of
if we want to remove this part of the kernel from Giant.

Add a comment on a potential performance problem with ip_forward()
2002-06-23 20:48:26 +00:00
Luigi Rizzo
51aed12e52 fix bad indentation and whitespace resulting from cut&paste 2002-06-23 09:15:43 +00:00
Luigi Rizzo
dfd1ae2f86 fix indentation of a comment 2002-06-23 09:14:24 +00:00
Luigi Rizzo
a5924d6100 fix a typo in a comment 2002-06-23 09:13:46 +00:00