Commit Graph

1291 Commits

Author SHA1 Message Date
Ruslan Ermilov
3277d1c498 Fixed the brain-o in rev. 1.10: the logic check was reversed.
Reported by:	Bernd Fuerwitt <bf@fuerwitt.de>
2001-06-27 14:11:25 +00:00
Ruslan Ermilov
a447a5ae06 Bring in fix from NetBSD's revision 1.16:
Pass the correct destination address for the route-to-gateway case.

PR:		kern/10607
MFC after:	2 weeks
2001-06-26 09:00:50 +00:00
David Malone
7ce87f1205 Allow getcred sysctl to work in jailed root processes. Processes can
only do getcred calls for sockets which were created in the same jail.
This should allow the ident to work in a reasonable way within jails.

PR:		28107
Approved by:	des, rwatson
2001-06-24 12:18:27 +00:00
Jonathan Lemon
f962cba5c3 Replace bzero() of struct ip with explicit zeroing of structure members,
which is faster.
2001-06-23 17:44:27 +00:00
Ruslan Ermilov
c73d99b567 Add netstat(1) knob to reset net.inet.{ip|icmp|tcp|udp|igmp}.stats.
For example, ``netstat -s -p ip -z'' will show and reset IP stats.

PR:		bin/17338
2001-06-23 17:17:59 +00:00
Mike Silbersack
08517d530e Eliminate the allocation of a tcp template structure for each
connection.  The information contained in a tcptemp can be
reconstructed from a tcpcb when needed.

Previously, tcp templates required the allocation of one
mbuf per connection.  On large systems, this change should
free up a large number of mbufs.

Reviewed by:	bmilekic, jlemon, ru
MFC after: 2 weeks
2001-06-23 03:21:46 +00:00
Munechika SUMIKAWA
a96c00661a - Renumber KAME local ICMP types and NDP options numberes beacaues they
are duplicated by newly defined types/options in RFC3121
- We have no backward compatibility issue. There is no apps in our
  distribution which use the above types/options.

Obtained from:	KAME
MFC after:	2 weeks
2001-06-21 07:08:43 +00:00
Hajimu UMEMOTO
ff2428299f made sure to use the correct sa_len for rtalloc().
sizeof(ro_dst) is not necessarily the correct one.
this change would also fix the recent path MTU discovery problem for the
destination of an incoming TCP connection.

Submitted by:	JINMEI Tatuya <jinmei@kame.net>
Obtained from:	KAME
MFC after:	2 weeks
2001-06-20 12:32:48 +00:00
Jonathan Lemon
08aadfbb98 Do not perform arp send/resolve on an interface marked NOARP.
PR: 25006
MFC after: 2 weeks
2001-06-15 21:00:32 +00:00
Peter Wemm
215db1379e Fix a stack of KAME netinet6/in6.h warnings:
592: warning: `struct mbuf' declared inside parameter list
595: warning: `struct ifnet' declared inside parameter list
2001-06-15 00:37:27 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
Jesper Skriver
96c2b04290 Make the default value of net.inet.ip.maxfragpackets and
net.inet6.ip6.maxfragpackets dependent on nmbclusters,
defaulting to nmbclusters / 4

Reviewed by:	bde
MFC after:	1 week
2001-06-10 11:04:10 +00:00
Peter Wemm
0978669829 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
Jonathan Lemon
0a52f59c36 Move IPFilter into contrib. 2001-06-07 05:13:35 +00:00
Peter Wemm
4422746fdf Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
Peter Wemm
81930014ef Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
Jesper Skriver
65f28919b3 Silby's take one on increasing FreeBSD's resistance to SYN floods:
One way we can reduce the amount of traffic we send in response to a SYN
flood is to eliminate the RST we send when removing a connection from
the listen queue.  Since we are being flooded, we can assume that the
majority of connections in the queue are bogus.  Our RST is unwanted
by these hosts, just as our SYN-ACK was.  Genuine connection attempts
will result in hosts responding to our SYN-ACK with an ACK packet.  We
will automatically return a RST response to their ACK when it gets to us
if the connection has been dropped, so the early RST doesn't serve the
genuine class of connections much.  In summary, we can reduce the number
of packets we send by a factor of two without any loss in functionality
by ensuring that RST packets are not sent when dropping a connection
from the listen queue.

Submitted by:	Mike Silbersack <silby@silby.com>
Reviewed by:	jesper
MFC after:	2 weeks
2001-06-06 19:41:51 +00:00
Brian Somers
f987e1bd0f Add BSD-style copyright headers
Approved by: Charles Mott <cmott@scientech.com>
2001-06-04 15:09:51 +00:00
Brian Somers
888b1a7aa5 Change to a standard BSD-style copyright
Approved by:	Atsushi Murai <amurai@spec.co.jp>
2001-06-04 14:52:17 +00:00
Jesper Skriver
690a6055ff Prevent denial of service using bogus fragmented IPv4 packets.
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.

By setting a upper limit on the number of reassembly queues we
prevent this situation.

This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to 200,
as the IPv6 case, this should be sufficient for most
systmes, but you might want to increase it if you have
lots of TCP sessions.
I'm working on making the default value dependent on
nmbclusters.

If you want old behaviour (no upper limit) set this sysctl
to a negative value.

If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero).

Obtained from:	NetBSD
MFC after:	1 week
2001-06-03 23:33:23 +00:00
Kris Kennaway
64dddc1872 Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets.
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.

Reviewed by:    -net
Obtained from:  OpenBSD
2001-06-01 10:02:28 +00:00
David E. O'Brien
240ef84277 Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile. 2001-06-01 09:51:14 +00:00
Jesper Skriver
2b1a209a17 Prevent denial of service using bogus fragmented IPv4 packets.
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.

By setting a upper limit on the number of reassembly queues we
prevent this situation.

This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to NMBCLUSTERS/4

If you want old behaviour (no upper limit) set this sysctl
to a negative value.

If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero)

Obtained from:	NetBSD (partially)
MFC after:	1 week
2001-05-31 21:57:29 +00:00
Jesper Skriver
7ceb778366 Disable rfc1323 and rfc1644 TCP extensions if we havn't got
any response to our third SYN to work-around some broken
terminal servers (most of which have hopefully been retired)
that have bad VJ header compression code which trashes TCP
segments containing unknown-to-them TCP options.

PR:		kern/1689
Submitted by:	jesper
Reviewed by:	wollman
MFC after:	2 weeks
2001-05-31 19:24:49 +00:00
Ruslan Ermilov
79ec1c507a Add an integer field to keep protocol-specific flags with links.
For FTP control connection, keep the CRLF end-of-line termination
status in there.

Fixed the bug when the first FTP command in a session was ignored.

PR:		24048
MFC after:	1 week
2001-05-30 14:24:35 +00:00
Jesper Skriver
e4b6428171 Inline TCP_REASS() in the single location where it's used,
just as OpenBSD and NetBSD has done.

No functional difference.

MFC after:	2 weeks
2001-05-29 19:54:45 +00:00
Jesper Skriver
853be1226e properly delay acks in half-closed TCP connections
PR:	24962
Submitted by:	Tony Finch <dot@dotat.at>
MFC after:	2 weeks
2001-05-29 19:51:45 +00:00
Ruslan Ermilov
9185426827 In in_ifadown(), differentiate between whether the interface goes
down or interface address is deleted.  Only delete static routes
in the latter case.

Reported by:	Alexander Leidinger <Alexander@leidinger.net>
2001-05-11 14:37:34 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Jesper Skriver
d1745f454d Say goodbye to TCP_COMPAT_42
Reviewed by:	wollman
Requested by:	wollman
2001-04-20 11:58:56 +00:00
Kris Kennaway
f0a04f3f51 Randomize the TCP initial sequence numbers more thoroughly.
Obtained from:	OpenBSD
Reviewed by:	jesper, peter, -developers
2001-04-17 18:08:01 +00:00
Darren Reed
454a43c1f1 fix security hole created by fragment cache 2001-04-06 15:52:28 +00:00
Bill Fumerola
0901f62e11 pipe/queue are the only consumers of flow_id, so only set it in those cases 2001-04-06 06:52:25 +00:00
Jesper Skriver
b77d155dd3 MFC candidate.
Change code from PRC_UNREACH_ADMIN_PROHIB to PRC_UNREACH_PORT for
ICMP_UNREACH_PROTOCOL and ICMP_UNREACH_PORT

And let TCP treat PRC_UNREACH_PORT like PRC_UNREACH_ADMIN_PROHIB

This should fix the case where port unreachables for udp returned
ENETRESET instead of ECONNREFUSED

Problem found by:	Bill Fenner <fenner@research.att.com>
Reviewed by:		jlemon
2001-03-28 14:13:19 +00:00
Ruslan Ermilov
4a558355e5 MAN[1-9] -> MAN. 2001-03-27 17:27:19 +00:00
Yaroslav Tykhiy
4cbc8ad1bb Add a missing m_pullup() before a mtod() in in_arpinput().
PR: kern/22177
Reviewed by: wollman
2001-03-27 12:34:58 +00:00
Hidetoshi Shimokawa
110a013333 Replace dyn_fin_lifetime with dyn_ack_lifetime for half-closed state.
Half-closed state could last long for some connections and fin_lifetime
(default 20sec) is too short for that.

OK'ed by: luigi
2001-03-27 05:28:30 +00:00
Poul-Henning Kamp
f83880518b Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.
2001-03-26 12:41:29 +00:00
Brian Somers
71593f95e0 Make header files conform to style(9).
Reviewed by (*): bde

(*) alias_local.h only got a cursory glance.
2001-03-25 12:05:10 +00:00
Brian Somers
adad9908fa Remove an extraneous declaration. 2001-03-25 03:34:29 +00:00
Hajimu UMEMOTO
2da24fa6e9 IPv4 address is not unsigned int. This change introduces in_addr_t.
PR:		9982
Adviced by:	des
Reviewed by:	-alpha and -net (no objection)
Obtained from:	OpenBSD
2001-03-23 18:59:31 +00:00
Brian Somers
30fcf11451 Remove (non-protected) variable names from function prototypes. 2001-03-22 11:55:26 +00:00
Paul Richards
1789d85615 Only flush rules that have a rule number above that set by a new
sysctl, net.inet.ip.fw.permanent_rules.

This allows you to install rules that are persistent across flushes,
which is very useful if you want a default set of rules that
maintains your access to remote machines while you're reconfiguring
the other rules.

Reviewed by:	Mark Murray <markm@FreeBSD.org>
2001-03-21 08:19:31 +00:00
Dag-Erling Smørgrav
c59319bf1a Axe TCP_RESTRICT_RST. It was never a particularly good idea except for a few
very specific scenarios, and now that we have had net.inet.tcp.blackhole for
quite some time there is really no reason to use it any more.

(last of three commits)
2001-03-19 22:09:00 +00:00
Ruslan Ermilov
1e3d5af041 Invalidate cached forwarding route (ipforward_rt) whenever a new route
is added to the routing table, otherwise we may end up using the wrong
route when forwarding.

PR:		kern/10778
Reviewed by:	silence on -net
2001-03-19 09:16:16 +00:00
Ruslan Ermilov
4078ffb154 Make sure the cached forwarding route (ipforward_rt) is still up before
using it.  Not checking this may have caused the wrong IP address to be
used when processing certain IP options (see example below).  This also
caused the wrong route to be passed to ip_output() when forwarding, but
fortunately ip_output() is smart enough to detect this.

This example demonstrates the wrong behavior of the Record Route option
observed with this bug.  Host ``freebsd'' is acting as the gateway for
the ``sysv''.

1. On the gateway, we add the route to the destination.  The new route
   will use the primary address of the loopback interface, 127.0.0.1:

:  freebsd# route add 10.0.0.66 -iface lo0 -reject
:  add host 10.0.0.66: gateway lo0

2. From the client, we ping the destination.  We see the correct replies.
   Please note that this also causes the relevant route on the ``freebsd''
   gateway to be cached in ipforward_rt variable:

:  sysv# ping -snv 10.0.0.66
:  PING 10.0.0.66: 56 data bytes
:  ICMP Host Unreachable from gateway 192.168.0.115
:  ICMP Host Unreachable from gateway 192.168.0.115
:  ICMP Host Unreachable from gateway 192.168.0.115
:
:  ----10.0.0.66 PING Statistics----
:  3 packets transmitted, 0 packets received, 100% packet loss

3. On the gateway, we delete the route to the destination, thus making
   the destination reachable through the `default' route:

:  freebsd# route delete 10.0.0.66
:  delete host 10.0.0.66

4. From the client, we ping destination again, now with the RR option
   turned on.  The surprise here is the 127.0.0.1 in the first reply.
   This is caused by the bug in ip_rtaddr() not checking the cached
   route is still up befor use.  The debug code also shows that the
   wrong (down) route is further passed to ip_output().  The latter
   detects that the route is down, and replaces the bogus route with
   the valid one, so we see the correct replies (192.168.0.115) on
   further probes:

:  sysv# ping -snRv 10.0.0.66
:  PING 10.0.0.66: 56 data bytes
:  64 bytes from 10.0.0.66: icmp_seq=0. time=10. ms
:    IP options:  <record route> 127.0.0.1, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:  64 bytes from 10.0.0.66: icmp_seq=1. time=0. ms
:    IP options:  <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:  64 bytes from 10.0.0.66: icmp_seq=2. time=0. ms
:    IP options:  <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:
:  ----10.0.0.66 PING Statistics----
:  3 packets transmitted, 3 packets received, 0% packet loss
:  round-trip (ms)  min/avg/max = 0/3/10
2001-03-18 13:04:07 +00:00
Poul-Henning Kamp
462b86fe91 <sys/queue.h> makeover. 2001-03-16 20:00:53 +00:00
Poul-Henning Kamp
ccd6f42dc9 Fix a style(9) nit. 2001-03-16 19:36:23 +00:00
Ruslan Ermilov
089cdfad78 net/route.c:
A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
  set but did not have a reference to the parent route, as documented in
  the rtentry(9) manpage.  This prevented such routes from being deleted
  when their parent route is deleted.

  Now, for example, if you delete an IP address from a network interface,
  all ARP entries that were cloned from this interface route are flushed.

  This also has an impact on netstat(1) output.  Previously, dynamically
  created ARP cache entries (RTF_STATIC flag is unset) were displayed as
  part of the routing table display (-r).  Now, they are only printed if
  the -a option is given.

netinet/in.c, netinet/in_rmx.c:

  When address is removed from an interface, also delete all routes that
  point to this interface and address.  Previously, for example, if you
  changed the address on an interface, outgoing IP datagrams might still
  use the old address.  The only solution was to delete and re-add some
  routes.  (The problem is easily observed with the route(8) command.)

  Note, that if the socket was already bound to the local address before
  this address is removed, new datagrams generated from this socket will
  still be sent from the old address.

PR:		kern/20785, kern/21914
Reviewed by:	wollman (the idea)
2001-03-15 14:52:12 +00:00
Ruslan Ermilov
206a3274ef RFC768 (UDP) requires that "if the computed checksum is zero, it
is transmitted as all ones".  This got broken after introduction
of delayed checksums as follows.  Some guys (including Jonathan)
think that it is allowed to transmit all ones in place of a zero
checksum for TCP the same way as for UDP.  (The discussion still
takes place on -net.)  Thus, the 0 -> 0xffff checksum fixup was
first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65)
to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18,
INVERT expression).  Besides that I disagree that it is valid for
TCP, there was no real problem until in_cksum.c,v 1.20, where the
in_cksum() was made just a special version of in_cksum_skip().
The side effect was that now every incoming IP datagram failed to
pass the checksum test (in_cksum() returned 0xffff when it should
actually return zero).  It was fixed next day in revision 1.21,
by removing the INVERT expression.  The latter also broke the
0 -> 0xffff fixup for UDP checksums.

Before this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006:  udp 0 (ttl 64, id 1)
:                          4500 001c 0001 0000 4011 7cce 7f00 0001
:                          7f00 0001 80ed 80ee 0008 0000

After this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006:  udp 0 (ttl 64, id 1)
:                          4500 001c 0001 0000 4011 7cce 7f00 0001
:                          7f00 0001 80ed 80ee 0008 ffff
2001-03-13 17:07:06 +00:00
Ruslan Ermilov
fb9aaba000 Count and show incoming UDP datagrams with no checksum. 2001-03-13 13:26:06 +00:00
Poul-Henning Kamp
503d3c0277 Correctly cleanup in case of failure to bind a pcb.
PR:		25751
Submitted by:	<unicorn@Forest.Od.UA>
2001-03-12 21:53:23 +00:00
Jonathan Lemon
1db24ffb98 Unbreak LINT.
Pointed out by: phk
2001-03-12 02:57:42 +00:00
Ian Dowse
5d936aa181 In ip_output(), initialise `ia' in the case where the packet has
come from a dummynet pipe. Without this, the code which increments
the per-ifaddr stats can dereference an uninitialised pointer. This
should make dummynet usable again.

Reported by:	"Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua>
Reviewed by:	luigi, joe
2001-03-11 17:50:19 +00:00
Ruslan Ermilov
8ce3f3dd28 Make it possible to use IP_TTL and IP_TOS setsockopt(2) options
on certain types of SOCK_RAW sockets.  Also, use the ip.ttl MIB
variable instead of MAXTTL constant as the default time-to-live
value for outgoing IP packets all over the place, as we already
do this for TCP and UDP.

Reviewed by:	wollman
2001-03-09 12:22:51 +00:00
Jonathan Lemon
c0647e0d07 Push the test for a disconnected socket when accept()ing down to the
protocol layer.  Not all protocols behave identically.  This fixes the
brokenness observed with unix-domain sockets (and postfix)
2001-03-09 08:16:40 +00:00
Jonathan Lemon
32676c2d1f The TCP sequence number used for sending a RST with the ipfw reset rule
is already in host byte order, so do not swap it again.

Reviewed by:	bfumerola
2001-03-09 08:13:08 +00:00
Ian Dowse
bfef7ed45c It was possible for ip_forward() to supply to icmp_error()
an IP header with ip_len in network byte order. For certain
values of ip_len, this could cause icmp_error() to write
beyond the end of an mbuf, causing mbuf free-list corruption.
This problem was observed during generation of ICMP redirects.

We now make quite sure that the copy of the IP header kept
for icmp_error() is stored in a non-shared mbuf header so
that it will not be modified by ip_output().

Also:
- Calculate the correct number of bytes that need to be
  retained for icmp_error(), instead of assuming that 64
  is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to
  copy from the supplied mbuf chain, in case the first 8
  bytes of IP payload are not stored directly after the IP
  header.
- Sanity-check ip_len in icmp_error(), and panic if it is
  less than sizeof(struct ip). Incoming packets with bad
  ip_len values are discarded in ip_input(), so this should
  only be triggered by bugs in the code, not by bad packets.

This patch results from code and suggestions from Ruslan, Bosko,
Jonathan Lemon and Matt Dillon, with important testing by Mike
Tancsa, who could reproduce this problem at will.

Reported by:	Mike Tancsa <mike@sentex.net>
Reviewed by:	ru, bmilekic, jlemon, dillon
2001-03-08 19:03:26 +00:00
Don Lewis
a8f1210095 Modify the comments to more closely resemble the English language. 2001-03-05 22:40:27 +00:00
Don Lewis
3f67c83439 Move the loopback net check closer to the beginning of ip_input() so that
it doesn't block packets whose destination address has been translated to
the loopback net by ipnat.

Add warning comments about the ip_checkinterface feature.
2001-03-05 08:45:05 +00:00
Bosko Milekic
234ff7c46f During a flood, we don't call rtfree(), but we remove the entry ourselves.
However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref
count is > 1, we won't decrement the count at all. This could lead to
route entries never being deleted.

Here, we call rtfree() not only if the initial two conditions fail, but
also if the ref count is > 1 (and we therefore don't immediately delete
the route, but let rtfree() handle it).

This is an urgent MFC candidate. Thanks go to Mike Silbersack for the
fix, once again. :-)

Submitted by: Mike Silbersack <silby@silby.com>
2001-03-04 21:28:40 +00:00
Don Lewis
e15ae1b226 Disable interface checking for packets subject to "ipfw fwd".
Chris Johnson <cjohnson@palomine.net> tested this fix in -stable.
2001-03-04 03:22:36 +00:00
Don Lewis
823db0e9dd Disable interface checking when IP forwarding is engaged so that packets
addressed to the interface on the other side of the box follow their
historical path.

Explicitly block packets sent to the loopback network sent from the outside,
which is consistent with the behavior of the forwarding path between
interfaces as implemented in in_canforward().

Always check the arrival interface when matching the packet destination
against the interface broadcast addresses.  This bug allowed TCP
connections to be made to the broadcast address of an interface on the
far side of the system because the M_BCAST flag was not set because the
packet was unicast to the interface on the near side.  This was broken
when the directed broadcast code was removed from revision 1.32.  If
the directed broadcast code was stil present, the destination would not
have been recognized as local until the packet was forwarded to the output
interface and ether_output() looped a copy back to ip_input() with
M_BCAST set and the receive interface set to the output interface.

Optimize the order of the tests.

Reviewed by:	jlemon
2001-03-04 01:39:19 +00:00
Jonathan Lemon
b3e95d4ed0 Add a new sysctl net.inet.ip.check_interface, which will verify that
an incoming packet arrivees on an interface that has an address matching
the packet's address.  This is turned on by default.
2001-03-02 20:54:03 +00:00
Poul-Henning Kamp
970680fad8 Fix jails. 2001-02-28 09:38:48 +00:00
Jonathan Lemon
7538a9a0f8 When iterating over our list of interface addresses in order to determine
if an arriving packet belongs to us, also check that the packet arrived
through the correct interface.  Skip this check if the packet was locally
generated.
2001-02-27 19:43:14 +00:00
Bill Fumerola
2a6cb8804e The TCP header-specific section suffered a little bit of bitrot recently:
When we recieve a fragmented TCP packet (other than the first) we can't
extract header information (we don't have state to reference). In a rather
unelegant fashion we just move on and assume a non-match.

Recent additions to the TCP header-specific section of the code neglected
to add the logic to the fragment code so in those cases the match was
assumed to be positive and those parts of the rule (which should have
resulted in a non-match/continue) were instead skipped (which means
the processing of the rule continued even though it had already not
matched).

Fault can be spread out over Rich Steenbergen (tcpoptions) and myself
(tcp{seq,ack,win}).

rwatson sent me a patch that got me thinking about this whole situation
(but what I'm committing / this description is mine so don't blame him).
2001-02-27 10:20:44 +00:00
Jonathan Lemon
7d42e30c2e Use more aggressive retransmit timeouts for the initial SYN packet.
As we currently drop the connection after 4 retransmits + 2 ICMP errors,
this allows initial connection attempts to be dropped much faster.
2001-02-26 21:33:55 +00:00
Jonathan Lemon
c693a045de Remove in_pcbnotify and use in_pcblookup_hash to find the cb directly.
For TCP, verify that the sequence number in the ICMP packet falls within
the tcp receive window before performing any actions indicated by the
icmp packet.

Clean up some layering violations (access to tcp internals from in_pcb)
2001-02-26 21:19:47 +00:00
Jeroen Ruigrok van der Werven
b9af273fe3 Remove struct full_tcpiphdr{}.
This piece of code has not been referenced since it was put there
in 1995.  Also done a codebased search on popular networking libraries
and third-party applications.  This is an orphan.

Reviewed by:	jesper
2001-02-26 20:10:16 +00:00
Jeroen Ruigrok van der Werven
05f15c3dc3 Remove conditionals for vax support.
People who care much about this are welcomed to try 2.11BSD. :)

Noticed by:	luigi
Reviewed by:	jesper
2001-02-26 20:05:32 +00:00
Jesper Skriver
694a9ff95b Remove tcp_drop_all_states, which is unneeded after jlemon removed it
from tcp_subr.c in rev 1.92
2001-02-25 17:20:19 +00:00
Jonathan Lemon
d8c85a260f Do not delay a new ack if there already is a delayed ack pending on the
connection, but send it immediately.  Prior to this change, it was possible
to delay a delayed-ack for multiple times, resulting in degraded TCP
behavior in certain corner cases.
2001-02-25 15:17:24 +00:00
Jonathan Lemon
c484d1a38c When converting soft error into a hard error, drop the connection. The
error will be passed up to the user, who will close the connection, so
it does not appear to make a sense to leave the connection open.

This also fixes a bug with kqueue, where the filter does not set EOF
on the connection, because the connection is still open.

Also remove calls to so{rw}wakeup, as we aren't doing anything with
them at the moment anyway.

Reviewed by: alfred, jesper
2001-02-23 21:07:06 +00:00
Jonathan Lemon
e4bb5b0572 Allow ICMP unreachables which map into PRC_UNREACH_ADMIN_PROHIB to
reset TCP connections which are in the SYN_SENT state, if the sequence
number in the echoed ICMP reply is correct.  This behavior can be
controlled by the sysctl net.inet.tcp.icmp_may_rst.

Currently, only subtypes 2,3,10,11,12 are treated as such
(port, protocol and administrative unreachables).

Assocaiate an error code with these resets which is reported to the
user application: ENETRESET.

Disallow resetting TCP sessions which are not in a SYN_SENT state.

Reviewed by: jesper, -net
2001-02-23 20:51:46 +00:00
Jesper Skriver
d1c54148b7 Redo the security update done in rev 1.54 of src/sys/netinet/tcp_subr.c
and 1.84 of src/sys/netinet/udp_usrreq.c

The changes broken down:

- remove 0 as a wildcard for addresses and port numbers in
  src/sys/netinet/in_pcb.c:in_pcbnotify()
- add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify
  all sessions with the specific remote address.
- change
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  to use in_pcbnotifyall() to notify multiple sessions, instead of
  using in_pcbnotify() with 0 as src address and as port numbers.
- remove check for src port == 0 in
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  as they are no longer needed.
- move handling of redirects and host dead from in_pcbnotify() to
  udp_ctlinput() and tcp_ctlinput(), so they will call
  in_pcbnotifyall() to notify all sessions with the specific
  remote address.

Approved by:	jlemon
Inspired by:    NetBSD
2001-02-22 21:23:45 +00:00
Jesper Skriver
43c77c8f5f Backout change in 1.153, as it violate rfc1122 section 3.2.1.3.
Requested by:	jlemon,ru
2001-02-21 16:59:47 +00:00
Robert Watson
91421ba234 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
Jesper Skriver
58e9b41722 Only call in_pcbnotify if the src port number != 0, as we
treat 0 as a wildcard in src/sys/in_pbc.c:in_pcbnotify()

It's sufficient to check for src|local port, as we'll have no
sessions with src|local port == 0

Without this a attacker sending ICMP messages, where the attached
IP header (+ 8 bytes) has the address and port numbers == 0, would
have the ICMP message applied to all sessions.

PR:		kern/25195
Submitted by:	originally by jesper, reimplimented by jlemon's advice
Reviewed by:	jlemon
Approved by:	jlemon
2001-02-20 23:25:04 +00:00
Jesper Skriver
2b18d82220 Send a ICMP unreachable instead of dropping the packet silent, if we
receive a packet not for us, and forwarding disabled.

PR:		kern/24512
Reviewed by:	jlemon
Approved by:	jlemon
2001-02-20 21:31:47 +00:00
Jesper Skriver
c2221099a9 Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify
Forgotten by phk, when committing fix in kern/23986

PR:		kern/23986
Reviewed by:	phk
Approved by:	phk
2001-02-20 21:11:29 +00:00
Brian Feldman
c0511d3b58 Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
Poul-Henning Kamp
90fcbbd635 Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify
Add new PRC_UNREACH_ADMIN_PROHIB in sys/sys/protosw.h

Remove condition on TCP in src/sys/netinet/ip_icmp.c:icmp_input

In src/sys/netinet/ip_icmp.c:icmp_input set code = PRC_UNREACH_ADMIN_PROHIB
or PRC_UNREACH_HOST for all unreachables except ICMP_UNREACH_NEEDFRAG

Rename sysctl icmp_admin_prohib_like_rst to icmp_unreach_like_rst
to reflect the fact that we also react on ICMP unreachables that
are not administrative prohibited.  Also update the comments to
reflect this.

In sys/netinet/tcp_subr.c:tcp_ctlinput add code to treat
PRC_UNREACH_ADMIN_PROHIB and PRC_UNREACH_HOST different.

PR:		23986
Submitted by:	Jesper Skriver <jesper@skriver.dk>
2001-02-18 09:34:55 +00:00
Luigi Rizzo
c1b843c774 remove unused data structure definition, and corresponding macro into*() 2001-02-18 07:10:03 +00:00
Jonathan Lemon
7c45cb9bca Clean up warning. 2001-02-15 22:32:06 +00:00
Jeroen Ruigrok van der Werven
e61c4bedda Add definitions for IPPROTO numbers 55-57. 2001-02-14 13:51:20 +00:00
Poul-Henning Kamp
bb07ec8c84 Introduce a new feature in IPFW: Check of the source or destination
address is configured on a interface.  This is useful for routers with
dynamic interfaces.  It is now possible to say:

        0100 allow       tcp from any to any established
        0200 skipto 1000 tcp from any to any
        0300 allow       ip from any to any
        1000 allow       tcp from 1.2.3.4 to me 22
        1010 deny        tcp from any to me 22
        1020 allow       tcp from any to any

and not have to worry about the behaviour if dynamic interfaces configure
new IP numbers later on.

The check is semi expensive (traverses the interface address list)
so it should be protected as in the above example if high performance
is a requirement.
2001-02-13 14:12:37 +00:00
Bosko Milekic
a57815efd2 Clean up RST ratelimiting. Previously, ratelimiting occured before tests
were performed to determine if the received packet should be reset. This
created erroneous ratelimiting and false alarms in some cases. The code
has now been reorganized so that the checks for validity come before
the call to badport_bandlim. Additionally, a few changes in the symbolic
names of the bandlim types have been made, as well as a clarification of
exactly which type each RST case falls under.

Submitted by: Mike Silbersack <silby@silby.com>
2001-02-11 07:39:51 +00:00
Luigi Rizzo
7e1cd0d23d Sync with the bridge/dummynet/ipfw code already tested in stable.
In ip_fw.[ch] change a couple of variable and field names to
avoid having types, variables and fields with the same name.
2001-02-10 00:10:18 +00:00
Jeroen Ruigrok van der Werven
1a6e52d0e9 Fix typo: seperate -> separate.
Seperate does not exist in the english language.
2001-02-06 11:21:58 +00:00
Poul-Henning Kamp
6817526d14 Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by:    mikeh
2001-02-06 10:12:15 +00:00
Julian Elischer
41d2ba5e27 Fix bad patch from a few days ago. It broke some bridging. 2001-02-05 21:25:27 +00:00
Poul-Henning Kamp
37d4006626 Another round of the <sys/queue.h> FOREACH transmogriffer.
Created with:   sed(1)
Reviewed by:    md5(1)
2001-02-04 16:08:18 +00:00
Darren Reed
185b71c73e fix duplicate rcsid 2001-02-04 15:25:15 +00:00
Darren Reed
f590526d0a fix conflicts 2001-02-04 14:26:56 +00:00
Poul-Henning Kamp
fc2ffbe604 Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
Poul-Henning Kamp
ef9e85abba Use <sys/queue.h> macro API. 2001-02-04 12:37:48 +00:00
Julian Elischer
c8f8e9c110 Make the code act the same in the case of BRIDGE being defined, but not
turned on, and the case of it not being defined at all.
i.e. Disabling bridging re-enables some of the checks it disables.

Submitted by: "Rogier R. Mulhuijzen" <drwilco@drwilco.net>
2001-02-03 17:25:21 +00:00
Jonathan Lemon
007581c0d8 When turning off TCP_NOPUSH, call tcp_output to immediately flush
out any data pending in the buffer.

Submitted by: Tony Finch <dot@dotat.at>
2001-02-02 18:48:25 +00:00
Luigi Rizzo
507b4b5432 MFS: bridge/ipfw/dummynet fixes (bridge.c will be committed separately) 2001-02-02 00:18:00 +00:00
Brian Somers
435ff15c3b Add a few ``const''s to silence some -Wwrite-strings warnings 2001-01-29 11:44:13 +00:00
Brian Somers
4834b77d04 Ignore leading witespace in the string given to PacketAliasProxyRule(). 2001-01-29 00:30:01 +00:00
Luigi Rizzo
f8acf87bb5 Make sure we do not follow an invalid pointer in ipfw_report
when we get an incomplete packet or m_pullup fails.
2001-01-27 02:31:08 +00:00
Luigi Rizzo
26fb17bdd0 Minor cleanups after yesterday's patch.
The code (bridging and dummynet) actually worked fine!
2001-01-26 19:43:54 +00:00
Luigi Rizzo
6258acf88f Bring dummynet in line with the code that now works in -STABLE.
It compiles, but I cannot test functionality yet.
2001-01-26 06:49:34 +00:00
Luigi Rizzo
7a726a2dd1 Pass up errors returned by dummynet. The same should be done with
divert.
2001-01-25 02:06:38 +00:00
Garrett Wollman
a589a70ee1 Correct a comment. 2001-01-24 16:25:36 +00:00
Wes Peters
550b151850 When attempting to bind to an ephemeral port, if no such port is
available, the error return should be EADDRNOTAVAIL rather than
EAGAIN.

PR:		14181
Submitted by:	Dima Dorfman <dima@unixfreak.org>
Reviewed by:	Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
2001-01-23 07:27:56 +00:00
Luigi Rizzo
8b2cd62d7d Change critical section protection for dummynet from splnet() to
splimp() -- we need it because dummynet can be invoked by the
bridging code at splimp().

This should cure the pipe "stalls" that several people have been
reporting on -stable while using bridging+dummynet (the problem
would not affect routers using dummynet).
2001-01-22 23:04:13 +00:00
Dag-Erling Smørgrav
a3ea6d41b9 First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
 - remove zalloci() and zfreei(), which are now redundant.

Reviewed by:	bmilekic, jasone
2001-01-21 22:23:11 +00:00
Luigi Rizzo
ec97c79e30 Document data structures and operation on dummynet so next time
I or someone else browse through this code I do not have a hard
time understanding what is going on.
2001-01-17 01:09:40 +00:00
Luigi Rizzo
5da48f88bd Some dummynet patches that I forgot to commit last summer.
One of them fixes a potential panic when bridging is used and
you run out of mbufs (though i have no idea if the bug has
ever hit anyone).
2001-01-16 23:49:49 +00:00
Bosko Milekic
987efc765e Prototype inet_ntoa_r and thereby silence a warning from GCC. The function
is prototyped immediately under inet_ntoa, which is also from libkern.
2001-01-12 07:47:53 +00:00
Robert Watson
46a27060af o Minor style(9)ism to make consistent with -STABLE 2001-01-09 18:26:17 +00:00
Robert Watson
65450f2f77 o IPFW incorrectly handled filtering in the presence of previously
reserved and now allocated TCP flags in incoming packets.  This patch
  stops overloading those bits in the IP firewall rules, and moves
  colliding flags to a seperate field, ipflg.  The IPFW userland
  management tool, ipfw(8), is updated to reflect this change.  New TCP
  flags related to ECN are now included in tcp.h for reference, although
  we don't currently implement TCP+ECN.

o To use this fix without completely rebuilding, it is sufficient to copy
  ip_fw.h and tcp.h into your appropriate include directory, then rebuild
  the ipfw kernel module, and ipfw tool, and install both.  Note that a
  mismatch between module and userland tool will result in incorrect
  installation of firewall rules that may have unexpected effects.  This
  is an MFC candidate, following shakedown.  This bug does not appear
  to affect ipfilter.

Reviewed by:	security-officer, billf
Reported by:	Aragon Gouveia <aragon@phat.za.net>
2001-01-09 03:10:30 +00:00
Alfred Perlstein
3269187d41 provide a sysctl 'net.link.ether.inet.log_arp_wrong_iface' to allow one
to supress logging when ARP replies arrive on the wrong interface:
 "/kernel: arp: 1.2.3.4 is on dc0 but got reply from 00:00:c5:79:d0:0c on dc1"

the default is to log just to give notice about possibly incorrectly
configured networks.
2001-01-06 00:45:08 +00:00
Alfred Perlstein
da289f07ee Fix incorrect logic wouldn't disconnect incomming connections that had been
disconnected because they were not full.

Submitted by: David Filo
2001-01-03 19:50:23 +00:00
Assar Westerlund
598ce68dbd include tcp header files to get the prototype for tcp_seq_vs_sess 2000-12-27 03:02:29 +00:00
Poul-Henning Kamp
442fad6798 Update the "icmp_admin_prohib_like_rst" code to check the tcp-window and
to be configurable with respect to acting only in SYN or in all TCP states.

PR:		23665
Submitted by:	Jesper Skriver <jesper@skriver.dk>
2000-12-24 10:57:21 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
Bill Fumerola
16cd6db04f Use getmicrotime() instead of microtime() when timestamping ICMP packets,
the former is quicker and accurate enough for use here.

Submitted by:	Jason Slagle <raistlin@toledolink.com> (on IRC)
Reviewed by:	phk
2000-12-16 21:39:48 +00:00
Poul-Henning Kamp
b11d7a4a2f We currently does not react to ICMP administratively prohibited
messages send by routers when they deny our traffic, this causes
a timeout when trying to connect to TCP ports/services on a remote
host, which is blocked by routers or firewalls.

rfc1122 (Requirements for Internet Hosts) section 3.2.2.1 actually
requi re that we treat such a message for a TCP session, that we
treat it like if we had recieved a RST.

quote begin.

            A Destination Unreachable message that is received MUST be
            reported to the transport layer.  The transport layer SHOULD
            use the information appropriately; for example, see Sections
            4.1.3.3, 4.2.3.9, and 4.2.4 below.  A transport protocol
            that has its own mechanism for notifying the sender that a
            port is unreachable (e.g., TCP, which sends RST segments)
            MUST nevertheless accept an ICMP Port Unreachable for the
            same purpose.

quote end.

I've written a small extension that implement this, it also create
a sysctl "net.inet.tcp.icmp_admin_prohib_like_rst" to control if
this new behaviour is activated.

When it's activated (set to 1) we'll treat a ICMP administratively
prohibited message (icmp type 3 code 9, 10 and 13) for a TCP
sessions, as if we recived a TCP RST, but only if the TCP session
is in SYN_SENT state.

The reason for only reacting when in SYN_SENT state, is that this
will solve the problem, and at the same time minimize the risk of
this being abused.

I suggest that we enable this new behaviour by default, but it
would be a change of current behaviour, so if people prefer to
leave it disabled by default, at least for now, this would be ok
for me, the attached diff actually have the sysctl set to 0 by
default.

PR:		23086
Submitted by:	Jesper Skriver <jesper@skriver.dk>
2000-12-16 19:42:06 +00:00
Bosko Milekic
09f81a46a5 Change the following:
1.  ICMP ECHO and TSTAMP replies are now rate limited.
  2.  RSTs generated due to packets sent to open and unopen ports
      are now limited by seperate counters.
  3.  Each rate limiting queue now has its own description, as
      follows:

      Limiting icmp unreach response from 439 to 200 packets per second
      Limiting closed port RST response from 283 to 200 packets per second
      Limiting open port RST response from 18724 to 200 packets per second
      Limiting icmp ping response from 211 to 200 packets per second
      Limiting icmp tstamp response from 394 to 200 packets per second

Submitted by: Mike Silbersack <silby@silby.com>
2000-12-15 21:45:49 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
959b7375ed Staticize some malloc M_ instances. 2000-12-08 20:09:00 +00:00
Jonathan Lemon
df5e198723 Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue.  Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue.  An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
2000-11-25 07:35:38 +00:00
Jonathan Lemon
e82ac18e52 Revert the last commit to the callout interface, and add a flag to
callout_init() indicating whether the callout is safe or not.  Update
the callers of callout_init() to reflect the new interface.

Okayed by: Jake
2000-11-25 06:22:16 +00:00
Bosko Milekic
a352dd9a71 Fixup (hopefully) bridging + ipfw + dummynet together...
* Some dummynet code incorrectly handled a malloc()-allocated pseudo-mbuf
  header structure, called "pkt," and could consequently pollute the mbuf
  free list if it was ever passed to m_freem(). The fix involved passing not
  pkt, but essentially pkt->m_next (which is a real mbuf) to the mbuf
  utility routines.

* Also, for dummynet, in bdg_forward(), made the code copy the ethernet header
  back into the mbuf (prepended) because the dummynet code that follows expects
  it to be there but it is, unfortunately for dummynet, passed to bdg_forward
  as a seperate argument.

PRs: kern/19551 ; misc/21534 ; kern/23010
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: bmilekic
Approved by: luigi
2000-11-23 22:25:03 +00:00
Ruslan Ermilov
1b7b85c4d6 mdoc(7) police: use the new feature of the An macro. 2000-11-22 08:47:35 +00:00
Bosko Milekic
0a1df235ba While I'm here, get rid of (now useless) MCLISREFERENCED and use MEXT_IS_REF
instead.
Also, fix a small set of "avail." If we're setting `avail,' we shouldn't
be re-checking whether m_flags is M_EXT, because we know that it is, as if
it wasn't, we would have already returned several lines above.

Reviewed by: jlemon
2000-11-11 23:05:59 +00:00
Ruslan Ermilov
203de3b494 Fixed the security breach I introduced in rev 1.145.
Disallow getsockopt(IP_FW_ADD) if securelevel >= 3.

PR:		22600
2000-11-07 09:20:32 +00:00
Jonathan Lemon
8735719e43 tp->snd_recover is part of the New Reno recovery algorithm, and should
only be checked if the system is currently performing New Reno style
fast recovery.  However, this value was being checked regardless of the
NR state, with the end result being that the congestion window was never
opened.

Change the logic to check t_dupack instead; the only code path that
allows it to be nonzero at this point is NewReno, so if it is nonzero,
we are in fast recovery mode and should not touch the congestion window.

Tested by:	phk
2000-11-04 15:59:39 +00:00
Ruslan Ermilov
1d02752206 Fixed the bug I have introduced in icmp_error() in revision 1.44.
The amount of data we copy from the original IP datagram into the
ICMP message was computed incorrectly for IP packets with payload
less than 8 bytes.
2000-11-02 09:46:23 +00:00
Ruslan Ermilov
506f494939 Wrong checksum may have been computed for certain UDP packets.
Reviewed by:	jlemon
2000-11-01 16:56:33 +00:00
Ruslan Ermilov
60123168be Wrong checksum used for certain reassembled IP packets before diverting. 2000-11-01 11:21:45 +00:00
Josef Karthauser
ffa37b3f9b It's no longer true that "nobody uses ia beyond here"; it's now
used to keep address based if_data statistics in.

Submitted by:	ru
2000-11-01 01:59:28 +00:00
Ruslan Ermilov
48cb400fb1 Do not waste a time saving a copy of IP header if we are certainly
not going to send an ICMP error message (net.inet.udp.blackhole=1).
2000-10-31 09:13:02 +00:00
Ruslan Ermilov
642cd09fb3 Added boolean argument to link searching functions, indicating
whether they should create a link if lookup has failed or not.
2000-10-30 17:24:12 +00:00
Ruslan Ermilov
03453c5e87 A significant rewrite of PPTP aliasing code.
PPTP links are no longer dropped by simple (and inappropriate in this
case) "inactivity timeout" procedure, only when requested through the
control connection.

It is now possible to have multiple PPTP servers running behind NAT.
Just redirect the incoming TCP traffic to port 1723, everything else
is done transparently.

Problems were reported and the fix was tested by:
		Michael Adler <Michael.Adler@compaq.com>,
		David Andersen <dga@lcs.mit.edu>
2000-10-30 12:39:41 +00:00
Poul-Henning Kamp
cf9fa8e725 Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
2000-10-29 16:06:56 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Darren Reed
0c72e2855d Fix conflicts creted by import. 2000-10-29 07:53:05 +00:00
Josef Karthauser
fe93767490 Count per-address statistics for IP fragments.
Requested by:	ru
Obtained from:	BSD/OS
2000-10-29 01:05:09 +00:00
David E. O'Brien
d0eaa94443 Include sys/param.h for `__FreeBSD_version' rather than the non-existent
osreldate.h.

Submitted by:	dougb
2000-10-27 12:53:31 +00:00
Poul-Henning Kamp
46aa3347cb Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by:   various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
Ruslan Ermilov
3cebc3e4de Fetch the protocol header (TCP, UDP, ICMP) only from the first fragment
of IP datagram.  This fixes the problem when firewall denied fragmented
packets whose last fragment was less than minimum protocol header size.

Found by:	Harti Brandt <brandt@fokus.gmd.de>
PR:		kern/22309
2000-10-27 07:19:17 +00:00
Ruslan Ermilov
b6ea1aa58d RFC 791 says that IP_RF bit should always be zero, but nothing
in the code enforces this.  So, do not check for and attempt a
false reassembly if only IP_RF is set.

Also, removed the dead code, since we no longer use dtom() on
return from ip_reass().
2000-10-26 13:14:48 +00:00
Darren Reed
60b88d9681 fix conflicts from rcsids 2000-10-26 12:33:42 +00:00
Ruslan Ermilov
7e2df4520d Wrong header length used for certain reassembled IP packets.
This was first fixed in rev 1.82 but then broken in rev 1.125.

PR:		6177
2000-10-26 12:18:13 +00:00
Luigi Rizzo
1f8ed85239 Close PR22152 and PR19511 -- correct the naming of a variable 2000-10-26 00:16:12 +00:00
Ruslan Ermilov
8829f4ee0b We now keep the ip_id field in network byte order all the
time, so there is no need to make the distinction between
ip_output() and ip_input() cases.

Reviewed by:	silence on freebsd-net
2000-10-25 10:56:41 +00:00
Jun-ichiro itojun Hagino
d31944e6ec be careful on mbuf overrun on ctlinput.
short icmp6 packet may be able to panic the kernel.
sync with kame.
2000-10-23 07:11:01 +00:00
Ruslan Ermilov
cc22c7a746 Save a few CPU cycles in IP fragmentation code. 2000-10-20 14:10:37 +00:00
Josef Karthauser
5da9f8fa97 Augment the 'ifaddr' structure with a 'struct if_data' to keep
statistics on a per network address basis.

Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.

Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.
2000-10-19 23:15:54 +00:00
Ruslan Ermilov
f136389613 A failure to allocate memory for auxiliary TCP data is now fatal.
This fixes a null pointer dereference problem that is unlikely to
happen in normal circumstances.
2000-10-19 10:44:44 +00:00
Ruslan Ermilov
0531ca1fd8 If we do not byte-swap the ip_id in the first place, don't do it in
the second.  NetBSD (from where I've taken this originally) needs
to fix this too.
2000-10-18 11:36:09 +00:00
Ruslan Ermilov
487bdb3855 Backout my wrong attempt to fix the compilation warning in ip_input.c
and instead reapply the revision 1.49 of mbuf.h, i.e.

Fixed regression of the type of the `header' member of struct pkthdr from
`void *' to caddr_t in rev.1.51.  This mainly caused an annoying warning
for compiling ip_input.c.

Requested by:	bde
2000-10-12 16:33:41 +00:00
Ruslan Ermilov
e6c89c1bd2 Fix the compilation warning. 2000-10-12 10:42:32 +00:00
Ruslan Ermilov
bc95ac80b2 Allow for IP_FW_ADD to be used in getsockopt(2) incarnation as
well, in which case return the rule number back into userland.

PR:		bin/18351
Reviewed by:	archie, luigi
2000-10-12 07:59:14 +00:00
Alfred Perlstein
abbfaeb87b Remove headers not needed.
Pointed out by: phk
2000-10-07 23:15:17 +00:00
Ruslan Ermilov
c0752e1657 As we now may check the TCP header window field, make sure we pullup
enough into the mbuf data area.  Solve this problem once and for all
by pulling up the entire (standard) header for TCP and UDP, and four
bytes of header for ICMP (enough for type, code and cksum fields).
2000-10-06 12:12:09 +00:00
Ruslan Ermilov
60f9125458 Added the missing ntohs() conversion when matching IP packet with
the IP_FW_IF_IPID rule.  (We have recently decided to keep the
ip_id field in network byte order inside the kernel, see revision
1.140 of src/sys/netinet/ip_input.c).

I did not like to have the conversion happen in userland, and I
think that the similar conversions for fw_tcp(seq|ack|win) should
be moved out of userland (src/sbin/ipfw/ipfw.c) into the kernel.
2000-10-03 12:18:11 +00:00
Jonathan Lemon
d17e895b5f If TCPDEBUG is defined, we could dereference a tp which was freed. 2000-10-02 15:00:13 +00:00
Ruslan Ermilov
c7f95f5372 A bit of indentation reformatting. 2000-10-02 13:13:24 +00:00
Bill Fumerola
9ad30943aa Add new fields for more granularity:
IP: version, tos, ttl, len, id
	TCP: seq#, ack#, window size

Reviewed by:  silence on freebsd-{net,ipfw}
2000-10-02 03:33:31 +00:00
Bill Fumerola
98b829924f Add new fields for more granularity:
IP: version, tos, ttl, len, id
	TCP: seq#, ack#, window size

Reviewed by:	silence on freebsd-{net,ipfw}
2000-10-02 03:03:31 +00:00
Ruslan Ermilov
3ea420e391 Document that net.inet.ip.fw.one_pass only affects dummynet(4).
Noticed by:	Peter Jeremy<peter.jeremy@alcatel.com.au>
2000-09-29 08:39:06 +00:00
Kris Kennaway
be515d91ad Use stronger random number generation for TCP_ISSINCR and tcp_iss.
Reviewed by:	peter, jlemon
2000-09-29 01:37:19 +00:00
Bosko Milekic
9d8c8a672c Finally make do_tcpdrain sysctl live under correct parent, _net_inet_tcp,
as opposed to _debug. Like before, default value remains 1.
2000-09-25 23:40:22 +00:00
Ruslan Ermilov
0122d6f195 Fixed the calculations with UDP header length field.
The field is in network byte order and contains the
size of the header.

Reviewed by:	brian
2000-09-21 06:52:59 +00:00
Kenjiro Cho
e645a1ca27 change the evaluation order of the rsvp socket in rsvp_input()
in favor of the new-style per-vif socket.

this does not affect the behavior of the ISI rsvpd but allows
another rsvp implementation (e.g., KOM rsvp) to take advantage
of the new style for particular sockets while using the old style
for others.

in the future, rsvp supporn should be replaced by more generic
router-alert support.

PR:		kern/20984
Submitted by:	Martin Karsten <Martin.Karsten@KOM.tu-darmstadt.de>
Reviewed by:	kjc
2000-09-17 13:50:12 +00:00
Poul-Henning Kamp
e4bdf25dc8 Properly jail UDP sockets. This is quite a bit more tricky than TCP.
This fixes a !root userland panic, and some cases where the wrong
interface was chosen for a jailed UDP socket.

PR:		20167, 19839, 20946
2000-09-17 13:35:42 +00:00
Poul-Henning Kamp
24b261c720 Reverse last commit, a better fix has been found. 2000-09-17 13:34:18 +00:00
Poul-Henning Kamp
e2cabba9d7 Make sure UDP sockets are explicitly bind(2)'ed [sic] before we connect(2)
them.

PR:     20946
Isolated by:    Aaron Gifford <agifford@infowest.com>
2000-09-17 11:34:33 +00:00
Jonathan Lemon
af1270f87f It is possible for a TCP callout to be removed from the timing wheel,
but have a network interrupt arrive and deactivate the timeout before
the callout routine runs.  Check for this case in the callout routine;
it should only run if the callout is active and not on the wheel.
2000-09-16 00:53:53 +00:00
Ruslan Ermilov
4996f02545 Add -Wmissing-prototypes. 2000-09-15 15:37:16 +00:00
Jonathan Lemon
a8db1d93f1 m_cat() can free its second argument, so collect the checksum information
from the fragment before calling m_cat().
2000-09-14 21:06:48 +00:00
Ruslan Ermilov
e30177e024 Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time.
Requested by:	wollman
2000-09-14 14:42:04 +00:00
Bill Fumerola
95d0db2b40 Fix screwup in previous commit. 2000-09-12 02:38:05 +00:00
Archie Cobbs
6612c70eb1 Don't do snd_nxt rollback optimization (rev. 1.46) for SYN packets.
It causes a panic when/if snd_una is incremented elsewhere (this
is a conservative change, because originally no rollback occurred
for any packets at all).

Submitted by:	Vivek Sadananda Pai <vivek@imimic.com>
2000-09-11 19:11:33 +00:00
Alfred Perlstein
b47ce7f5cb Forget to include sysctl.h
Submitted by: des
2000-09-09 18:47:46 +00:00
Alfred Perlstein
34b94e8b82 Accept filter maintainance
Update copyrights.

Introduce a new sysctl node:
  net.inet.accf

Although acceptfilters need refcounting to be properly (safely) unloaded
as a temporary hack allow them to be unloaded if the sysctl
net.inet.accf.unloadable is set, this is really for developers who want
to work on thier own filters.

A near complete re-write of the accf_http filter:
  1) Parse check if the request is HTTP/1.0 or HTTP/1.1 if not dump
     to the application.
     Because of the performance implications of this there is a sysctl
     'net.inet.accf.http.parsehttpversion' that when set to non-zero
     parses the HTTP version.
     The default is to parse the version.
  2) Check if a socket has filled and dump to the listener
  3) optimize the way that mbuf boundries are handled using some voodoo
  4) even though you'd expect accept filters to only be used on TCP
     connections that don't use m_nextpkt I've fixed the accept filter
     for socket connections that use this.

This rewrite of accf_http should allow someone to use them and maintain
full HTTP compliance as long as net.inet.accf.http.parsehttpversion is
set.
2000-09-06 18:49:13 +00:00
Bill Fumerola
4897e8320e 1. IP_FW_F_{UID,GID} are _not_ commands, they are extras. The sanity checking
for them does not belong in the IP_FW_F_COMMAND switch, that mask doesn't even
apply to them(!).

2. You cannot add a uid/gid rule to something that isn't TCP, UDP, or IP.

XXX - this should be handled in ipfw(8) as well (for more diagnostic output),
but this at least protects bogus rules from being added.

Pointy hat:	green
2000-09-06 03:10:42 +00:00
Ruslan Ermilov
76e6ebd64e Match IPPROTO_ICMP with IP protocol field of the original IP
datagram embedded into ICMP error message, not with protocol
field of ICMP message itself (which is always IPPROTO_ICMP).

Pointed by:	Erik Salander <erik@whistle.com>
2000-09-01 16:38:53 +00:00
Ruslan Ermilov
04287599db Fixed broken ICMP error generation, unified conversion of IP header
fields between host and network byte order.  The details:

o icmp_error() now does not add IP header length.  This fixes the problem
  when icmp_error() is called from ip_forward().  In this case the ip_len
  of the original IP datagram returned with ICMP error was wrong.

o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host
  byte order, so DTRT and convert these fields back to network byte order
  before sending a message.  This fixes the problem described in PR 16240
  and PR 20877 (ip_id field was returned in host byte order).

o ip_ttl decrement operation in ip_forward() was moved down to make sure
  that it does not corrupt the copy of original IP datagram passed later
  to icmp_error().

o A copy of original IP datagram in ip_forward() was made a read-write,
  independent copy.  This fixes the problem I first reported to Garrett
  Wollman and Bill Fenner and later put in audit trail of PR 16240:
  ip_output() (not always) converts fields of original datagram to network
  byte order, but because copy (mcopy) and its original (m) most likely
  share the same mbuf cluster, ip_output()'s manipulations on original
  also corrupted the copy.

o ip_output() now expects all three fields, ip_len, ip_off and (what is
  significant) ip_id in host byte order.  It was a headache for years that
  ip_id was handled differently.  The only compatibility issue here is the
  raw IP socket interface with IP_HDRINCL socket option set and a non-zero
  ip_id field, but ip.4 manual page was unclear on whether in this case
  ip_id field should be in host or network byte order.
2000-09-01 12:33:03 +00:00
Ruslan Ermilov
816fa7febc Changed the way we handle outgoing ICMP error messages -- do
not alias `ip_src' unless it comes from the host an original
datagram that triggered this error message was destined for.

PR:		20712
Reviewed by:	brian, Charles Mott <cmott@scientech.com>
2000-09-01 09:32:44 +00:00
Ruslan Ermilov
0ac308534e Grab ADJUST_CHECKSUM() macro from alias_local.h. 2000-08-31 12:54:55 +00:00
Ruslan Ermilov
305d10699e Create aliasing links for incoming ICMP echo/timestamp requests.
This makes outgoing ICMP echo/timestamp replies to be de-aliased
with the right source IP, not exactly the primary aliasing IP.
2000-08-31 12:47:57 +00:00
Ruslan Ermilov
3e065e76ac Fixed the bug that div_bind() always returned zero
even if there was an error (broken in rev 1.9).
2000-08-30 14:43:02 +00:00
Ruslan Ermilov
2160daba07 Backout the hack in rev 1.71, I am working on a better patch
that should cover almost all inconsistencies in ICMP error
generation.
2000-08-30 08:28:06 +00:00
Andrey A. Chernov
d9e630b592 strtok -> strsep (no strtok allowed in libraries)
add unsigned char cast to ctype macro
2000-08-29 21:34:55 +00:00
Darren Reed
473998719e Apply appropriate patch.
PR:		20877
Submitted by:	Frank Volf (volf@oasis.IAEhv.nl)
2000-08-29 10:41:55 +00:00
Archie Cobbs
11840b0692 Remove obsolete comment. 2000-08-22 00:32:52 +00:00
Bruce Evans
4153a3a323 Fixed a missing splx() in if_addmulti(). Was broken in rev.1.28. 2000-08-19 22:10:10 +00:00
Jun-ichiro itojun Hagino
d1d1144bd7 repair endianness issue in IN_MULTICAST().
again, *BSD difference...

From: Nick Sayer <nsayer@quack.kfu.com>
2000-08-15 07:34:08 +00:00
Ruslan Ermilov
a05f06d79f Fixed PunchFW code segmentation violation bug.
Reported by:	Christian Schade <chris@cube.sax.de>
2000-08-14 15:24:47 +00:00
Ruslan Ermilov
b834663f47 Use queue(3) LIST_* macros for doubly-linked lists. 2000-08-14 14:18:16 +00:00
Darren Reed
5e90b39cba resolve conflicts 2000-08-13 04:31:06 +00:00
Ruslan Ermilov
0eb10a0963 - Do not modify Peer's Call ID in outgoing Incoming-Call-Connected
PPTP control messages.

- Cosmetics: replace `GRE link' with `PPTP link'.

Reviewed by:	Erik Salander <erik@whistle.com>
2000-08-09 11:25:44 +00:00
Ruslan Ermilov
934a4fb381 Adjust TCP checksum rather than compute it afresh.
Submitted by:	Erik Salander <erik@whistle.com>
2000-08-07 09:51:04 +00:00