Commit Graph

1868 Commits

Author SHA1 Message Date
Robert Watson
6200a93f82 Rename NET_PICKUP_GIANT() to NET_LOCK_GIANT(), and NET_DROP_GIANT()
to NET_UNLOCK_GIANT().  While they are used in similar ways, the
semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT()
directly wrap mutex lock and unlock operations, whereas drop/pickup
special case the handling of Giant recursion.  Add a comment saying
as much.

Add NET_ASSERT_GIANT(), which conditionally asserts Giant based
on the value of debug_mpsafenet.
2004-03-01 22:37:01 +00:00
Hajimu UMEMOTO
04d3a45241 fix -O0 compilation without INET6.
Pointed out by:	ru
2004-03-01 19:10:31 +00:00
Robert Watson
768bbd68cc Remove unneeded {} originally used to hold local variables for dummynet
in a code block, as the variable is now gone.

Submitted by:	sam
2004-02-28 19:50:43 +00:00
Robert Watson
a7b6a14aee Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These
were needed by the MAC Framework until inpcbs gained labels.

Submitted by:	sam
2004-02-28 15:12:20 +00:00
Max Laier
25a4adcec4 Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)
2004-02-26 04:27:55 +00:00
Max Laier
cc5934f5af Tweak existing header and other build infrastructure to be able to build
pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile
(i.e. do not connect it to any (automatic) builds - yet).

Approved by: bms(mentor)
2004-02-26 03:53:54 +00:00
Don Lewis
47934cef8f Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
Max Laier
ac9d7e2618 Re-remove MT_TAGs. The problems with dummynet have been fixed now.
Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam
2004-02-25 19:55:29 +00:00
Bruce Evans
0613995bd0 Fixed namespace pollution in rev.1.74. Implementation of the syncache
increased <netinet/tcp_var>'s already large set of prerequisites, and
this was handled badly.  Just don't declare the complete syncache struct
unless <netinet/pcb.h> is included before <netinet/tcp_var.h>.

Approved by:	jlemon (years ago, for a more invasive fix)
2004-02-25 13:03:01 +00:00
Bruce Evans
a545b1dc4d Don't use the negatively-opaque type uma_zone_t or be chummy with
<vm/uma.h>'s idempotency indentifier or its misspelling.
2004-02-25 11:53:19 +00:00
Jeffrey Hsu
89c02376fc Relax a KASSERT condition to allow for a valid corner case where
the FIN on the last segment consumes an extra sequence number.

Spurious panic reported by Mike Silbersack <silby@silby.com>.
2004-02-25 08:53:17 +00:00
Andre Oppermann
12e2e97051 Convert the tcp segment reassembly queue to UMA and limit the maximum
amount of segments it will hold.

The following tuneables and sysctls control the behaviour of the tcp
segment reassembly queue:

 net.inet.tcp.reass.maxsegments (loader tuneable)
  specifies the maximum number of segments all tcp reassemly queues can
  hold (defaults to 1/16 of nmbclusters).

 net.inet.tcp.reass.maxqlen
  specifies the maximum number of segments any individual tcp session queue
  can hold (defaults to 48).

 net.inet.tcp.reass.cursegments (readonly)
  counts the number of segments currently in all reassembly queues.

 net.inet.tcp.reass.overflows (readonly)
  counts how often either the global or local queue limit has been reached.

Tested by:	bms, silby
Reviewed by:	bms, silby
2004-02-24 15:27:41 +00:00
Pawel Jakub Dawidek
41fe0c8ad5 Fixed ucred structure leak.
Approved by:	scottl (mentor)
PR:		54163
MFC after:	3 days
2004-02-19 14:13:21 +00:00
Max Laier
36e8826ffb Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)
2004-02-18 00:04:52 +00:00
Hajimu UMEMOTO
da0f40995d IPSEC and FAST_IPSEC have the same internal API now;
so merge these (IPSEC has an extra ipsecstat)

Submitted by:	"Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
2004-02-17 14:02:37 +00:00
Bruce M Simpson
88f6b0435e Shorten the name of the socket option used to enable TCP-MD5 packet
treatment.

Submitted by:	Vincent Jardin
2004-02-16 22:21:16 +00:00
Hajimu UMEMOTO
70dbc6cbfc don't update outgoing ifp, if ipsec tunnel mode encapsulation
was not made.

Obtained from:	KAME
2004-02-16 17:05:06 +00:00
Bruce M Simpson
91179f796d Spell types consistently throughout this file. Do not use the __packed attribute, as we are often #include'd from userland without <sys/cdefs.h> in front of us, and it is not strictly necessary.
Noticed by:	Sascha Blank
2004-02-16 14:40:56 +00:00
Bruce M Simpson
32ff046639 Final brucification pass. Spell types consistently (u_int). Remove bogus
casts. Remove unnecessary parenthesis.

Submitted by:	bde
2004-02-14 21:49:48 +00:00
Max Laier
97075d0c0a Do not expose ip_dn_find_rule inline function to userland and unbreak world.
----------------------------------------------------------------------
2004-02-13 22:26:36 +00:00
Max Laier
189a0ba4e7 Do not check receive interface when pfil(9) hook changed address.
Approved by: bms(mentor)
2004-02-13 19:20:43 +00:00
Max Laier
1094bdca51 This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).

This is (mostly) work from: sam

Silence from: -arch
Approved by: bms(mentor), sam, rwatson
2004-02-13 19:14:16 +00:00
Bruce M Simpson
265ed01285 Brucification.
Submitted by:	bde
2004-02-13 18:21:45 +00:00
Hajimu UMEMOTO
efddf5c64d supported IPV6_RECVPATHMTU socket option.
Obtained from:	KAME
2004-02-13 14:50:01 +00:00
Bruce M Simpson
b30190b542 Update the prototype for tcpsignature_apply() to reflect the spelling of
the types used by m_apply()'s callback function, f, as documented in mbuf(9).

Noticed by:	njl
2004-02-12 20:16:09 +00:00
Bruce M Simpson
bca0e5bfc3 style(9) pass; whitespace and comments.
Submitted by:	njl
2004-02-12 20:12:48 +00:00
Bruce M Simpson
a0194ef1ea Remove an unnecessary initialization that crept in from the code which
verifies TCP-MD5 digests.

Noticed by:	njl
2004-02-12 20:08:28 +00:00
Bruce M Simpson
45d370ee8b Fix a typo; left out preprocessor conditional for sigoff variable, which
is only used by TCP_SIGNATURE code.

Noticed by:	Roop Nanuwa
2004-02-11 09:46:54 +00:00
Bruce M Simpson
1cfd4b5326 Initial import of RFC 2385 (TCP-MD5) digest support.
This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.

For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.

Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.

There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.

Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.

This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.

Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.

Sponsored by:	sentex.net
2004-02-11 04:26:04 +00:00
Hajimu UMEMOTO
f073c60f73 pass pcb rather than so. it is expected that per socket policy
works again.
2004-02-03 18:20:55 +00:00
Andre Oppermann
b74d89bbbb Add sysctl net.inet.icmp.reply_src to specify the interface name
used for the ICMP reply source in reponse to packets which are not
directly addressed to us.  By default continue with with normal
source selection.

Reviewed by:	bms
2004-02-02 22:53:16 +00:00
Andre Oppermann
1488eac8ec More verbose description of the source ip address selection for ICMP replies.
Reviewed by:	bms
2004-02-02 22:17:09 +00:00
Poul-Henning Kamp
be8a62e821 Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead.  Simultaneous
use of the two options is possible and they will return consistent
timestamps.

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.
2004-01-31 10:40:25 +00:00
Maxim Sobolev
4c83789253 Remove NetBSD'isms (add FreeBSD'isms?), which makes gre(4) working again. 2004-01-30 09:03:01 +00:00
Ruslan Ermilov
0ca2861fc9 Correct the descriptions of the net.inet.{udp,raw}.recvspace sysctls. 2004-01-27 22:17:39 +00:00
Maxim Sobolev
7735aeb9bb Add support for WCCPv2. It should be enablem manually using link2
ifconfig(8) flag since header for version 2 is the same but IP payload
is prepended with additional 4-bytes field.

Inspired by:	Roman Synyuk <roman@univ.kiev.ua>
MFC after:	2 weeks
2004-01-26 12:33:56 +00:00
Maxim Sobolev
6e628b8187 (whilespace-only)
Kill trailing spaces.
2004-01-26 12:21:59 +00:00
Andre Oppermann
241f1e33b1 Remove leftover FREE() from changes in rev 1.50.
Noticed by:	Jun Kuriyama <kuriyama@imgsrc.co.jp>
2004-01-23 01:39:12 +00:00
Andre Oppermann
201d185b69 Split the overloaded variable 'win' into two for their specific purposes:
recwin and sendwin.  This removes a big source of confusion and makes
following the code much easier.

Reviewed by:	sam (mentor)
Obtained from:	DragonFlyBSD rev 1.6 (hsu)
2004-01-22 23:22:14 +00:00
Andre Oppermann
1ddba8d63e Move the reduction by one of the syncache limit after the zone has been
allocated.

Reviewed by:    sam (mentor)
Obtained from:  DragonFlyBSD rev 1.6 (hsu)
2004-01-22 23:14:48 +00:00
Andre Oppermann
73080de2be Remove an unused variable and put the sockaddr_in6 onto the stack instead
of malloc'ing it.

Reviewed by:	sam (mentor)
Obtained from:	DragonFlyBSD rev 1.6 (hsu)
2004-01-22 23:10:11 +00:00
Jeffrey Hsu
61a36e3dfc Merge from DragonFlyBSD rev 1.10:
date: 2003/09/02 10:04:47;  author: hsu;  state: Exp;  lines: +5 -6
Account for when Limited Transmit is not congestion window limited.

Obtained from:	DragonFlyBSD
2004-01-20 21:40:25 +00:00
Poul-Henning Kamp
5e289f9eb6 Mostly mechanical rework of libalias:
Makes it possible to have multiple packet aliasing instances in a
single process by moving all static and global variables into an
instance structure called "struct libalias".

Redefine a new API based on s/PacketAlias/LibAlias/g

Add new "instance" argument to all functions in the new API.

Implement old API in terms of the new API.
2004-01-17 10:52:21 +00:00
Hajimu UMEMOTO
548c676b32 do not deref freed pointer
Submitted by:	"Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
Reviewed by:	itojun
2004-01-13 09:51:47 +00:00
Andre Oppermann
bed824fa90 Disable the minmssoverload connection drop by default until the detection
logic is refined.
2004-01-12 15:46:04 +00:00
Don Lewis
e29ef13f6c Check that sa_len is the appropriate value in tcp_usr_bind(),
tcp6_usr_bind(), tcp_usr_connect(), and tcp6_usr_connect() before checking
to see whether the address is multicast so that the proper errno value
will be returned if sa_len is incorrect.  The checks are identical to the
ones in in_pcbbind_setup(), in6_pcbbind(), and in6_pcbladdr(), which are
called after the multicast address check passes.

MFC after:	30 days
2004-01-10 08:53:00 +00:00
Andre Oppermann
1ddc17c1d5 Reduce TCP_MINMSS default to 216. The AX.25 protocol (packet radio)
is frequently used with an MTU of 256 because of slow speeds and a
high packet loss rate.
2004-01-09 14:14:10 +00:00
Andre Oppermann
53369ac9bb Limiters and sanity checks for TCP MSS (maximum segement size)
resource exhaustion attacks.

For network link optimization TCP can adjust its MSS and thus
packet size according to the observed path MTU.  This is done
dynamically based on feedback from the remote host and network
components along the packet path.  This information can be
abused to pretend an extremely low path MTU.

The resource exhaustion works in two ways:

 o during tcp connection setup the advertized local MSS is
   exchanged between the endpoints.  The remote endpoint can
   set this arbitrarily low (except for a minimum MTU of 64
   octets enforced in the BSD code).  When the local host is
   sending data it is forced to send many small IP packets
   instead of a large one.

   For example instead of the normal TCP payload size of 1448
   it forces TCP payload size of 12 (MTU 64) and thus we have
   a 120 times increase in workload and packets. On fast links
   this quickly saturates the local CPU and may also hit pps
   processing limites of network components along the path.

   This type of attack is particularly effective for servers
   where the attacker can download large files (WWW and FTP).

   We mitigate it by enforcing a minimum MTU settable by sysctl
   net.inet.tcp.minmss defaulting to 256 octets.

 o the local host is reveiving data on a TCP connection from
   the remote host.  The local host has no control over the
   packet size the remote host is sending.  The remote host
   may chose to do what is described in the first attack and
   send the data in packets with an TCP payload of at least
   one byte.  For each packet the tcp_input() function will
   be entered, the packet is processed and a sowakeup() is
   signalled to the connected process.

   For example an attack with 2 Mbit/s gives 4716 packets per
   second and the same amount of sowakeup()s to the process
   (and context switches).

   This type of attack is particularly effective for servers
   where the attacker can upload large amounts of data.
   Normally this is the case with WWW server where large POSTs
   can be made.

   We mitigate this by calculating the average MSS payload per
   second.  If it goes below 'net.inet.tcp.minmss' and the pps
   rate is above 'net.inet.tcp.minmssoverload' defaulting to
   1000 this particular TCP connection is resetted and dropped.

MITRE CVE:	CAN-2004-0002
Reviewed by:	sam (mentor)
MFC after:	1 day
2004-01-08 17:40:07 +00:00
Andre Oppermann
bf87c82ebb If path mtu discovery is enabled set the DF bit in all cases we
send packets on a tcp connection.

PR:		kern/60889
Tested by:	Richard Wendland <richard@wendland.org.uk>
Approved by:	re (scottl)
2004-01-08 11:17:11 +00:00
Andre Oppermann
e0f630ea7a Do not set the ip_id to zero when DF is set on packet and
restore the general pre-randomid behaviour.

Setting the ip_id to zero causes several problems with
packet reassembly when a device along the path removes
the DF bit for some reason.

Other BSD and Linux have found and fixed the same issues.

PR:		kern/60889
Tested by:	Richard Wendland <richard@wendland.org.uk>
Approved by:	re (scottl)
2004-01-08 11:13:40 +00:00