Commit Graph

4155 Commits

Author SHA1 Message Date
Qing Li
1184509858 When an interface address route is removed from the system, another
route with the same prefix is searched for as a replacement. The
current code did not bypass routes that have non-operational
interfaces. This patch fixes that bug and will find a replacement
route with an active interface.

PR:		kern/159603
Submitted by:	pluknet, ambrisko at ambrisko dot com
Reviewed by:	discussed on net@
Approved by:	re (bz)
MFC after:	3 days
2011-08-28 00:14:40 +00:00
Bjoern A. Zeeb
b233773bb9 Increase the defaults for the maximum socket buffer limit,
and the maximum TCP send and receive buffer limits from 256kB
to 2MB.

For sb_max_adj we need to add the cast as already used in the sysctl
handler to not overflow the type doing the maths.

Note that this is just the defaults.  They will allow more memory
to be consumed per socket/connection if needed but not change the
default "idle" memory consumption.   All values are still tunable
by sysctls.

Suggested by:	gnn
Discussed on:	arch (Mar and Aug 2011)
MFC after:	3 weeks
Approved by:	re (kib)
2011-08-25 09:20:13 +00:00
Bjoern A. Zeeb
6f69742441 Fix compilation in case of defined(INET) && defined(IPFIREWALL_FORWARD)
but no INET6.

Reported by:	avg
Tested by:	avg
MFC after:	4 weeks
X-MFC with:	r225044
Approved by:	re (kib)
2011-08-20 18:45:38 +00:00
Bjoern A. Zeeb
8a006adb24 Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from:	David Dolson at Sandvine Incorporated
		(original version for ipfw fwd IPv6 support)
Sponsored by:	Sandvine Incorporated
PR:		bin/117214
MFC after:	4 weeks
Approved by:	re (kib)
2011-08-20 17:05:11 +00:00
Bjoern A. Zeeb
f76fdd221b Hide IPv6 next header parsing warnings under the verbose sysctl
so people can possibly disable it when their consoles are flooded,
or enabled it for debugging.

MFC after:	2 weeks
Approved by:	re (kib)
2011-08-20 14:20:36 +00:00
Bjoern A. Zeeb
0c4dbd5af7 After r225032 fix logging in a similar way masking the the IPv6
more fragments flag off so that offset == 0 checks work properly.

PR:		kern/145733
Submitted by:	Matthew Luckie (mjl luckie.org.nz)
MFC after:	2 weeks
X-MFC with:	r225032
Approved by:	re (kib)
2011-08-20 13:47:08 +00:00
Bjoern A. Zeeb
49239b28da If we detect an IPv6 fragment header and it is not the first fragment,
then terminate the loop as we will not find any further headers and
for short fragments this could otherwise lead to a pullup error
discarding the fragment.

PR:		kern/145733
Submitted by:	Matthew Luckie (mjl luckie.org.nz)
MFC after:	2 weeks
Approved by:	re (kib)
2011-08-20 13:46:19 +00:00
Bjoern A. Zeeb
720fee0674 ipfw internally checks for offset == 0 to determine whether the
packet is a/the first fragment or not.  For IPv6 we have added the
"more fragments" flag as well to be able to determine on whether
there will be more as we do not have the fragment header avaialble
for logging, while for IPv4 this information can be derived directly
from the IPv4 header.  This allowed fragmented packets to bypass
normal rules as proper masking was not done when checking offset.
Split variables to not need masking for IPv6 to avoid further errors.

PR:		kern/145733
Submitted by:	Matthew Luckie (mjl luckie.org.nz)
MFC after:	2 weeks
Approved by:	re (kib)
2011-08-20 13:17:47 +00:00
Bjoern A. Zeeb
391255b8a4 While not explicitly allowed by RFC 2460, in case there is no
translation technology involved (and that section is suggested to
be removed by Errata 2843), single packet fragments do not harm.

There is another errata under discussion to clarify and allow this.
Meanwhile add a sysctl to allow disabling this behaviour again.
We will treat single packet fragment (a fragment header added
when not needed) as if there was no fragment header.

PR:		kern/145733
Submitted by:	Matthew Luckie (mjl luckie.org.nz) (original version)
Tested by:	Matthew Luckie (mjl luckie.org.nz)
MFC after:	2 weeks
Approved by:	re (kib)
2011-08-20 12:40:17 +00:00
Michael Tuexen
3900c0936f Fix the handling of [gs]etsockopt() unconnected 1-to-1 style sockets.
While there:
* Fix a locking issue in setsockopt() of SCTP_CMT_ON_OFF.
* Fix a bug in setsockopt() of SCTP_DEFAULT_PRINFO, where the pr_value
  was ignored.

Approved by: re@
MFC after: 2 months.
2011-08-16 21:04:18 +00:00
Michael Tuexen
b10f2dc889 Add support for the spp_dscp field in the SCTP_PEER_ADDR_PARAMS
socket option. Backwards compatibility is provided by still
supporting the spp_ipv4_tos field.

Approved by: re@
MFC after: 2 months.
2011-08-14 20:55:32 +00:00
Kevin Lo
7236660627 If RTF_HOST flag is specified, then we are interested in destination
address.

PR:		kern/159600
Submitted by:	Svatopluk Kraus <onwahe at gmail dot com>
Approved by:	re (hrs)
2011-08-10 06:17:06 +00:00
Michael Tuexen
ca85e9482a The result of a joint work between rrs@ and myself at the IETF:
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.

Approved by: re@
MFC after: 2 months.
2011-08-03 20:21:00 +00:00
Gleb Smirnoff
217e3abc03 Add missing break; in r223593.
Submitted by:	sem
Pointy hat to:	glebius
Approved by:	re (kib)
2011-08-01 13:41:38 +00:00
Bjoern A. Zeeb
d9a362862c Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)

Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.

Discussed with:	rwatson (slightly earlier version)
2011-07-17 21:15:20 +00:00
Bjoern A. Zeeb
dceced71fb Unbreak no-INET kernels after r223839 adding the needed #ifdef INET.
MFC after:	4 weeks
2011-07-14 13:44:48 +00:00
Michael Tuexen
1a3b5ce2b9 Don't check for SOCK_DGRAM anymore. Also remove multicast
related code which is not necessary anymore.
2011-07-12 20:14:03 +00:00
Michael Tuexen
78d9a31d3a The socket API only specifies SCTP for SOCK_SEQPACKET and
SOCK_STREAM, but not SOCK_DGRAM. So don't register it for
SOCK_DGRAM.
While there, fix some indentation.
2011-07-12 19:29:29 +00:00
Marko Zec
13e255fab7 Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address.  This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after:	3 days
2011-07-08 09:38:33 +00:00
Andrey V. Elsukov
4659e09dcb Add again the checking for log_arp_permanent_modify that was by accident
removed in the r186119.

PR:		kern/154831
MFC after:	1 week
2011-07-07 11:59:51 +00:00
Andre Oppermann
1c6e7fa7f1 Remove the TCP_SORECEIVE_STREAM compile time option. The use of
soreceive_stream() for TCP still has to be enabled with the loader
tuneable net.inet.tcp.soreceive_stream.

Suggested by:	trociny and others
2011-07-07 10:37:14 +00:00
Colin Percival
472ea5befb Remove #ifdef notyet code dating back to 4.3BSD Net/2 (and possibly earlier).
I think the benefit of making the code cleaner and easier to understand
outweighs the humour of leaving this intact (or possibly changing it to
#ifdef not_yet_and_probably_never).

MFC after:	2 weeks
2011-07-05 18:49:55 +00:00
Colin Percival
ca7122622b Don't allow lro->len to exceed 65535, as this will result in overflow
when len is inserted back into the synthetic IP packet and cause a
multiple of 2^16 bytes of TCP "packet loss".

This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in
testing on Amazon EC2.

Reviewed by:	jfv
MFC after:	2 weeks
2011-07-05 18:43:54 +00:00
Glen Barber
ff19f85d50 - General grammar and mdoc(7) fixes. [1] [2]
- While here, remove a paragraph about userspace operation that
  has been outdated for some time. [2]

PR:		158623
Submitted by:	Ben Kudak (kaduk % mit!edu) [1]
Reviewed by:	glebius [2]
MFC after:	1 week
2011-07-04 23:00:26 +00:00
Ermal Luçi
e6c90582c7 pf(4) tags now store the state key but tcp_respond tries to reuse a mbuf as an optimization.
This makes pf find the wrong state and cause errors reported with state mismatches.
Clear the cached state link on the pf(4) tag to avoid the state mismatches.

Approved by:	bz
2011-07-04 17:43:04 +00:00
Andrey V. Elsukov
2303570fe8 ARP code reuses mbuf from ARP request to make a reply, but it does not
reset rcvif to NULL. Since rcvif is not NULL, ipfw(4) supposes that ARP
replies were received on specified interface.
Reset rcvif to NULL for ARP replies to fix this issue.

PR:		kern/131817
Reviewed by:	glebius
MFC after:	1 month
2011-07-04 05:47:48 +00:00
Michael Tuexen
b845acda75 Add the missing sca_keylength field to the sctp_authkey structure,
which is used the the SCTP_AUTH_KEY socket option.

MFC after: 1 month.
2011-06-30 16:56:55 +00:00
Andrey V. Elsukov
9527ec6e52 Add new rule actions "call" and "return" to ipfw. They make
possible to organize subroutines with rules.

The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.

Submitted by:	Vadim Goncharov
Discussed by:	ipfw@, luigi@
2011-06-29 10:06:58 +00:00
Bjoern A. Zeeb
e0bfbfce79 Update packet filter (pf) code to OpenBSD 4.5.
You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by:	mlaier
Submitted by:	eri
2011-06-28 11:57:25 +00:00
Michael Tuexen
3c4401ecab Add support for SCTP_PR_SCTP_NONE which I misded to add.
This constant is defined in the socket API ID.

MFC after: 2 months.
2011-06-27 22:03:33 +00:00
Gleb Smirnoff
812f1d32e7 Add possibility to pass IPv6 packets to a divert(4) socket.
Submitted by:	sem
2011-06-27 12:21:11 +00:00
Andrey V. Elsukov
0511675327 Export AddLink() function from libalias. It can be used when custom
alias address needs to be specified.
Add inbound handler to the alias_ftp module. It helps handle active
FTP transfer mode for the case with external clients and FTP server behind
NAT. Fix passive FTP transfer case for server behind NAT using redirect with
external IP address different from NAT ip address.

PR:		kern/157957
Submitted by:	Alexander V. Chernikov
2011-06-22 20:00:27 +00:00
Andrey V. Elsukov
62b6e03adf Document PKT_ALIAS_SKIP_GLOBAL option.
Submitted by:	Alexander V. Chernikov
2011-06-22 09:55:28 +00:00
Andrey V. Elsukov
bb3dd40974 Do not use SET_HOST_IPLEN() macro for IPv6 packets.
PR:		kern/157239
MFC after:	2 weeks
2011-06-21 06:06:47 +00:00
Bjoern A. Zeeb
75497cc5eb Fix a KASSERT from r212803 to check the correct length also in case of
IPsec being compiled in and used.  Improve reporting by adding the length
fields to the panic message, so that we would have some immediate debugging
hints.

Discussed with:	jhb
2011-06-20 07:07:18 +00:00
Bjoern A. Zeeb
f404863979 Remove a these days incorrect comment left from before new-arp.
MFC after:	1 week
2011-06-18 13:54:36 +00:00
Michael Tuexen
6037f89c81 Add SCTP_DEFAULT_PRINFO socket option.
Fix the SCTP_DEFAULT_SNDINFO socket option: Don't clear the
PR SCTP policy when setting sinfo_flags.

MFC after: 1 month.
2011-06-16 21:12:36 +00:00
Michael Tuexen
0b064106dd * Fix the handling of addresses in sctp_sendv().
* Add support for SCTP_SENDV_NOINFO.
* Improve the error handling of sctp_sendv() and sctp_recv().

MFC after: 1 month
2011-06-16 15:36:09 +00:00
Michael Tuexen
e2e7c62edc Add support for the newly added SCTP API.
In particular add support for:
* SCTP_SNDINFO, SCTP_PRINFO, SCTP_AUTHINFO, SCTP_DSTADDRV4, and
  SCTP_DSTADDRV6 cmsgs.
* SCTP_NXTINFO and SCTP_RCVINFO cmgs.
* SCTP_EVENT, SCTP_RECVRCVINFO, SCTP_RECVNXTINFO and SCTP_DEFAULT_SNDINFO
  socket option.
* Special association ids (SCTP_FUTURE_ASSOC, ...)
* sctp_recvv() and sctp_sendv() functions.

MFC after: 1 month.
2011-06-15 23:50:27 +00:00
Andrey V. Elsukov
1875bbfe54 Implement "global" mode for ipfw nat. It is similar to natd(8)
"globalport" option for multiple NAT instances.

If ipfw rule contains "global" keyword instead of nat_number, then
for each outgoing packet ipfw_nat looks up translation state in all
configured nat instances. If an entry is found, packet aliased
according to that entry, otherwise packet is passed unchanged.

User can specify "skip_global" option in NAT configuration to exclude
an instance from the lookup in global mode.

PR:		kern/157867
Submitted by:	Alexander V. Chernikov (previous version)
Tested by:	Eugene Grosbein
2011-06-14 13:35:24 +00:00
Andrey V. Elsukov
81a654646e Sort alias mode flags in the increasing order. 2011-06-14 12:06:38 +00:00
Andrey V. Elsukov
3265f69ce6 Add IPv6 support to the ipfw uid/gid check. Pass an ip_fw_args structure
to the check_uidgid() function, since it contains all needed arguments
and also pointer to mbuf and now it is possible use in_pcblookup_mbuf()
function.

Since i can not test it for the non-FreeBSD case, i keep this ifdef
unchanged.

Tested by:	Alexander V. Chernikov
MFC after:	3 weeks
2011-06-14 07:20:16 +00:00
John Baldwin
6b7c15e580 Advance the advertised window (rcv_adv) to the currently received data
(rcv_nxt) if we advertising a zero window.  This can be true when ACK'ing
a window probe whose one byte payload was accepted rather than dropped
because the socket's receive buffer was not completely full, but the
remaining space was smaller than the window scale.

This ensures that window probe ACKs satisfy the assumption made in r221346
and closes a window where rcv_nxt could be greater than rcv_adv.

Tested by:	trasz, pho, trociny
Reviewed by:	silby
MFC after:	1 week
2011-06-13 15:38:31 +00:00
Bjoern A. Zeeb
ffe8cd7b10 Correct comments and debug logging in ipsec to better match reality.
MFC after:	3 days
2011-06-08 03:02:11 +00:00
Andrey V. Elsukov
56e38090a4 Fix indentation. 2011-06-07 06:57:22 +00:00
Andrey V. Elsukov
bd853db48c Make a behaviour of the libalias based in-kernel NAT a bit closer to
how natd(8) does work. natd(8) drops packets only when libalias returns
PKT_ALIAS_IGNORED and "deny_incoming" option is set, but ipfw_nat
always did drop packets that were not aliased, even if they should
not be aliased and just are going through.

PR:		kern/122109, kern/129093, kern/157379
Submitted by:	Alexander V. Chernikov (previous version)
MFC after:	1 month
2011-06-07 06:42:29 +00:00
Bjoern A. Zeeb
1417604e70 Unbreak kernels with non-default PCBGROUP included but no WITNESS.
Rather than including lock.h in in_pcbgroup.c in right order, fix it
for all consumers of in_pcb.h by further header file pollution under
#ifdef KERNEL.

Reported by:	Pan Tsu (inyaoo gmail.com)
2011-06-06 21:45:32 +00:00
Robert Watson
52cd27cb58 Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup.  pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups.  During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock.  By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details).  This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems".  However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies.  Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect.  In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-06 12:55:02 +00:00
Andrey V. Elsukov
1e587bfa32 Do not return EINVAL when user does ipfw set N flush on an empty set.
MFC after:	2 weeks
2011-06-06 10:39:38 +00:00
Hiroki Sato
db82af41db - Implement RDNSS and DNSSL options (RFC 6106, IPv6 Router Advertisement
Options for DNS Configuration) into rtadvd(8) and rtsold(8).  DNS
  information received by rtsold(8) will go to resolv.conf(5) by
  resolvconf(8) script.  This is based on work by J.R. Oldroyd (kern/156259)
  but revised extensively[1].

- rtadvd(8) now supports "noifprefix" to disable gathering on-link prefixes
  from interfaces when no "addr" is specified[2].  An entry in rtadvd.conf
  with "noifprefix" + no "addr" generates an RA message with no prefix
  information option.

- rtadvd(8) now supports RTM_IFANNOUNCE message to fix crashes when an
  interface is added or removed.

- Correct bogus ND_OPT_ROUTE_INFO value to one in RFC 4191.

Reviewed by:	bz[1]
PR:		kern/156259 [1]
PR:		bin/152458 [2]
2011-06-06 03:06:43 +00:00