prediction code. Previously, 2GB worth of header predicted data
could leave these variables too far out of sequence which would cause
problems after receiving a packet that did not match the header
prediction.
Submitted by: Bill Baumann <bbaumann@isilon.com>
Sponsored by: Isilon Systems, Inc.
Reviewed by: hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com
This allows socket() to return an error when the kernel is not built
with IPDIVERT, and doesn't prevent future applications from using the
"borrowed" IP protocol number. The sysctl net.inet.raw.olddiverterror
controls whether opening a socket with the "borrowed" IP protocol
fails with an accompanying kernel printf; this code should last only a
couple of releases.
Approved by: re
Quoting luigi:
In order to make the userland code fully 64-bit clean it may
be necessary to commit other changes that may or may not cause
a minor change in the ABI.
Reviewed by: luigi
to the primary local IP address when doing a TCP connect(). The
tcp_connect() code was relying on in_pcbconnect (actually in_pcbladdr)
modifying the passed-in sockaddr, and I failed to notice this in
the recent change that added in_pcbconnect_setup(). As a result,
tcp_connect() was ending up using the unmodified sockaddr address
instead of the munged version.
There are two cases to handle: if in_pcbconnect_setup() succeeds,
then the PCB has already been updated with the correct destination
address as we pass it pointers to inp_faddr and inp_fport directly.
If in_pcbconnect_setup() fails due to an existing but dead connection,
then copy the destination address from the old connection.
a server process bound to a wildcard UDP socket to select the IP
address from which outgoing packets are sent on a per-datagram
basis. When combined with IP_RECVDSTADDR, such a server process can
guarantee to reply to an incoming request using the same source IP
address as the destination IP address of the request, without having
to open one socket per server IP address.
Discussed on: -net
Approved by: re
to send datagrams from an unconnected socket, we used to first block
input, then connect the socket to the sendmsg/sendto destination,
send the datagram, and finally disconnect the socket and unblock
input.
We now use in_pcbconnect_setup() to check if a connect() would have
succeeded, but we never record the connection in the PCB (local
anonymous port allocation is still recorded, though). The result
from in_pcbconnect_setup() authorises the sending of the datagram
and selects the local address and port to use, so we just construct
the header and call ip_output().
Discussed on: -net
Approved by: re
in_pcbconnect() called in_pcbconnect_setup(). This version performs
all of the functions of in_pcbconnect() except for the final
committing of changes to the PCB. In the case of an EADDRINUSE error
it can also provide to the caller the PCB of the duplicate connection,
avoiding an extra in_pcblookup_hash() lookup in tcp_connect().
This change will allow the "temporary connect" hack in udp_output()
to be removed and is part of the preparation for adding the
IP_SENDSRCADDR control message.
Discussed on: -net
Approved by: re
Remove the never completed _IP_VHL version, it has not caught on
anywhere and it would make us incompatible with other BSD netstacks
to retain this version.
Add a CTASSERT protecting sizeof(struct ip) == 20.
Don't let the size of struct ipq depend on the IPDIVERT option.
This is a functional no-op commit.
Approved by: re
called in_pcbbind_setup() that does everything except commit the
changes to the PCB. There should be no functional change here, but
in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR
control message implementation to check or allocate the source
address and port.
Discussed on: -net
Approved by: re
- set IFF_UP on SIOCSIFADDR. be consistent with others.
- set if_addrlen explicitly (just in case)
- multi destination mode is long gone.
- missing break statement
- add gif_set_tunnel(), so that we can set tunnel address from within the
kernel at ease.
- encap_attach/detach dynamically on ioctls
- move encap_attach() to dedicated function in in*_gif.c
Obtained from: KAME
MFC after: 3 weeks
supposed to be checked by the firewall rules twice. However, because the
various ipsec handlers never call ip_input(), this never happens anyway.
This fixes the situation where a gif tunnel is encrypted with IPsec. In
such a case, after IPsec processing, the unencrypted contents from the
GIF tunnel are fed back to the ipintrq and subsequently handeld by
ip_input(). Yet, since there still is IPSec history attached, the
packets coming out from the gif device are never fed into the filtering
code.
This fix was sent to Itojun, and he pointed towartds
http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction.
This patch actually implements what is stated there (specifically:
Packet came from tunnel devices (gif(4) and ipip(4)) will still
go through ipf(4). You may need to identify these packets by
using interface name directive in ipf.conf(5).
Reviewed by: rwatson
MFC after: 3 weeks
configuration stuff as well as conditional code in the IPv4 and IPv6
areas. Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).
As noted previously, don't use FAST_IPSEC with INET6 at the moment.
Reviewed by: KAME, rwatson
Approved by: silence
Supported by: Vernier Networks
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
inpcb parameter to ip_output and ip6_output to allow the IPsec code to
locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version
Reviewed by: julian, luigi (silent), -arch, -net, darren
Approved by: julian, silence from everyone else
Obtained from: openbsd (mostly)
MFC after: 1 month
timestamped TCP packets where FreeBSD will send DATA+FIN and
A W2K box will ack just the DATA portion. If this occurs
after FreeBSD has done a (NewReno) fast-retransmit and is
recovering it (dupacks > threshold) it triggers a case in
tcp_newreno_partial_ack() (tcp_newreno() in stable) where
tcp_output() is called with the expectation that the retransmit
timer will be reloaded. But tcp_output() falls through and
returns without doing anything, causing the persist timer to be
loaded instead. This causes the connection to hang until W2K gives up.
This occurs because in the case where only the FIN must be acked, the
'len' calculation in tcp_output() will be 0, a lot of checks will be
skipped, and the FIN check will also be skipped because it is designed
to handle FIN retransmits, not forced transmits from tcp_newreno().
The solution is to simply set TF_ACKNOW before calling tcp_output()
to absolute guarentee that it will run the send code and reset the
retransmit timer. TF_ACKNOW is already used for this purpose in other
cases.
For some unknown reason this patch also seems to greatly reduce
the number of duplicate acks received when Guido runs his tests over
a lossy network. It is quite possible that there are other
tcp_newreno{_partial_ack()} cases which were not generating the expected
output which this patch also fixes.
X-MFC after: Will be MFC'd after the freeze is over
o Move len initialization closer to place of its first usage.
o Compare len with 0 to improve readability.
o Explicitly zero out phlen in ip_insertoptions() in failure case.
Suggested by: jhb
Reviewed by: jhb
MFC after: 2 weeks
Windows 2000 box and a FreeBSD box could stall. The problem turned out
to be a timestamp reply bug in the W2K TCP stack. FreeBSD sends a
timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK
causing FreeBSD to calculate an insane SRTT and RTT, resulting in
a maximal retransmit timeout (60 seconds). If there is any packet
loss on the connection for the first six or so packets the retransmit
case may be hit (the window will still be too small for fast-retransmit),
causing a 60+ second pause. The W2K box gives up and closes the
connection.
This commit works around the W2K bug.
15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8]
15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)
Bug reported by: Guido van Rooij <guido@gvr.org>
packets in addition to IPPROTO_IPV4 and IPPROTO_IPV6, explicitly specify
IPPROTO_IPV4 or IPPROTO_IPV6 instead of -1 when calling encap_attach().
MFC after: 28 days
(along with other if_gre changes)