Commit Graph

4634 Commits

Author SHA1 Message Date
tuexen
cfbde72679 Micro-optimization suggested in
https://bugzilla.mozilla.org/show_bug.cgi?id=898234
by pchang9. While there simplify the code.

MFC after: 1 week
2013-08-01 12:05:23 +00:00
ae
afd48faca0 Remove the large part of struct ipsecstat. Only few fields of this
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.

This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.

Reported by:	Taku YAMAMOTO, Maciej Milewski
2013-07-23 14:14:24 +00:00
tuexen
e36ca7ff41 Allow the code to be compiled without warnings for any combination
of INET, INET6 and SCTP_DEBUG defines.
The issue was reported by Lally Singh.

MFC after: 2 weeks
2013-07-20 13:14:59 +00:00
tuexen
9f5ff4a4a0 Get the code compiling without INET and INET6 being defined.
This is not possible in FreeBSD, but in the upstream code.

MFC after: 2 weeks
2013-07-19 21:16:59 +00:00
andre
67c418ca84 Free the non-fatal "timestamp missing" debug string manually as it is
not covered by the catch-all free for the error cases.

Found by:	Coverity
2013-07-16 16:37:08 +00:00
trociny
69ab640b6b A complete duplication of binding should be allowed if on both new and
duplicated sockets a multicast address is bound and either
SO_REUSEPORT or SO_REUSEADDR is set.

But actually it works for the following combinations:

  * SO_REUSEPORT is set for the fist socket and SO_REUSEPORT for the new;
  * SO_REUSEADDR is set for the fist socket and SO_REUSEADDR for the new;
  * SO_REUSEPORT is set for the fist socket and SO_REUSEADDR for the new;

and fails for this:

  * SO_REUSEADDR is set for the fist socket and SO_REUSEPORT for the new.

Fix the last case.

PR:		179901
MFC after:	1 month
2013-07-12 19:08:33 +00:00
andre
72f878ef47 Unbreak VIMAGE by correctly naming the vnet pointer in struct tcp_syncache.
Reported by:	trociny, rodrigc
2013-07-12 07:43:56 +00:00
andre
fc2be30b21 Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK
information into the ISN (initial sequence number) without the additional
use of timestamp bits and switching to the very fast and cryptographically
strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against
forgeries.

The purpose of SYN cookies is to encode all necessary session state in
the 32 bits of our initial sequence number to avoid storing any information
locally in memory.  This is especially important when under heavy spoofed
SYN attacks where we would either run out of memory or the syncache would
fill with bogus connection attempts swamping out legitimate connections.

The original SYN cookies method only stored an indexed MSS values in the
cookie.  This isn't sufficient anymore and breaks down in the presence of
WSCALE information which is only exchanged during SYN and SYN-ACK.  If we
can't keep track of it then we may severely underestimate the available
send or receive window. This is compounded with large windows whose size
information on the TCP segment header is even lower numerically.  A number
of years back SYN cookies were extended to store the additional state in
the TCP timestamp fields, if available on a connection.  While timestamps
are common among the BSD, Linux and other *nix systems Windows never enabled
them by default and thus are not present for the vast majority of clients
seen on the Internet.

The common parameters used on TCP sessions have changed quite a bit since
SYN cookies very invented some 17 years ago.  Today we have a lot more
bandwidth available making the use window scaling almost mandatory.  Also
SACK has become standard making recovering from packet loss much more
efficient.

This change moves all necessary information into the ISS removing the need
for timestamps.  Both the MSS (16 bits) and send WSCALE (4 bits) are stored
in 3 bit indexed form together with a single bit for SACK.  While this is
significantly less than the original range, it is sufficient to encode all
common values with minimal rounding.

The MSS depends on the MTU of the path and with the dominance of ethernet
the main value seen is around 1460 bytes.  Encapsulations for DSL lines
and some other overheads reduce it by a few more bytes for many connections
seen.  Rounding down to the next lower value in some cases isn't a problem
as we send only slightly more packets for the same amount of data.

The send WSCALE index is bit more tricky as rounding down under-estimates
the available send space available towards the remote host, however a small
number values dominate and are carefully selected again.

The receive WSCALE isn't encoded at all but recalculated based on the local
receive socket buffer size when a valid SYN cookie returns.  A listen socket
buffer size is unlikely to change while active.

The index values for MSS and WSCALE are selected for minimal rounding errors
based on large traffic surveys.  These values have to be periodically
validated against newer traffic surveys adjusting the arrays tcp_sc_msstab[]
and tcp_sc_wstab[] if necessary.

In addition the hash MAC to protect the SYN cookies is changed from MD5
to SipHash-2-4, a much faster and cryptographically secure algorithm.

Reviewed by:	dwmalone
Tested by:	Fabian Keil <fk@fabiankeil.de>
2013-07-11 15:29:25 +00:00
andre
86d89cd580 Extend debug logging of TCP timestamp related specification
violations.

Update related comments and style.
2013-07-10 12:06:01 +00:00
tuexen
5719f9fad9 Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

X-MFC with: r252026
2013-07-09 14:38:26 +00:00
ae
430162610d Migrate struct carpstats to PCPU counters. 2013-07-09 10:02:51 +00:00
ae
08c6719ac4 Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters. 2013-07-09 09:59:46 +00:00
ae
e5b002a3b8 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
ae
705a50a053 Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
ae
1b1e1de2b2 Use new macros to implement ipstat and tcpstat using PCPU counters.
Change interface of kread_counters() similar ot kread() in the netstat(1).
2013-07-09 09:43:03 +00:00
ae
1a36dfcc87 Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with:	arch@
2013-07-09 09:32:06 +00:00
tuexen
9e48e2f73a Fix a bug were only 2048 streams where usable even though more than
2048 streams were negotiated on the wire. While there, remove the
hard coded limit of 2048 streams.

MFC after: 3 days
2013-07-05 10:08:49 +00:00
tuexen
6a8f6a36f6 When processing an incoming ABORT, SHUTDOWN_COMPLETE or ERROR (NAT related)
chunk, take always the T-bit into account, when checking the verification
tag.

MFC after: 3 days
2013-07-04 19:47:46 +00:00
trociny
9b554dcd02 In r227207, to fix the issue with possible NULL inp_socket pointer
dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR
for multicast), INP_REUSEPORT flag was introduced to cache the socket
option.  It was decided then that one flag would be enough to cache
both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR
setsockopt(2), it was checked if it was called for a multicast address
and INP_REUSEPORT was set accordingly.

Unfortunately that approach does not work when setsockopt(2) is called
before binding to a multicast address: the multicast check fails and
INP_REUSEPORT is not set.

Fix this by adding INP_REUSEADDR flag to unconditionally cache
SO_REUSEADDR.

PR:		179901
Submitted by:	Michael Gmelin freebsd grem.de (initial version)
Reviewed by:	rwatson
MFC after:	1 week
2013-07-04 18:38:00 +00:00
tuexen
baf8a88746 Code cleanups.
MFC after: 3 days
2013-07-03 18:48:43 +00:00
np
dc9cd49613 Catch up with r238990. LLE_DELETED does not clobber everything else in
la_flags since said revision.
2013-07-03 17:27:32 +00:00
hrs
b3bce89084 Fix a panic when leaving MC group in a kernel with VIMAGE enabled.
in_leavegroup() is called from an asynchronous task, and
igmp_change_state() requires that curvnet is set by the caller.
2013-07-02 16:39:12 +00:00
lstewart
1920a78bd9 Import an implementation of the CAIA Delay-Gradient (CDG) congestion control
algorithm, which is based on the 2011 v0.1 patch release and described in the
paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes
and Grenville Armitage. It is implemented as a kernel module compatible with the
modular congestion control framework.

CDG is a hybrid congestion control algorithm which reacts to both packet loss
and inferred queuing delay. It attempts to operate as a delay-based algorithm
where possible, but utilises heuristics to detect loss-based TCP cross traffic
and will compete effectively as required. CDG is therefore incrementally
deployable and suitable for use on shared networks.

In collaboration with:	David Hayes <david.hayes at ieee.org> and
		Grenville Armitage <garmitage at swin edu au>
MFC after:	4 days
Sponsored by:	Cisco University Research Program and FreeBSD Foundation
2013-07-02 08:44:56 +00:00
glebius
68c4131bea Fix kmod_*stat_inc() after r249276. The incorrect code actually
increased the pointer, not the memory it points to.

In collaboration with:	kib
Reported & tested by:	Ian FREISLICH <ianf clue.co.za>
Sponsored by:		Nginx, Inc.
2013-06-21 06:36:26 +00:00
ae
1e4c88cc8b Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

MFC after:	2 weeks
2013-06-20 09:55:53 +00:00
bms
be617f91a5 Disable IGMPv3 link timers on a transition to IGMPv2.
Submitted by:	Alan Smithee
2013-06-07 17:12:08 +00:00
andre
b706ceb4ab Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore.  The upper limit is still at
IP_MAXPACKET (65536 bytes).  Raising it requires further auditing of
the IPv4/v6 code path's as the length field in the IP header would
overflow leading to confusion in firewalls and others packet handler on
the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found.  When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by:	cperciva (earlier version)
Reviewed by:	cperciva
Tested by:	cperciva
MFC after:	1 week (using spare struct members to preserve ABI)
2013-06-03 12:55:13 +00:00
tuexen
c8786efb41 Use LIST_EMPTY when appropriate.
MFC after: 1 week
2013-06-02 10:35:08 +00:00
tuexen
991d3fc803 Remove redundant checks.
MFC after: 2 weeks
2013-05-28 09:25:58 +00:00
tuexen
00b2cf3075 Withdraw http://svnweb.freebsd.org/changeset/base/250809
since the real fix is in http://svnweb.freebsd.org/changeset/base/250952.
2013-05-24 09:21:18 +00:00
tuexen
cd30dfafe0 Initialize the fibnum for outgoing packets to 0. This avoids
crashing due to the usage of uninitialized fibnum.
This bugs became visiable after
http://svnweb.freebsd.org/changeset/base/250700

MFC after: 2 weeks
2013-05-19 16:06:43 +00:00
tuexen
7d22df2a4c Set errno to ETIMEDOUT if an SCTP association times out during
setup.

MFC after: 1 week
2013-05-17 22:26:05 +00:00
tuexen
d482c3688d Don't send an ABORT chunk with verification 0.
MFC after: 1 week
2013-05-17 21:45:52 +00:00
jimharris
7c58f64221 Fix typo in net.inet.tcp.minmss sysctl description.
MFC after:	3 days
2013-05-13 19:55:27 +00:00
hrs
960e7e9fd4 Add IFF_MONITOR support to gre(4).
Tested by:	Chip Marshall
MFC after:	1 week
2013-05-11 19:05:38 +00:00
glebius
f43ee707dd Rate limit the number of remotely triggered ARP log messages
to 1 log message per second.
2013-05-11 10:51:32 +00:00
tuexen
6ea39edf93 Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY
socket option for SCTP sockets in the same way as for UDP or TCP
sockets.

MFC after: 2 weeks
2013-05-10 18:09:38 +00:00
andre
cc8c6e4d01 Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.
2013-05-06 16:42:18 +00:00
hrs
d9d71436d9 Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group
Address.  Although KAME implementation used FF02:0:0:0:0:2::/96 based on
older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed
in RFC 4620.

The kernel always joins the /104-prefixed address, and additionally does
/96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1.
The default value of the sysctl is 1.

ping6(8) -N flag now uses /104-prefixed one.  When this flag is specified
twice, it uses /96-prefixed one instead.

Reviewed by:		ume
Based on work by:	Thomas Scheffler
PR:			conf/174957
MFC after:		2 weeks
2013-05-04 19:16:26 +00:00
cperciva
fcb6743484 Move IPPROTO_IPV6 from #ifdef __BSD_VISIBLE to #if __POSIX_VISIBLE >= 201112
since POSIX 2001 states that it shall be defined.

Reported by:	sbruno
Reviewed by:	jilles
MFC after:	1 week
2013-04-27 23:36:01 +00:00
glebius
b4bc270e8f Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
glebius
8b3c8983bc Fix couple of mbuf leaks in incoming ARP processing. 2013-04-25 17:38:04 +00:00
glebius
dd30d87e8d Introduce a pointer to const variable gw, which points either at the
same place as dst, or to the sockaddr in the routing table.

The const constraint of gw makes us safe from modifing routing table
accidentially. And "onstantness" of dst allows us to remove several
bandaids, when we switched it back at &ro->ro_dst, now it always
points there.

Reviewed by:	rrs
2013-04-25 12:42:09 +00:00
rrs
d90354b019 This fixes the issue with the "randomly changing" default
route. What it was is there are two places in ip_output.c
where we do a goto again. One place was fine, it
copies out the new address and then resets dst = ro->rt_dst;
But the other place does *not* do that, which means earlier
when we found the gateway, we have dst pointing there
aka dst = ro->rt_gateway is done.. then we do a
goto again.. bam now we clobber the default route.

The fix is just to move the again so we are always
doing dst = &ro->rt_dst; in the again loop.

PR:	 174749,157796
MFC after:	1 week
2013-04-24 18:30:32 +00:00
andre
1f1bd98319 When doing RFC3042 limited transmit on the first on second
duplicate ACK make sure we actually have new data to send.
This prevents us from sending unneccessary pure ACKs.

Reported by:	Matt Miller <matt@matthewjmiller.net>
Tested by:	Matt Miller <matt@matthewjmiller.net>
MFC after:	2 weeks
2013-04-23 14:06:32 +00:00
oleg
9917da6df0 Plug static llentry leak (ipv4 & ipv6 were affected).
PR:		kern/172985
MFC after:	1 month
2013-04-21 21:28:38 +00:00
gabor
188c638b60 - Corrrect mispellings of word useful
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:45:15 +00:00
delphij
8809aa4451 Fix incomplete printf.
PR:		kern/177889
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:32:12 +00:00
delphij
54d1fdc9a1 Don't leak lock when returning.
PR:		kern/177888
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:25:41 +00:00
ae
d657004b7f Reflect removing of the counter_u64_subtract() function in the macro. 2013-04-12 16:29:15 +00:00