Commit Graph

4645 Commits

Author SHA1 Message Date
Michael Tuexen
6be15a24c4 Export the inpcb features as a 64-bit entity.
Bump __FreeBSD_version to 1000048 since the
modified structure is user visible and used
by netstat, for example.
2013-08-22 20:29:57 +00:00
Michael Tuexen
06c9f9bddf Make also the features of the association 64-bit.
When exporting to xinpcb, just export the lower
32-bit. Using there also 64-bits will break the
ABI and will be committed separetly.

MFC after: 2 weeks
X-MFC with: 254248
2013-08-22 19:28:13 +00:00
Xin LI
acde2476c4 Fix an integer overflow in computing the size of a temporary buffer
can result in a buffer which is too small for the requested
operation.

Security:	CVE-2013-3077
Security:	FreeBSD-SA-13:09.ip_multicast
2013-08-22 00:51:37 +00:00
Andre Oppermann
5fc98a7895 Reorder the mbuf defines to make more sense and group related flags
together.

Add M_FLAG_PRINTF for use with printf(9) %b indentifier.

Use the generic mbuf flags print names in the net80211 code and adjust
the protocol specific bits for their new positions.

Change SCTP M_PROTO mapping from 5 to 1 to fit within the 16bit field
they use internally to store some additional information.

Discussed with:	trociny, glebius
2013-08-19 14:25:11 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Andre Oppermann
678d7b9461 Move the SCTP specific definition of M_NOTIFICATION onto a protocol
specific mbuf flag from sys/mbuf.h to netinet/sctp_os_bsd.h.  It is
only relevant within SCTP.

Discussed with:	tuexen
2013-08-19 12:30:18 +00:00
Andre Oppermann
88388bdcbe Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific
flag instead.  The flag is only used within the IP and IPv6 layer 3
protocols.

Because some firewall packages treat IPv4 and IPv6 packets the same the
flag should have the same value for both.

Discussed with:	trociny, glebius
2013-08-19 11:08:36 +00:00
Andre Oppermann
b09dc7e328 Move ip_reassemble()'s use of the global M_FRAG mbuf flag to a protocol layer
specific flag instead.  The flag is only relevant while the packet stays in
the IP reassembly queue.

Discussed with:	trociny, glebius
2013-08-19 10:34:10 +00:00
Andre Oppermann
fb86dfcd2f Remove unused M_FRAG, M_FIRSTFRAG and M_LASTFRAG tagging from ip_fragment().
There wasn't any real driver (and hardware) support for it.  Modern hardware
does full fragmentation/segmentation offload instead.
2013-08-19 10:30:15 +00:00
Mark Johnston
7b77e1fe0f Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.

There is no functional change.

MFC after:	2 weeks
2013-08-15 04:08:55 +00:00
Michael Tuexen
0e05fbded9 Don't send uninitialized memory (two instances of 4 bytes) in
every cookie on the wire. This bug was reported in
https://bugzilla.mozilla.org/show_bug.cgi?id=905080

MFC after: 3 days
2013-08-14 21:51:32 +00:00
Mikolaj Golub
c5c392e7ed Virtualize carp(4) variables to have per vnet control.
Reviewed by:	ae, glebius
2013-08-13 19:59:49 +00:00
Michael Tuexen
2c9c61defa Make the features a 64-bit value instead of 32-bit.
This will allow an easier integration of the support
for NDATA.
While there, do also some minor cleanups.
Obtained from:	rrs@
MFC after: 2 weeks
2013-08-12 13:52:15 +00:00
Michael Tuexen
bfd1666aad Micro-optimization suggested in
https://bugzilla.mozilla.org/show_bug.cgi?id=898234
by pchang9. While there simplify the code.

MFC after: 1 week
2013-08-01 12:05:23 +00:00
Andrey V. Elsukov
6794f46021 Remove the large part of struct ipsecstat. Only few fields of this
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.

This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.

Reported by:	Taku YAMAMOTO, Maciej Milewski
2013-07-23 14:14:24 +00:00
Michael Tuexen
88a95b1f25 Allow the code to be compiled without warnings for any combination
of INET, INET6 and SCTP_DEBUG defines.
The issue was reported by Lally Singh.

MFC after: 2 weeks
2013-07-20 13:14:59 +00:00
Michael Tuexen
da24cfcb35 Get the code compiling without INET and INET6 being defined.
This is not possible in FreeBSD, but in the upstream code.

MFC after: 2 weeks
2013-07-19 21:16:59 +00:00
Andre Oppermann
ccd040ab18 Free the non-fatal "timestamp missing" debug string manually as it is
not covered by the catch-all free for the error cases.

Found by:	Coverity
2013-07-16 16:37:08 +00:00
Mikolaj Golub
f122b319eb A complete duplication of binding should be allowed if on both new and
duplicated sockets a multicast address is bound and either
SO_REUSEPORT or SO_REUSEADDR is set.

But actually it works for the following combinations:

  * SO_REUSEPORT is set for the fist socket and SO_REUSEPORT for the new;
  * SO_REUSEADDR is set for the fist socket and SO_REUSEADDR for the new;
  * SO_REUSEPORT is set for the fist socket and SO_REUSEADDR for the new;

and fails for this:

  * SO_REUSEADDR is set for the fist socket and SO_REUSEPORT for the new.

Fix the last case.

PR:		179901
MFC after:	1 month
2013-07-12 19:08:33 +00:00
Andre Oppermann
10c982958c Unbreak VIMAGE by correctly naming the vnet pointer in struct tcp_syncache.
Reported by:	trociny, rodrigc
2013-07-12 07:43:56 +00:00
Andre Oppermann
81d392a09d Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK
information into the ISN (initial sequence number) without the additional
use of timestamp bits and switching to the very fast and cryptographically
strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against
forgeries.

The purpose of SYN cookies is to encode all necessary session state in
the 32 bits of our initial sequence number to avoid storing any information
locally in memory.  This is especially important when under heavy spoofed
SYN attacks where we would either run out of memory or the syncache would
fill with bogus connection attempts swamping out legitimate connections.

The original SYN cookies method only stored an indexed MSS values in the
cookie.  This isn't sufficient anymore and breaks down in the presence of
WSCALE information which is only exchanged during SYN and SYN-ACK.  If we
can't keep track of it then we may severely underestimate the available
send or receive window. This is compounded with large windows whose size
information on the TCP segment header is even lower numerically.  A number
of years back SYN cookies were extended to store the additional state in
the TCP timestamp fields, if available on a connection.  While timestamps
are common among the BSD, Linux and other *nix systems Windows never enabled
them by default and thus are not present for the vast majority of clients
seen on the Internet.

The common parameters used on TCP sessions have changed quite a bit since
SYN cookies very invented some 17 years ago.  Today we have a lot more
bandwidth available making the use window scaling almost mandatory.  Also
SACK has become standard making recovering from packet loss much more
efficient.

This change moves all necessary information into the ISS removing the need
for timestamps.  Both the MSS (16 bits) and send WSCALE (4 bits) are stored
in 3 bit indexed form together with a single bit for SACK.  While this is
significantly less than the original range, it is sufficient to encode all
common values with minimal rounding.

The MSS depends on the MTU of the path and with the dominance of ethernet
the main value seen is around 1460 bytes.  Encapsulations for DSL lines
and some other overheads reduce it by a few more bytes for many connections
seen.  Rounding down to the next lower value in some cases isn't a problem
as we send only slightly more packets for the same amount of data.

The send WSCALE index is bit more tricky as rounding down under-estimates
the available send space available towards the remote host, however a small
number values dominate and are carefully selected again.

The receive WSCALE isn't encoded at all but recalculated based on the local
receive socket buffer size when a valid SYN cookie returns.  A listen socket
buffer size is unlikely to change while active.

The index values for MSS and WSCALE are selected for minimal rounding errors
based on large traffic surveys.  These values have to be periodically
validated against newer traffic surveys adjusting the arrays tcp_sc_msstab[]
and tcp_sc_wstab[] if necessary.

In addition the hash MAC to protect the SYN cookies is changed from MD5
to SipHash-2-4, a much faster and cryptographically secure algorithm.

Reviewed by:	dwmalone
Tested by:	Fabian Keil <fk@fabiankeil.de>
2013-07-11 15:29:25 +00:00
Andre Oppermann
07dacf031e Extend debug logging of TCP timestamp related specification
violations.

Update related comments and style.
2013-07-10 12:06:01 +00:00
Michael Tuexen
e5aeb83c42 Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

X-MFC with: r252026
2013-07-09 14:38:26 +00:00
Andrey V. Elsukov
69edf037d7 Migrate struct carpstats to PCPU counters. 2013-07-09 10:02:51 +00:00
Andrey V. Elsukov
2841260cd6 Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters. 2013-07-09 09:59:46 +00:00
Andrey V. Elsukov
a786f67981 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
Andrey V. Elsukov
5da0521fce Use new macros to implement ipstat and tcpstat using PCPU counters.
Change interface of kread_counters() similar ot kread() in the netstat(1).
2013-07-09 09:43:03 +00:00
Andrey V. Elsukov
c80211e3cf Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with:	arch@
2013-07-09 09:32:06 +00:00
Michael Tuexen
ee1ccd9258 Fix a bug were only 2048 streams where usable even though more than
2048 streams were negotiated on the wire. While there, remove the
hard coded limit of 2048 streams.

MFC after: 3 days
2013-07-05 10:08:49 +00:00
Michael Tuexen
5db47b3def When processing an incoming ABORT, SHUTDOWN_COMPLETE or ERROR (NAT related)
chunk, take always the T-bit into account, when checking the verification
tag.

MFC after: 3 days
2013-07-04 19:47:46 +00:00
Mikolaj Golub
efdf104bca In r227207, to fix the issue with possible NULL inp_socket pointer
dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR
for multicast), INP_REUSEPORT flag was introduced to cache the socket
option.  It was decided then that one flag would be enough to cache
both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR
setsockopt(2), it was checked if it was called for a multicast address
and INP_REUSEPORT was set accordingly.

Unfortunately that approach does not work when setsockopt(2) is called
before binding to a multicast address: the multicast check fails and
INP_REUSEPORT is not set.

Fix this by adding INP_REUSEADDR flag to unconditionally cache
SO_REUSEADDR.

PR:		179901
Submitted by:	Michael Gmelin freebsd grem.de (initial version)
Reviewed by:	rwatson
MFC after:	1 week
2013-07-04 18:38:00 +00:00
Michael Tuexen
56f778aadf Code cleanups.
MFC after: 3 days
2013-07-03 18:48:43 +00:00
Navdeep Parhar
e364d8c44a Catch up with r238990. LLE_DELETED does not clobber everything else in
la_flags since said revision.
2013-07-03 17:27:32 +00:00
Hiroki Sato
e32d93954d Fix a panic when leaving MC group in a kernel with VIMAGE enabled.
in_leavegroup() is called from an asynchronous task, and
igmp_change_state() requires that curvnet is set by the caller.
2013-07-02 16:39:12 +00:00
Lawrence Stewart
92a0637f73 Import an implementation of the CAIA Delay-Gradient (CDG) congestion control
algorithm, which is based on the 2011 v0.1 patch release and described in the
paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes
and Grenville Armitage. It is implemented as a kernel module compatible with the
modular congestion control framework.

CDG is a hybrid congestion control algorithm which reacts to both packet loss
and inferred queuing delay. It attempts to operate as a delay-based algorithm
where possible, but utilises heuristics to detect loss-based TCP cross traffic
and will compete effectively as required. CDG is therefore incrementally
deployable and suitable for use on shared networks.

In collaboration with:	David Hayes <david.hayes at ieee.org> and
		Grenville Armitage <garmitage at swin edu au>
MFC after:	4 days
Sponsored by:	Cisco University Research Program and FreeBSD Foundation
2013-07-02 08:44:56 +00:00
Gleb Smirnoff
42a253e6a1 Fix kmod_*stat_inc() after r249276. The incorrect code actually
increased the pointer, not the memory it points to.

In collaboration with:	kib
Reported & tested by:	Ian FREISLICH <ianf clue.co.za>
Sponsored by:		Nginx, Inc.
2013-06-21 06:36:26 +00:00
Andrey V. Elsukov
6659296cb0 Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

MFC after:	2 weeks
2013-06-20 09:55:53 +00:00
Bruce M Simpson
c91950082d Disable IGMPv3 link timers on a transition to IGMPv2.
Submitted by:	Alan Smithee
2013-06-07 17:12:08 +00:00
Andre Oppermann
3c914c547e Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore.  The upper limit is still at
IP_MAXPACKET (65536 bytes).  Raising it requires further auditing of
the IPv4/v6 code path's as the length field in the IP header would
overflow leading to confusion in firewalls and others packet handler on
the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found.  When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by:	cperciva (earlier version)
Reviewed by:	cperciva
Tested by:	cperciva
MFC after:	1 week (using spare struct members to preserve ABI)
2013-06-03 12:55:13 +00:00
Michael Tuexen
fe1831e06f Use LIST_EMPTY when appropriate.
MFC after: 1 week
2013-06-02 10:35:08 +00:00
Michael Tuexen
fb4a67d207 Remove redundant checks.
MFC after: 2 weeks
2013-05-28 09:25:58 +00:00
Michael Tuexen
3f61f926ea Withdraw http://svnweb.freebsd.org/changeset/base/250809
since the real fix is in http://svnweb.freebsd.org/changeset/base/250952.
2013-05-24 09:21:18 +00:00
Michael Tuexen
e3581df21e Initialize the fibnum for outgoing packets to 0. This avoids
crashing due to the usage of uninitialized fibnum.
This bugs became visiable after
http://svnweb.freebsd.org/changeset/base/250700

MFC after: 2 weeks
2013-05-19 16:06:43 +00:00
Michael Tuexen
553bb0688c Set errno to ETIMEDOUT if an SCTP association times out during
setup.

MFC after: 1 week
2013-05-17 22:26:05 +00:00
Michael Tuexen
b05fbf171e Don't send an ABORT chunk with verification 0.
MFC after: 1 week
2013-05-17 21:45:52 +00:00
Jim Harris
d13fc9954b Fix typo in net.inet.tcp.minmss sysctl description.
MFC after:	3 days
2013-05-13 19:55:27 +00:00
Hiroki Sato
b8992a6792 Add IFF_MONITOR support to gre(4).
Tested by:	Chip Marshall
MFC after:	1 week
2013-05-11 19:05:38 +00:00
Gleb Smirnoff
5d81d09598 Rate limit the number of remotely triggered ARP log messages
to 1 log message per second.
2013-05-11 10:51:32 +00:00
Michael Tuexen
3457ccdaea Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY
socket option for SCTP sockets in the same way as for UDP or TCP
sockets.

MFC after: 2 weeks
2013-05-10 18:09:38 +00:00