Commit Graph

4757 Commits

Author SHA1 Message Date
Eitan Adler
5f30ec9b63 In a situation where:
- The remote host sends a FIN
	- in an ACK for a sequence number for which an ACK has already
	  been received
	- There is still unacked data on route to the remote host
	- The packet does not contain a window update

The packet may be dropped without processing the FIN flag.

PR:		kern/99188
Submitted by:	Staffan Ulfberg <staffan@ulfberg.se>
Discussed with:	andre
MFC after:	never
2013-12-02 03:11:25 +00:00
Michael Tuexen
c302aeb123 In
http://svnweb.freebsd.org/changeset/base/258221
I introduced a bug which initialized global locks
whenever the SCTP stack initialized. This was fixed in
http://svnweb.freebsd.org/changeset/base/258574
by rodrigc@. He just initialized the locks for
the default vnet. This fix reverts to the old
behaviour before r258221, which explicitly makes
sure it is only called once, because this works also on
other platforms.
MFC after: 3 days
X-MFC with: r258574.
2013-11-30 12:51:19 +00:00
Andriy Gapon
d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Adrian Chadd
fa22ce1570 Convert over the TCP probes to use mtod() rather than directly
dereferencing m->m_data.

Sponsored by:	Netflix, Inc.
2013-11-25 22:55:06 +00:00
Craig Rodrigues
c0c61281b4 Only initialize some mutexes for the default VNET.
In r208160, sctp_it_ctl was made a global variable, across all VNETs.
However, sctp_init() is called for every VNET that is created.  This results
in the same global mutexes which are part of sctp_it_ctl being initialized.  This can result
in crashes if many jails are created.

To reproduce the problem:
  (1)  Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
       INVARIANTS.
  (2)  Run this command in a loop:
       jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

       (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )

Witness will warn about the same mutex being initialized.

Fix the problem by only initializing these mutexes in the default VNET.
2013-11-25 18:49:37 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Gleb Smirnoff
c1f7c3f500 In r257692 I intentionally deleted code that handled P2P interfaces
with equal addresses on both sides. It appeared that OpenVPN uses
such configutations.

Submitted by:	trociny
2013-11-17 15:14:07 +00:00
Mikolaj Golub
a3985bdd12 Deregister helper hooks on vnet destroy. 2013-11-17 15:09:39 +00:00
Michael Tuexen
2a44dbf682 Use SCTP_PR_SCTP_TTL when the user provides a positive
timetolive in sctp_sendmsg().

MFC after: 3 days
2013-11-16 19:57:56 +00:00
Michael Tuexen
04194e4f7d Remove a stray write operation.
MFC after: 3 days
2013-11-16 16:09:09 +00:00
Michael Tuexen
dcb3fc4cd6 When determining if an address belongs to an stcb, take the address family
into account for wildcard bound endpoints.

MFC after: 3 days
2013-11-16 15:34:14 +00:00
Michael Tuexen
f4f34bde23 Cleanups which result in fixes which have been made upstream
and where partially suggested by Andrew Galante.
There is no functional change in FreeBSD.

MFC after: 3 days
2013-11-16 15:04:49 +00:00
Gleb Smirnoff
555036b5f6 Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.
2013-11-11 05:39:42 +00:00
Gleb Smirnoff
2f3eb7f4d8 Make TCP_KEEP* socket options readable. At least PostgreSQL wants
to read the values.

Reported by:	sobomax
2013-11-08 13:04:14 +00:00
Michael Tuexen
de72f4e54b Get rid of the artification limitation enforced by
SCTP_AUTH_RANDOM_SIZE_MAX.
This was suggested by Andrew Galante.

MFC after: 3 days
2013-11-07 18:50:11 +00:00
Michael Tuexen
a9d94d290b Make sure that we don't try to build an ASCONF-ACK chunk
larger than what fits in the the mbuf cluster.
This issue was reported by Andrew Galante.

MFC after: 3 days
2013-11-07 17:08:09 +00:00
Michael Tuexen
c9eb4473b4 Use htons()/ntohs() appropriately.
These issues were reported by Andrew Galante.

MFC after: 3 days
2013-11-07 16:37:12 +00:00
Gleb Smirnoff
77b89ad837 Provide compat layer for OSIOCAIFADDR. 2013-11-06 19:46:20 +00:00
Gleb Smirnoff
821b5caf7a Fix my braino in r257692. For SIOCG*ADDR we don't need exact match on
specified address, actually in most cases the address isn't specified.

Reported by:	peter
2013-11-06 08:36:08 +00:00
Nathan Whitehorn
6224cd89c0 Fix build on GCC. 2013-11-06 01:14:00 +00:00
Gleb Smirnoff
fe9bfbcf5a netinet code no longer uses IFA_RTSELF. 2013-11-05 07:45:20 +00:00
Gleb Smirnoff
f7a39160c2 Rewrite in_control(), so that it is comprehendable without getting mad.
o Provide separate functions for SIOCAIFADDR and for SIOCDIFADDR, with
  clear code flow from beginning to the end. After that the rest of
  in_control() gets very small and clear.
o Provide sx(9) lock to protect against parallel ioctl() invocations.
o Reimplement logic from r201282, that tried to keep localhost route in
  table when multiple P2P interfaces with same local address are created
  and deleted.

Discussed with:		pluknet, melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2013-11-05 07:44:15 +00:00
Gleb Smirnoff
b1b9dcae46 Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.
2013-11-05 07:32:09 +00:00
Michael Tuexen
3b3d05d769 Unlock the lock before destroying it.
This issue was reported by Andrew Galante.

MFC after: 3 days
2013-11-03 14:00:17 +00:00
Michael Tuexen
b54ddf225f Changes from upstream to improve compilation when INET or INET6
or none of them is defined.

MFC after: 3 days
2013-11-02 20:12:19 +00:00
Gleb Smirnoff
586904c22e in_ifadown() can be void. 2013-11-01 10:29:10 +00:00
Gleb Smirnoff
237bf7f773 Cleanup in_ifscrub(), which is just an entry to in_scrubprefix(). 2013-11-01 10:18:41 +00:00
Michael Tuexen
6ed728108a Terminate a debug output with a \n. 2013-10-29 20:04:50 +00:00
Gleb Smirnoff
8d7cf9b5d4 Uninline inm_lookup_locked(). Now in_var.h doesn't dereference
fields of struct ifnet.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-29 11:21:31 +00:00
Michael Tuexen
92dfa76cbc Fis the value of *optlen when calling getsockopt() for
SCTP_REMOTE_UDP_ENCAPS_PORT.
This issue was reported by Andrew Galante.
MFC after: 3 days
2013-10-28 20:45:19 +00:00
Michael Tuexen
daac3e7db6 Fix compilation if SCTP_DONT_DO_PRIVADDR_SCOPE is defined.
The issue was reported by Andrew Galante.

MFC after: 3 days
2013-10-28 20:32:37 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
John Baldwin
3380883230 Finish r254925 and remove the last remaining sysctl name list macro. The
one port that used it has been fixed to use the more portable
getprotoent(3) instead.
2013-10-23 13:22:50 +00:00
Andre Oppermann
c1e5a6e5e8 The TCP delayed ACK logic isn't aware of LRO passing up large aggregated
segments thinking it received only one segment. This causes it to enable
the delay the ACK for 100ms to wait for another segment which may never
come because all the data was received already.

Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes
us further away from acking every other packet; b) it introduces additional
delay in responding to the sender.  The latter is especially bad because it
is in the nature of LRO to aggregated all segments of a burst with no more
coming until an ACK is sent back.

Change the delayed ACK logic to detect LRO segments by being larger than
the MSS for this connection and issuing an immediate ACK for them to keep
the ACK clock ticking without interruption.

Reported by:	julian, cperciva
Tested by:	cperciva
Reviewed by:	lstewart
MFC after:	3 days
2013-10-22 18:24:34 +00:00
Kevin Lo
9768475eb0 - Add parentheses to all internet addresses
- All the casts to uint32_t should be to in_addr_t

Suggested by:	bde
Reviewed by:	bde
2013-10-19 18:13:32 +00:00
Michael Tuexen
77dabf96d9 Remove a buggy comparision when setting manually the path MTU.
After fixing, the comparision would have become redundant.
Thanks to Andrew Galante for reporting the issue.

MFC after:	3 days
2013-10-15 20:21:27 +00:00
Gleb Smirnoff
7caf4ab7ac - Utilize counter(9) to accumulate statistics on interface addresses. Add
four counters to struct ifaddr. This kills '+=' on a variables shared
  between processors for every packet.
- Nuke struct if_data from struct ifaddr.
- In ip_input() do not put a reference on ifaddr, instead update statistics
  right now in place and do IN_IFADDR_RUNLOCK(). These removes atomic(9)
  for every packet. [1]
- To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in
  rtsock.c fill if_data fields using counter_u64_fetch().
- Accidentially fix bug in COMPAT_32 version of NET_RT_IFLISTL, which
  took if_data not from the ifaddr, but from ifaddr's ifnet. [2]

Submitted by:	melifaro [1], pluknet[2]
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 11:37:57 +00:00
Gleb Smirnoff
4675896098 Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:31:42 +00:00
Gleb Smirnoff
6ed910fabe Hide 'struct ifaddr' definition from userland. Two tools left that use it,
namely ipftest(1) and ifmcstat(1). These sniff structure definition using
_WANT_IFADDR define.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:19:24 +00:00
Kevin Lo
b298a86678 Treat INADDR_NONE as uint32_t.
Reviewed by:	glebius
2013-10-15 07:35:39 +00:00
Gleb Smirnoff
c11a15bf8d When processing ACK in tcp_do_segment, use sbcut_locked() instead of
sbdrop_locked() to cut acked mbufs from the socket buffer. Free this
chain a batch manner after the socket buffer lock is dropped.

This measurably reduces contention on socket buffer.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Approved by:	re (marius)
2013-10-09 12:00:38 +00:00
Mark Johnston
8298c17c6c Add a separate translator for headers passed to the TCP probes in the
input path. These probes get some of the fields in host order, whereas the
output probes get them in network order, so a single translator isn't
enough. This workaround ensures that the problem is essentially invisble
to users: none of the probe arguments or their fields have changed.

Approved by:	re (hrs)
2013-10-02 17:14:12 +00:00
Bjoern A. Zeeb
a5f44cd7a1 Introduce spares in the TCP syncache and timewait structures
so that fixed TCP_SIGNATURE handling can later be merged.

This is derived from follow-up work to SVN r183001 posted to
net@ on Sep 13 2008.

Approved by:	re (gjb)
2013-09-21 10:01:51 +00:00
Mikolaj Golub
4d3dfd450a Unregister inet/inet6 pfil hooks on vnet destroy.
Discussed with:	andre
Approved by:	re (rodrigc)
2013-09-13 18:45:10 +00:00
Michael Tuexen
5dc80df9c5 Fix the aborting of association with the iterator using an empty
user initiated error cause (using SCTP_ABORT|SCTP_SENDALL).

Approved by: re (delphij)
MFC after: 1 week
2013-09-09 21:40:07 +00:00
Mikolaj Golub
1f6addd92c Relese the interface in the last.
Reviewed by:	glebius
Approved by:	re (kib)
2013-09-08 18:19:40 +00:00
Michael Tuexen
d4d23375d3 When computing the partial delivery point, take the
receiver socket buffer size correctly into account.

MFC after: 1 week
2013-09-07 00:45:24 +00:00
John Baldwin
86d93a15ff Use LIST_FOREACH_SAFE() instead of doing it by hand. 2013-09-05 14:26:37 +00:00
John Baldwin
fa302f207f Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This
matches the types used when computing hash indices and the type of the
maximum size of mfchashtbl[].

PR:		kern/181821
Submitted by:	Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4)
MFC after:	1 week
2013-09-05 14:16:37 +00:00
Andrey V. Elsukov
d983befd2f Remove unused code and sort variables declarations.
PR:		kern/181822
MFC after:	1 week
2013-09-05 08:12:36 +00:00
Michael Tuexen
0ddb429900 Remove redundant field pr_sctp_on.
MFC after: 1 week
2013-09-03 19:31:59 +00:00
Michael Tuexen
a28c9ff0b7 Use uint16_t instead of in_port_t for consistency with the SCTP code.
MFC after: 1 week
2013-09-02 23:27:53 +00:00
Michael Tuexen
e6b2b4b65b All changes affect only SCTP-AUTH:
* Remove non working code related to SHA224.
* Remove support for non-standardised HMAC-IDs using SHA384 and SHA512.
* Prefer SHA256 over SHA1.
* Minor cleanup.

MFC after: 2 weeks
2013-09-02 22:48:41 +00:00
Navdeep Parhar
7127e6acf0 Merge r254336 from user/np/cxl_tuning.
Add a last-modified timestamp to each LRO entry and provide an interface
to flush all inactive entries.  Drivers decide when to flush and what
the inactivity threshold should be.

Network drivers that process an rx queue to completion can enter a
livelock type situation when the rate at which packets are received
reaches equilibrium with the rate at which the rx thread is processing
them.  When this happens the final LRO flush (normally when the rx
routine is done) does not occur.  Pure ACKs and segments with total
payload < 64K can get stuck in an LRO entry.  Symptoms are that TCP
tx-mostly connections' performance falls off a cliff during heavy,
unrelated rx on the interface.

Flushing only inactive LRO entries works better than any of these
alternates that I tried:
- don't LRO pure ACKs
- flush _all_ LRO entries periodically (every 'x' microseconds or every
  'y' descriptors)
- stop rx processing in the driver periodically and schedule remaining
  work for later.

Reviewed by:	andre
2013-08-28 23:00:34 +00:00
John Baldwin
fd77bbb967 Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by:	bde
Tested by:	exp-run by bdrewery
2013-08-26 18:16:05 +00:00
Mark Johnston
1ad19fb657 The second last argument of udp:::receive is supposed to contain the
connection state, not the IP header.

X-MFC with:	r254889
2013-08-26 00:28:57 +00:00
Mark Johnston
57f6086735 Implement the ip, tcp, and udp DTrace providers. The probe definitions use
dynamic translation so that their arguments match the definitions for
these providers in Solaris and illumos. Thus, existing scripts for these
providers should work unmodified on FreeBSD.

Tested by:	gnn, hiren
MFC after:	1 month
2013-08-25 21:54:41 +00:00
Michael Tuexen
1a94cdbea7 Provide human readable debug output. 2013-08-25 12:44:03 +00:00
Andre Oppermann
9850f95989 For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits.
The upper 32bits are not occupied for now.

Sponsored by:	The FreeBSD Foundation
2013-08-25 09:49:00 +00:00
Andre Oppermann
1b4381afbb Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features.  The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
  layer specific union PH_loc for local use.  Protocols can flexibly overlay
  their own 8 to 64 bit fields to store information while the packet is
  worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
  instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
  information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
  rsstype field.  Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
  with the packet.  It is not yet supported in any drivers but allows us to
  get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
  a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
  from the start of the packet.  This is important for various offload
  capabilities and to relieve the drivers from having to parse the packet
  and protocol headers to find out location of checksums and other
  information.  Header parsing in drivers is a lot of copy-paste and
  unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
  packet information, like ether_vtag, tso_segsz and csum fields.
  Depending on the csum_flags settings some fields may have different usage
  making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
  stack) and inbound (up the stack) use.  The CSUM flags used to be a bit
  chaotic and rather poorly documented leading to incorrect use in many
  places.  Bring clarity into their use through better naming.
  Compatibility mappings are provided to preserve the API.  The drivers
  can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by:	The FreeBSD Foundation
2013-08-24 19:51:18 +00:00
Michael Tuexen
6be15a24c4 Export the inpcb features as a 64-bit entity.
Bump __FreeBSD_version to 1000048 since the
modified structure is user visible and used
by netstat, for example.
2013-08-22 20:29:57 +00:00
Michael Tuexen
06c9f9bddf Make also the features of the association 64-bit.
When exporting to xinpcb, just export the lower
32-bit. Using there also 64-bits will break the
ABI and will be committed separetly.

MFC after: 2 weeks
X-MFC with: 254248
2013-08-22 19:28:13 +00:00
Xin LI
acde2476c4 Fix an integer overflow in computing the size of a temporary buffer
can result in a buffer which is too small for the requested
operation.

Security:	CVE-2013-3077
Security:	FreeBSD-SA-13:09.ip_multicast
2013-08-22 00:51:37 +00:00
Andre Oppermann
5fc98a7895 Reorder the mbuf defines to make more sense and group related flags
together.

Add M_FLAG_PRINTF for use with printf(9) %b indentifier.

Use the generic mbuf flags print names in the net80211 code and adjust
the protocol specific bits for their new positions.

Change SCTP M_PROTO mapping from 5 to 1 to fit within the 16bit field
they use internally to store some additional information.

Discussed with:	trociny, glebius
2013-08-19 14:25:11 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Andre Oppermann
678d7b9461 Move the SCTP specific definition of M_NOTIFICATION onto a protocol
specific mbuf flag from sys/mbuf.h to netinet/sctp_os_bsd.h.  It is
only relevant within SCTP.

Discussed with:	tuexen
2013-08-19 12:30:18 +00:00
Andre Oppermann
88388bdcbe Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific
flag instead.  The flag is only used within the IP and IPv6 layer 3
protocols.

Because some firewall packages treat IPv4 and IPv6 packets the same the
flag should have the same value for both.

Discussed with:	trociny, glebius
2013-08-19 11:08:36 +00:00
Andre Oppermann
b09dc7e328 Move ip_reassemble()'s use of the global M_FRAG mbuf flag to a protocol layer
specific flag instead.  The flag is only relevant while the packet stays in
the IP reassembly queue.

Discussed with:	trociny, glebius
2013-08-19 10:34:10 +00:00
Andre Oppermann
fb86dfcd2f Remove unused M_FRAG, M_FIRSTFRAG and M_LASTFRAG tagging from ip_fragment().
There wasn't any real driver (and hardware) support for it.  Modern hardware
does full fragmentation/segmentation offload instead.
2013-08-19 10:30:15 +00:00
Mark Johnston
7b77e1fe0f Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.

There is no functional change.

MFC after:	2 weeks
2013-08-15 04:08:55 +00:00
Michael Tuexen
0e05fbded9 Don't send uninitialized memory (two instances of 4 bytes) in
every cookie on the wire. This bug was reported in
https://bugzilla.mozilla.org/show_bug.cgi?id=905080

MFC after: 3 days
2013-08-14 21:51:32 +00:00
Mikolaj Golub
c5c392e7ed Virtualize carp(4) variables to have per vnet control.
Reviewed by:	ae, glebius
2013-08-13 19:59:49 +00:00
Michael Tuexen
2c9c61defa Make the features a 64-bit value instead of 32-bit.
This will allow an easier integration of the support
for NDATA.
While there, do also some minor cleanups.
Obtained from:	rrs@
MFC after: 2 weeks
2013-08-12 13:52:15 +00:00
Michael Tuexen
bfd1666aad Micro-optimization suggested in
https://bugzilla.mozilla.org/show_bug.cgi?id=898234
by pchang9. While there simplify the code.

MFC after: 1 week
2013-08-01 12:05:23 +00:00
Andrey V. Elsukov
6794f46021 Remove the large part of struct ipsecstat. Only few fields of this
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.

This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.

Reported by:	Taku YAMAMOTO, Maciej Milewski
2013-07-23 14:14:24 +00:00
Michael Tuexen
88a95b1f25 Allow the code to be compiled without warnings for any combination
of INET, INET6 and SCTP_DEBUG defines.
The issue was reported by Lally Singh.

MFC after: 2 weeks
2013-07-20 13:14:59 +00:00
Michael Tuexen
da24cfcb35 Get the code compiling without INET and INET6 being defined.
This is not possible in FreeBSD, but in the upstream code.

MFC after: 2 weeks
2013-07-19 21:16:59 +00:00
Andre Oppermann
ccd040ab18 Free the non-fatal "timestamp missing" debug string manually as it is
not covered by the catch-all free for the error cases.

Found by:	Coverity
2013-07-16 16:37:08 +00:00
Mikolaj Golub
f122b319eb A complete duplication of binding should be allowed if on both new and
duplicated sockets a multicast address is bound and either
SO_REUSEPORT or SO_REUSEADDR is set.

But actually it works for the following combinations:

  * SO_REUSEPORT is set for the fist socket and SO_REUSEPORT for the new;
  * SO_REUSEADDR is set for the fist socket and SO_REUSEADDR for the new;
  * SO_REUSEPORT is set for the fist socket and SO_REUSEADDR for the new;

and fails for this:

  * SO_REUSEADDR is set for the fist socket and SO_REUSEPORT for the new.

Fix the last case.

PR:		179901
MFC after:	1 month
2013-07-12 19:08:33 +00:00
Andre Oppermann
10c982958c Unbreak VIMAGE by correctly naming the vnet pointer in struct tcp_syncache.
Reported by:	trociny, rodrigc
2013-07-12 07:43:56 +00:00
Andre Oppermann
81d392a09d Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK
information into the ISN (initial sequence number) without the additional
use of timestamp bits and switching to the very fast and cryptographically
strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against
forgeries.

The purpose of SYN cookies is to encode all necessary session state in
the 32 bits of our initial sequence number to avoid storing any information
locally in memory.  This is especially important when under heavy spoofed
SYN attacks where we would either run out of memory or the syncache would
fill with bogus connection attempts swamping out legitimate connections.

The original SYN cookies method only stored an indexed MSS values in the
cookie.  This isn't sufficient anymore and breaks down in the presence of
WSCALE information which is only exchanged during SYN and SYN-ACK.  If we
can't keep track of it then we may severely underestimate the available
send or receive window. This is compounded with large windows whose size
information on the TCP segment header is even lower numerically.  A number
of years back SYN cookies were extended to store the additional state in
the TCP timestamp fields, if available on a connection.  While timestamps
are common among the BSD, Linux and other *nix systems Windows never enabled
them by default and thus are not present for the vast majority of clients
seen on the Internet.

The common parameters used on TCP sessions have changed quite a bit since
SYN cookies very invented some 17 years ago.  Today we have a lot more
bandwidth available making the use window scaling almost mandatory.  Also
SACK has become standard making recovering from packet loss much more
efficient.

This change moves all necessary information into the ISS removing the need
for timestamps.  Both the MSS (16 bits) and send WSCALE (4 bits) are stored
in 3 bit indexed form together with a single bit for SACK.  While this is
significantly less than the original range, it is sufficient to encode all
common values with minimal rounding.

The MSS depends on the MTU of the path and with the dominance of ethernet
the main value seen is around 1460 bytes.  Encapsulations for DSL lines
and some other overheads reduce it by a few more bytes for many connections
seen.  Rounding down to the next lower value in some cases isn't a problem
as we send only slightly more packets for the same amount of data.

The send WSCALE index is bit more tricky as rounding down under-estimates
the available send space available towards the remote host, however a small
number values dominate and are carefully selected again.

The receive WSCALE isn't encoded at all but recalculated based on the local
receive socket buffer size when a valid SYN cookie returns.  A listen socket
buffer size is unlikely to change while active.

The index values for MSS and WSCALE are selected for minimal rounding errors
based on large traffic surveys.  These values have to be periodically
validated against newer traffic surveys adjusting the arrays tcp_sc_msstab[]
and tcp_sc_wstab[] if necessary.

In addition the hash MAC to protect the SYN cookies is changed from MD5
to SipHash-2-4, a much faster and cryptographically secure algorithm.

Reviewed by:	dwmalone
Tested by:	Fabian Keil <fk@fabiankeil.de>
2013-07-11 15:29:25 +00:00
Andre Oppermann
07dacf031e Extend debug logging of TCP timestamp related specification
violations.

Update related comments and style.
2013-07-10 12:06:01 +00:00
Michael Tuexen
e5aeb83c42 Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

X-MFC with: r252026
2013-07-09 14:38:26 +00:00
Andrey V. Elsukov
69edf037d7 Migrate struct carpstats to PCPU counters. 2013-07-09 10:02:51 +00:00
Andrey V. Elsukov
2841260cd6 Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters. 2013-07-09 09:59:46 +00:00
Andrey V. Elsukov
a786f67981 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
Andrey V. Elsukov
5da0521fce Use new macros to implement ipstat and tcpstat using PCPU counters.
Change interface of kread_counters() similar ot kread() in the netstat(1).
2013-07-09 09:43:03 +00:00
Andrey V. Elsukov
c80211e3cf Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with:	arch@
2013-07-09 09:32:06 +00:00
Michael Tuexen
ee1ccd9258 Fix a bug were only 2048 streams where usable even though more than
2048 streams were negotiated on the wire. While there, remove the
hard coded limit of 2048 streams.

MFC after: 3 days
2013-07-05 10:08:49 +00:00
Michael Tuexen
5db47b3def When processing an incoming ABORT, SHUTDOWN_COMPLETE or ERROR (NAT related)
chunk, take always the T-bit into account, when checking the verification
tag.

MFC after: 3 days
2013-07-04 19:47:46 +00:00
Mikolaj Golub
efdf104bca In r227207, to fix the issue with possible NULL inp_socket pointer
dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR
for multicast), INP_REUSEPORT flag was introduced to cache the socket
option.  It was decided then that one flag would be enough to cache
both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR
setsockopt(2), it was checked if it was called for a multicast address
and INP_REUSEPORT was set accordingly.

Unfortunately that approach does not work when setsockopt(2) is called
before binding to a multicast address: the multicast check fails and
INP_REUSEPORT is not set.

Fix this by adding INP_REUSEADDR flag to unconditionally cache
SO_REUSEADDR.

PR:		179901
Submitted by:	Michael Gmelin freebsd grem.de (initial version)
Reviewed by:	rwatson
MFC after:	1 week
2013-07-04 18:38:00 +00:00
Michael Tuexen
56f778aadf Code cleanups.
MFC after: 3 days
2013-07-03 18:48:43 +00:00
Navdeep Parhar
e364d8c44a Catch up with r238990. LLE_DELETED does not clobber everything else in
la_flags since said revision.
2013-07-03 17:27:32 +00:00
Hiroki Sato
e32d93954d Fix a panic when leaving MC group in a kernel with VIMAGE enabled.
in_leavegroup() is called from an asynchronous task, and
igmp_change_state() requires that curvnet is set by the caller.
2013-07-02 16:39:12 +00:00
Lawrence Stewart
92a0637f73 Import an implementation of the CAIA Delay-Gradient (CDG) congestion control
algorithm, which is based on the 2011 v0.1 patch release and described in the
paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes
and Grenville Armitage. It is implemented as a kernel module compatible with the
modular congestion control framework.

CDG is a hybrid congestion control algorithm which reacts to both packet loss
and inferred queuing delay. It attempts to operate as a delay-based algorithm
where possible, but utilises heuristics to detect loss-based TCP cross traffic
and will compete effectively as required. CDG is therefore incrementally
deployable and suitable for use on shared networks.

In collaboration with:	David Hayes <david.hayes at ieee.org> and
		Grenville Armitage <garmitage at swin edu au>
MFC after:	4 days
Sponsored by:	Cisco University Research Program and FreeBSD Foundation
2013-07-02 08:44:56 +00:00
Gleb Smirnoff
42a253e6a1 Fix kmod_*stat_inc() after r249276. The incorrect code actually
increased the pointer, not the memory it points to.

In collaboration with:	kib
Reported & tested by:	Ian FREISLICH <ianf clue.co.za>
Sponsored by:		Nginx, Inc.
2013-06-21 06:36:26 +00:00
Andrey V. Elsukov
6659296cb0 Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

MFC after:	2 weeks
2013-06-20 09:55:53 +00:00