5263 Commits

Author SHA1 Message Date
sbruno
79bb3d3455 Implement PLPMTUD blackhole detection (RFC 4821), inspired by code
from xnu sources.  If we encounter a network where ICMP is blocked
the Needs Frag indicator may not propagate back to us.  Attempt to
downshift the mss once to a preconfigured value.

Default this feature to off for now while we do not have a full PLPMTUD
implementation in our stack.

Adds the following new sysctl's for control:
net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature
net.inet.tcp.pmtud_blackhole_mss       -- mss to try for ipv4
net.inet.tcp.v6pmtud_blackhole_mss     -- mss to try for ipv6

Adds the following new sysctl's for monitoring:
-- Number of times the code was activated to attempt a mss downshift
net.inet.tcp.pmtud_blackhole_activated
-- Number of times the blackhole mss was used in an attempt to downshift
net.inet.tcp.pmtud_blackhole_min_activated
-- Number of times that we failed to connect after we downshifted the mss
net.inet.tcp.pmtud_blackhole_failed

Phabricator:	https://reviews.freebsd.org/D506
Reviewed by:	rpaulo bz
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Limelight Networks
2014-10-07 21:50:28 +00:00
melifaro
bbf0fe2f55 Sync to HEAD@r272609. 2014-10-06 11:29:50 +00:00
hselasky
f1f39047b6 Minor code styling.
Suggested by:	glebius @
2014-10-06 06:19:54 +00:00
tuexen
7384b54f43 Remove unused MC_ALIGN macro as suggested by Robert.
MFC after: 1 week
2014-10-05 20:30:49 +00:00
rwatson
b363d98dc7 Eliminate use of M_EXT in IP6_EXTHDR_CHECK() by trimming a redundant
'if'/'else' case: it matches the simple 'else' case that follows.

This reduces awareness of external-storage mechanics outside of the
mbuf allocator.

Reviewed by:	bz
MFC after:	3 days
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D900
2014-10-05 06:28:53 +00:00
melifaro
e8d559896c Sync to HEAD@r272516. 2014-10-04 12:42:37 +00:00
hrs
43ddffd06b Add an additional routing table lookup when m->m_pkthdr.fibnum is changed
at a PFIL hook in ip{,6}_output().  IPFW setfib rule did not perform
a routing table lookup when the destination address was not changed.

CR:	D805
2014-10-02 00:25:57 +00:00
markj
77fb3d6bbe Add a sysctl, net.inet.icmp.tstamprepl, which can be used to disable replies
to ICMP Timestamp packets.

PR:		193689
Submitted by:	Anthony Cornehl <accornehl@gmail.com>
MFC after:	3 weeks
Sponsored by:	EMC / Isilon Storage Division
2014-10-01 18:07:34 +00:00
melifaro
d8b683d70f Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with embeded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00
tuexen
911b31bff0 The default for UDPLITE_RECV_CSCOV is zero. RFC 3828 recommend
that this means full checksum coverage for received packets.
If an application is willing to accept packets with partial
coverage, it is expected to use the socekt option and provice
the minimum coverage it accepts.

Reviewed by: kevlo
MFC after: 3 days
2014-10-01 05:43:29 +00:00
tuexen
e642e7ce0e UDPLite requires a checksum. Therefore, discard a received packet if
the checksum is 0.

MFC after: 3 days
2014-09-30 20:29:58 +00:00
tuexen
ed4e0bcac5 If the checksum coverage field in the UDPLITE header is the length
of the complete UDPLITE packet, the packet has full checksum coverage.
SO fix the condition.

Reviewed by: kevlo
MFC after: 3 days
2014-09-30 18:17:28 +00:00
jhb
9d6dc35e9d Only define the full inm_print() if KTR_IGMPV3 is enabled at compile time. 2014-09-30 17:26:34 +00:00
tuexen
186c8f6391 Checksum coverage values larger than 65535 for UDPLite are invalid.
Check for this when the user calls setsockopt using UDPLITE_{SEND,RECV}CSCOV.

Reviewed by: kevlo
MFC after: 3 days
2014-09-28 17:22:45 +00:00
melifaro
e58ee21a5e * Split tcp_signature_compute() into 2 pieces:
- tcp_get_sav() - SADB key lookup
 - tcp_signature_do_compute() - actual computation
* Fix TCP signature case for listening socket:
  do not assume EVERY connection coming to socket
  with TCP_SIGNATURE set to be md5 signed regardless
  of SADB key existance for particular address. This
  fixes the case for routing software having _some_
  BGP sessions secured by md5.
* Simplify TCP_SIGNATURE handling in tcp_input()

MFC after:	2 weeks
2014-09-27 07:04:12 +00:00
adrian
34f5e94b25 Remove an un-needed bit of pre-processor work - it all lives inside
#ifdef RSS.
2014-09-27 05:14:02 +00:00
jmg
bb3d6ada62 drop unnecessary ifdef IPSEC's. This file is only compiled when IPSEC
is defined...

Differential Revision:	D839
Reviewed by:	bz, glebius, gnn
Sponsered by:	EuroBSDCon DevSummit
2014-09-26 12:48:54 +00:00
np
ae32468228 Catch up with r271119. 2014-09-24 20:12:40 +00:00
hselasky
bdacf9ba4d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
hrs
a313921d44 Add a change missing in r271916. 2014-09-21 04:38:50 +00:00
hrs
ffad09823e - Virtualize interface cloner for gre(4). This fixes a panic when destroying
a vnet jail which has a gre(4) interface.

- Make net.link.gre.max_nesting vnet-local.
2014-09-21 03:56:06 +00:00
glebius
fec1da5cbc Mechanically convert to if_inc_counter(). 2014-09-19 10:19:51 +00:00
glebius
2bb30b3418 Remove disabled code, that is very unlikely to be ever enabled again,
as well as the comment that explains why is it disabled.
2014-09-19 05:23:47 +00:00
asomers
c1b1b15f41 Fix source address selection on unbound sockets in the presence of multiple
fibs. Use the mbuf's or the socket's fib instead of RT_ALL_FIBS. Fixes PR
187553. Also fixes netperf's UDP_STREAM test on a nondefault fib.

sys/netinet/ip_output.c
	In ip_output, lookup the source address using the mbuf's fib instead
	of RT_ALL_FIBS.

sys/netinet/in_pcb.c
	in in_pcbladdr, lookup the source address using the socket's fib,
	because we don't seem to have the mbuf fib. They should be the same,
	though.

tests/sys/net/fibs_test.sh
	Clear the expected failure on udp_dontroute.

PR:		187553
CR:		https://reviews.freebsd.org/D772
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-09-16 15:28:19 +00:00
tuexen
996f057a5f Add a explict cast to silence a warning when building
the userland stack on Windows.
This issue was reported by Peter Kasting from Google.

MFC after: 3 days
2014-09-16 14:39:24 +00:00
tuexen
47bba753f6 Use a consistent type for the number of HMAC algorithms.
This fixes a bug which resulted in a warning on the userland
stack, when compiled on Windows.
Thanks to Peter Kasting from Google for reporting the issue and
provinding a potential fix.

MFC after: 3 days
2014-09-16 14:20:33 +00:00
tuexen
d7760b7171 Small cleanup which addresses a warning regaring the truncation
of a 64-bit entity to a 32-bit entity. This issue was reported by
Peter Kasting from Google.

MFC after: 3 days
2014-09-16 13:48:46 +00:00
glebius
f716889a49 FreeBSD-SA-14:19.tcp raised attention to the state of our stack
towards blind SYN/RST spoofed attack.

Originally our stack used in-window checks for incoming SYN/RST
as proposed by RFC793. Later, circa 2003 the RST attack was
mitigated using the technique described in P. Watson
"Slipping in the window" paper [1].

After that, the checks were only relaxed for the sake of
compatibility with some buggy TCP stacks. First, r192912
introduced the vulnerability, just fixed by aforementioned SA.
Second, r167310 had slightly relaxed the default RST checks,
instead of utilizing net.inet.tcp.insecure_rst sysctl.

In 2010 a new technique for mitigation of these attacks was
proposed in RFC5961 [2]. The idea is to send a "challenge ACK"
packet to the peer, to verify that packet arrived isn't spoofed.
If peer receives challenge ACK it should regenerate its RST or
SYN with correct sequence number. This should not only protect
against attacks, but also improve communication with broken
stacks, so authors of reverted r167310 and r192912 won't be
disappointed.

[1] http://bandwidthco.com/whitepapers/netforensics/tcpip/TCP Reset Attacks.pdf
[2] http://www.rfc-editor.org/rfc/rfc5961.txt

Changes made:

o Revert r167310.
o Implement "challenge ACK" protection as specificed in RFC5961
  against RST attack. On by default.
  - Carefully preserve r138098, which handles empty window edge
    case, not described by the RFC.
  - Update net.inet.tcp.insecure_rst description.
o Implement "challenge ACK" protection as specificed in RFC5961
  against SYN attack. On by default.
  - Provide net.inet.tcp.insecure_syn sysctl, to turn off
    RFC5961 protection.

The changes were tested at Netflix. The tested box didn't show
any anomalies compared to control box, except slightly increased
number of TCP connection in LAST_ACK state.

Reviewed by:	rrs
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-16 11:07:25 +00:00
tuexen
fe03a49a25 Make a type conversion explicit. When compiling this code on
Windows as part of the SCTP userland stack, this fixes a
warning reported by Peter Kasting from Google.

MFC after: 3 days
2014-09-16 10:57:55 +00:00
delphij
a90097904b Fix Denial of Service in TCP packet processing.
Submitted by:	glebius
Security:	FreeBSD-SA-14:19.tcp
2014-09-16 09:48:24 +00:00
tuexen
cce669e356 The MTU is handled as a 32-bit entity within the SCTP stack.
This was reported by Peter Kasting from Google.

MFC after: 3 days
2014-09-16 09:22:43 +00:00
adrian
3bc90623ca Ensure the correct software IPv4 hash is done based on the configured
RSS parameters, rather than assuming we're hashing IPv4+UDP and IPv4+TCP.
2014-09-16 03:26:42 +00:00
tuexen
da9214b7fd Chunk IDs are 8 bit entities, not 16 bit.
Thanks to Peter Kasting from Google for drawing
my attention to it.

MFC after: 3 days
2014-09-15 19:38:34 +00:00
hrs
0d175c82cb Use generic SYSCTL_* macro instead of deprecated SYSCTL_VNET_*.
Suggested by:	glebius
2014-09-15 14:43:58 +00:00
hrs
0d57c69d7b Make net.inet.ip.sourceroute, net.inet.ip.accept_sourceroute, and
net.inet.ip.process_options vnet-aware.  Revert changes in r271545.

Suggested by:	bz
2014-09-15 07:20:40 +00:00
hselasky
727760a4e4 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00
hselasky
3d04a989df Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-09-13 08:26:09 +00:00
asomers
081aa8a15c Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
	Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
	Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
	versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
	Fixup calls of modified functions.

share/man/man9/ifnet.9
	Document changed API.

CR:		https://reviews.freebsd.org/D458
MFC after:	Never
Sponsored by:	Spectra Logic
2014-09-11 20:21:03 +00:00
ae
1576b695b6 Add scope zone id to the in_endpoints and hc_metrics structures.
A non-global IPv6 address can be used in more than one zone of the same
scope. This zone index is used to identify to which zone a non-global
address belongs.

Also we can have many foreign hosts with equal non-global addresses,
but from different zones. So, they can have different metrics in the
host cache.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-09-10 16:26:18 +00:00
ae
9186ea6a59 Make in6_pcblookup_hash_locked and in6_pcbladdr static.
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-09-10 13:17:35 +00:00
ae
82d0b71937 Introduce INP6_PCBHASHKEY macro. Replace usage of hardcoded part of
IPv6 address as hash key in all places.

Obtained from:	Yandex LLC
2014-09-10 12:35:42 +00:00
adrian
63bc177a08 Calculate the RSS hash for outbound UDPv4 frames.
Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 04:19:36 +00:00
adrian
b73995e058 Update the IPv4 input path to handle reassembled frames and incoming frames
with no RSS hash.

When doing RSS:

* Create a new IPv4 netisr which expects the frames to have been verified;
  it just directly dispatches to the IPv4 input path.
* Once IPv4 reassembly is done, re-calculate the RSS hash with the new
  IP and L3 header; then reinject it as appropriate.
* Update the IPv4 netisr to be a CPU affinity netisr with the RSS hash
  function (rss_soft_m2cpuid) - this will do a software hash if the
  hardware doesn't provide one.

NICs that don't implement hardware RSS hashing will now benefit from RSS
distribution - it'll inject into the correct destination netisr.

Note: the netisr distribution doesn't work out of the box - netisr doesn't
query RSS for how many CPUs and the affinity setup.  Yes, netisr likely
shouldn't really be doing CPU stuff anymore and should be "some kind of
'thing' that is a workqueue that may or may not have any CPU affinity";
that's for a later commit.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 04:18:20 +00:00
adrian
e5ddfb30ab Implement IPv4 RSS software hash functions to use during packet ingress
and egress.

* rss_mbuf_software_hash_v4 - look at the IPv4 mbuf to fetch the IPv4 details
  + direction to calculate a hash.
* rss_proto_software_hash_v4 - hash the given source/destination IPv4 address,
  port and direction.
* rss_soft_m2cpuid - map the given mbuf to an RSS CPU ("bucket" for now)

These functions are intended to be used by the stack to support
the following:

* Not all NICs do RSS hashing, so we should support some way of doing
  a hash in software;
* The NIC / driver may not hash frames the way we want (eg UDP 4-tuple
  hashing when the stack is only doing 2-tuple hashing for UDP); so we
  may need to re-hash frames;
* .. same with IPv4 fragments - they will need to be re-hashed after
  reassembly;
* .. and same with things like IP tunneling and such;
* The transmit path for things like UDP, RAW and ICMP don't currently
  have any RSS information attached to them - so they'll need an
  RSS calculation performed before transmit.

TODO:

* Counters! Everywhere!
* Add a debug mode that software hashes received frames and compares them
  to the hardware hash provided by the hardware to ensure they match.

The IPv6 part of this is missing - I'm going to do some re-juggling of
where various parts of the RSS framework live before I add the IPv6
code (read: the IPv6 code is going to go into netinet6/in6_rss.[ch],
rather than living here.)

Note: This API is still fluid.  Please keep that in mind.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 03:10:21 +00:00
adrian
e623d51cd5 Add support for receiving and setting flowtype, flowid and RSS bucket
information as part of recvmsg().

This is primarily used for debugging/verification of the various
processing paths in the IP, PCB and driver layers.

Unfortunately the current implementation of the control message path
results in a ~10% or so drop in UDP frame throughput when it's used.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 01:45:39 +00:00
adrian
aab402c146 Add a flag to ip_output() - IP_NODEFAULTFLOWID - which prevents it from
overriding an existing flowid/flowtype field in the outbound mbuf with
the inp_flowid/inp_flowtype details.

The upcoming RSS UDP support calculates a valid RSS value for outbound
mbufs and since it may change per send, it doesn't cache it in the inpcb.
So overriding it here would be wrong.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 00:19:02 +00:00
melifaro
f7e6823045 Make ipfw_nat module use IP_FW3 codes.
Kernel changes:
* Split kernel/userland nat structures eliminating IPFW_INTERNAL hack.
* Add IP_FW_NAT44_* codes resemblin old ones.
* Assume that instances can be named (no kernel support currently).
* Use both UH+WLOCK locks for all configuration changes.
* Provide full ABI support for old sockopts.

Userland changes:
* Use IP_FW_NAT44_* codes for nat operations.
* Remove undocumented ability to show ranges of nat "log" entries.
2014-09-07 18:30:29 +00:00
tuexen
53f889f7e3 Address warnings generated by the clang analyzer.
MFC after: 1 week
2014-09-07 18:05:37 +00:00
tuexen
c7b009940d Address another warnings reported by Patrick Laimbock when compiling
in userspace. While there, improve consistency.

MFC after: 1 week
2014-09-07 17:07:19 +00:00
tuexen
a20e3eb506 Use union sctp_sockstore instead of struct sockaddr_storage. This
eliminiates some warnings when building in userland.
Thanks to Patrick Laimbock for reporting this issue.
Remove also some unnecessary casts.
There should be no functional change.

MFC after: 1 week
2014-09-07 09:06:26 +00:00