4888 Commits

Author SHA1 Message Date
adrian
63bc177a08 Calculate the RSS hash for outbound UDPv4 frames.
Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 04:19:36 +00:00
adrian
b73995e058 Update the IPv4 input path to handle reassembled frames and incoming frames
with no RSS hash.

When doing RSS:

* Create a new IPv4 netisr which expects the frames to have been verified;
  it just directly dispatches to the IPv4 input path.
* Once IPv4 reassembly is done, re-calculate the RSS hash with the new
  IP and L3 header; then reinject it as appropriate.
* Update the IPv4 netisr to be a CPU affinity netisr with the RSS hash
  function (rss_soft_m2cpuid) - this will do a software hash if the
  hardware doesn't provide one.

NICs that don't implement hardware RSS hashing will now benefit from RSS
distribution - it'll inject into the correct destination netisr.

Note: the netisr distribution doesn't work out of the box - netisr doesn't
query RSS for how many CPUs and the affinity setup.  Yes, netisr likely
shouldn't really be doing CPU stuff anymore and should be "some kind of
'thing' that is a workqueue that may or may not have any CPU affinity";
that's for a later commit.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 04:18:20 +00:00
adrian
e5ddfb30ab Implement IPv4 RSS software hash functions to use during packet ingress
and egress.

* rss_mbuf_software_hash_v4 - look at the IPv4 mbuf to fetch the IPv4 details
  + direction to calculate a hash.
* rss_proto_software_hash_v4 - hash the given source/destination IPv4 address,
  port and direction.
* rss_soft_m2cpuid - map the given mbuf to an RSS CPU ("bucket" for now)

These functions are intended to be used by the stack to support
the following:

* Not all NICs do RSS hashing, so we should support some way of doing
  a hash in software;
* The NIC / driver may not hash frames the way we want (eg UDP 4-tuple
  hashing when the stack is only doing 2-tuple hashing for UDP); so we
  may need to re-hash frames;
* .. same with IPv4 fragments - they will need to be re-hashed after
  reassembly;
* .. and same with things like IP tunneling and such;
* The transmit path for things like UDP, RAW and ICMP don't currently
  have any RSS information attached to them - so they'll need an
  RSS calculation performed before transmit.

TODO:

* Counters! Everywhere!
* Add a debug mode that software hashes received frames and compares them
  to the hardware hash provided by the hardware to ensure they match.

The IPv6 part of this is missing - I'm going to do some re-juggling of
where various parts of the RSS framework live before I add the IPv6
code (read: the IPv6 code is going to go into netinet6/in6_rss.[ch],
rather than living here.)

Note: This API is still fluid.  Please keep that in mind.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 03:10:21 +00:00
adrian
e623d51cd5 Add support for receiving and setting flowtype, flowid and RSS bucket
information as part of recvmsg().

This is primarily used for debugging/verification of the various
processing paths in the IP, PCB and driver layers.

Unfortunately the current implementation of the control message path
results in a ~10% or so drop in UDP frame throughput when it's used.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 01:45:39 +00:00
adrian
aab402c146 Add a flag to ip_output() - IP_NODEFAULTFLOWID - which prevents it from
overriding an existing flowid/flowtype field in the outbound mbuf with
the inp_flowid/inp_flowtype details.

The upcoming RSS UDP support calculates a valid RSS value for outbound
mbufs and since it may change per send, it doesn't cache it in the inpcb.
So overriding it here would be wrong.

Differential Revision:	https://reviews.freebsd.org/D527
Reviewed by:	grehan
2014-09-09 00:19:02 +00:00
tuexen
53f889f7e3 Address warnings generated by the clang analyzer.
MFC after: 1 week
2014-09-07 18:05:37 +00:00
tuexen
c7b009940d Address another warnings reported by Patrick Laimbock when compiling
in userspace. While there, improve consistency.

MFC after: 1 week
2014-09-07 17:07:19 +00:00
tuexen
a20e3eb506 Use union sctp_sockstore instead of struct sockaddr_storage. This
eliminiates some warnings when building in userland.
Thanks to Patrick Laimbock for reporting this issue.
Remove also some unnecessary casts.
There should be no functional change.

MFC after: 1 week
2014-09-07 09:06:26 +00:00
tuexen
863eb35e7d Use SYSCTL_PROC instead of SYSCTL_VNET_PROC.
Suggested by: glebius@
MFC after: 1 week
2014-09-07 07:49:49 +00:00
tuexen
65ac57de62 Fix a leak of an address, if the address is scheduled for removal
and the stack is torn down.
Thanks to Peter Bostroem and Jiayang Liu from Google for reporting the
issue.

MFC after: 1 week
2014-09-06 20:03:24 +00:00
tuexen
b9adc6630b Fix the handling of sysctl variables when used with VIMAGE.
While there do some cleanup of the code.

MFC after: 1 week
2014-09-06 19:12:14 +00:00
glebius
ca003e260e Satisfy assertion in m_demote().
Sponsored by:	Nginx, Inc.
2014-09-04 19:28:02 +00:00
jhb
01ecfa68a7 In tcp_input(), don't acquire the pcbinfo global write lock for SYN
packets targeting a listening socket.  Permit to reduce TCP input
processing starvation in context of high SYN load (e.g. short-lived TCP
connections or SYN flood).

Submitted by:	Julien Charbon <jcharbon@verisign.com>
Reviewed by:	adrian, hiren, jhb, Mike Bentkofsky
2014-09-04 19:09:08 +00:00
glebius
9e0133cbe0 Fixes for tcp_respond() comment. 2014-09-04 17:05:57 +00:00
glebius
ae21082d79 Improve r265338. When inserting mbufs into TCP reassembly queue,
try to collapse adjacent pieces using m_catpkt(). In best case
scenario it copies data and frees mbufs, making mbuf exhaustion
attack harder.

Suggested by:		Jonathan Looney <jonlooney gmail.com>
Security:		Hardens against remote mbuf exhaustion attack.
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-09-04 09:15:44 +00:00
glebius
2e01608625 Clean up unused CSUM_FRAGMENT.
Sponsored by:	Nginx, Inc.
2014-09-03 08:30:18 +00:00
glebius
2dd39cca12 Make SOCK_RAW sockets to be truly raw, not modifying received and sent
packets at all. Swapping byte order on SOCK_RAW was actually a bug, an
artifact from the BSD network stack, that used to convert a packet to
native byte order once it is received by kernel.

Other operating systems didn't follow this, and later other BSD
descendants fixed this, leaving us alone with the bug. Now it is
clear that we should fix the bug.

In collaboration with:	Olivier Cochard-Labbé <olivier cochard.me>
See also:		https://wiki.freebsd.org/SOCK_RAW
Sponsored by:		Nginx, Inc.
2014-09-01 14:04:51 +00:00
glebius
6047680797 Use macros instead of referencing struct if_data that resides in ifnet.
Sponsored by:	Nginx, Inc.
2014-08-31 06:30:50 +00:00
tuexen
368ac164f9 Announce SCTP support in the kern.features sysctl variables.
MFC after: 3 days
2014-08-26 21:15:34 +00:00
delphij
3fc68a2c42 Restore historical behavior of in_control, which, when no matching address
is found, the first usable address is returned for legacy ioctls like
SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR.

While there also fix a subtle issue that a caller from a jail asking for
INADDR_ANY may get the first IP of the host that do not belong to the jail.

Submitted by:	glebius
Differential Revision: https://reviews.freebsd.org/D667
2014-08-22 19:08:12 +00:00
lstewart
b3bf7524ad Destroy the "qdiffsample_zone" UMA zone on unload to avoid a use-after-unload
panic easily triggered by running "sysctl -a" after unload.

Reported and tested by:	Grenville Armitage <garmitage@swin.edu.au>
MFC after:	1 week
2014-08-19 02:19:53 +00:00
kevlo
dd40fa7e62 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
tuexen
4feb6f37e3 Add support for the SCTP_PR_STREAM_STATUS and SCTP_PR_ASSOC_STATUS
socket options. This includes managing the correspoing stat counters.
Add the SCTP_DETAILED_STR_STATS kernel option to control per policy
counters on every stream. The default is off and only an aggregated
counter is available. This is sufficient for the RTCWeb usecase.

MFC after: 1 week
2014-08-13 15:50:16 +00:00
tuexen
e167d96026 Change SCTP sysctl from auth_disable to auth_enable. This is
consistent with other similar sysctl variable used in SCTP.
2014-08-12 13:13:11 +00:00
tuexen
b57b7cb252 Add support for the SCTP_AUTH_SUPPORTED and SCTP_ASCONF_SUPPORTED
socket options. Add also a sysctl to control the support of ASCONF.

MFC after: 1 week
2014-08-12 11:30:16 +00:00
hselasky
e253e43882 Fix string length argument passed to "sysctl_handle_string()" so that
the complete string is returned by the function and not just only one
byte.

PR:	192544
MFC after:	2 weeks
2014-08-10 07:51:55 +00:00
hiren
a1e149db26 Improve comments by listing a criteria for automatic increment of receive socket
buffer.

Reviewed by:	jmg
2014-08-09 21:01:24 +00:00
tuexen
9f6eff7a40 Small modification of the sctp_input() cleanup to avoid having
code between declariations.
2014-08-09 14:33:44 +00:00
kib
aebb08ee0c Fix one more compiler warning, m is not initialized. 2014-08-08 15:50:02 +00:00
bz
7704d53e2e Fix argument to KTR after r269699 to unbreak LINT builds. 2014-08-08 09:17:02 +00:00
kevlo
7727a3c215 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
tuexen
e7d5338a8e Add support for the SCTP_RECONFIG_SUPPORTED and the corresponding
sysctl controlling the negotiation of the RE-CONFIG extension.

MFC after: 3 days
2014-08-04 20:07:35 +00:00
hiren
203dc5e795 Add a comment for easier code understanding. 2014-08-04 19:42:48 +00:00
tuexen
ff18393ff0 Add support for the SCTP_PKTDROP_SUPPORTED socket option and
the corresponding sysctl variable.
The default is off, since the specification is not an RFC yet.

MFC after: 1 week
2014-08-03 18:12:55 +00:00
tuexen
62b88726fd Use consistent names for SCTP sysctls. Rename
nr_sack_on_off to nrsack_enable.
Please note that this extension is off by default
since it is not specified in an RFC (yet).
2014-08-03 15:09:13 +00:00
tuexen
fb7bbef5e1 Add SCTP socket option SCTP_NRSACK_SUPPORTED to control the
NRSACK extension. The default will still be off, since it
it not an RFC (yet).
Changing the sysctl name will be in a separate commit.

MFC after: 1 week
2014-08-03 14:10:10 +00:00
tuexen
31e0173d95 Add support for the SCTP_PR_SUPPORTED socket option as specified in
http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-prpolicies
Add also a sysctl controlling the default of the end-points.

MFC after: 1 week
2014-08-02 21:36:40 +00:00
tuexen
6dd06cb981 Fix a copy and paste error.
X-MFC with: 269436
2014-08-02 20:37:02 +00:00
tuexen
9ad96316d8 Cleanup the ECN configuration handling and provide an SCTP socket
option for controlling ECN on future associations and get the
status on current associations.
A simialar pattern will be used for controlling SCTP extensions in
upcoming commits.
2014-08-02 17:35:13 +00:00
tuexen
4a0d502636 Remove the asconf_auth_nochk sysctl. This was off by default and only
existed to be able to test with non-compliant peers a long time ago.
2014-08-01 20:49:27 +00:00
grehan
8cf28f6b1b Fix byte ordering in default RSS key.
The rss_key[] array in netinet/in_rss.c has the bytes in incorrect
order. This results in the RSS test vectors in the Microsft RSS spec
and Intel NIC specs giving incorrect results, and making it difficult
to verify correct hash operation when RSS functionality is added to
new NICs.

CR:		https://phabric.freebsd.org/D516
Reviewed by:	adrian
2014-08-01 18:36:40 +00:00
tuexen
418772ad46 Cleanup sctp_send_initiate() and sctp_send_initiate_ack() to be
in sync as much as possible. This simplifies upcoming changes.
2014-08-01 12:42:37 +00:00
smh
3ae4520699 Ensure that IP's added to CARP always use the CARP MAC
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.

This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
2014-07-31 16:43:56 +00:00
smh
8d283ef60a Only check error if one could have been generated 2014-07-31 09:18:29 +00:00
hiren
f19c3089af Add a comment and while there, fix trailing whitespace. 2014-07-29 23:42:51 +00:00
glebius
d32e428cc3 Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
2014-07-29 15:01:29 +00:00
marcel
c6a7dd8e59 The accept filter code is not specific to the FreeBSD IPv4 network stack,
so it really should not be under "optional inet". The fact that uipc_accf.c
lives under kern/ lends some weight to making it a "standard" file.

Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the
need for #ifdef INET in kern/uipc_socket.c.

Also, this meant the net.inet.accf.unloadable sysctl needed to move, as
net.inet does not exist without networking compiled in (as it lives in
netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable.

In order to support existing accept filter sysctls, the net.inet.accf node
has been added netinet/in_proto.c.

Submitted by:	Steve Kiernan <stevek@juniper.net>
Obtained from:	Juniper Networks, Inc.
2014-07-26 19:27:34 +00:00
tuexen
da26b011a7 Initialize notification strucuture. This was missed in an earlier commit
MFC after: 3 days
2014-07-24 18:06:18 +00:00
hrs
45044bb8e3 Fix EtherIP. TOS field must be initialized when the inner protocol is
PF_LINK, and multicast/broadcast flag should always be dropped because
the outer protocol uses unicast even when the inner address is not for
unicast.  It had been broken since r236951 when gif_output() started to
use IFQ_HANDOFF().
2014-07-24 10:42:47 +00:00
tuexen
8a530385ea Cleanup the definition of two structures which are
exposed to userland. Therefore no MFC.
2014-07-22 19:54:22 +00:00