Currently, the TCP slow timer can starve TCP input processing while it
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.
To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when
closing an expired connection. This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.
Approved by: jhb (mentor)
Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.
Sponsored by: Mellanox Technologies
Check for mbuf copy failure when there are multiple multicast sockets
This partitular case is the only path where the mbuf could be NULL.
udp_append() checked for a NULL mbuf only after invoking the tunneling
callback. Our only in tree tunneling callback - SCTP - assumed a non
NULL mbuf, and it is a bit odd to make the callbacks responsible for
checking this condition.
This also reduces the differences between the IPv4 and IPv6 code.
Improve transmit sending offload, TSO, algorithm in general. This
change allows all HCAs from Mellanox Technologies to function properly
when TSO is enabled. See r271946 and r272595 for more details about
this commit.
Sponsored by: Mellanox Technologies
When tunneling interface is going to insert mbuf into netisr queue after stripping
outer header, consider it as new packet and clear the protocols flags.
This fixes problems when IPSEC traffic goes through various tunnels and router
doesn't send ICMP/ICMPv6 errors.
PR: 174602
Sponsored by: Yandex LLC
The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.
Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.
Sponsored by: Mellanox Technologies
strict POSIX mode.
Put the htonl(), htons(), ntohl() and ntohs() declarations under
__POSIX_VISIBLE >= 200112. POSIX.1-2001 and newer require these to be
exposed from <netinet/in.h> (as well as <arpa/inet.h>).
Note that it may be unnecessary to check __POSIX_VISIBLE >= 200112 because
older versions of POSIX and the C standard do not define this header.
However, other places in the same file already perform the check.
PR: 188316
Submitted by: Christian Neukirchen
Implement PLPMTUD blackhole detection (RFC 4821), inspired by code
from xnu sources. If we encounter a network where ICMP is blocked
the Needs Frag indicator may not propagate back to us. Attempt to
downshift the mss once to a preconfigured value.
Note, this is turned off by default.
Fix the reported streams in a SCTP_STREAM_RESET_EVENT, if a
sent incoming stream reset request was responded with failed
or denied.
Thanks to Peter Bostroem from Google for reporting the issue.
Ensure that the list of streams sent in a stream reset parameter fits
in an mbuf-cluster.
Thanks to Peter Bostroem for drawing my attention to this part of the code.
Ensure that the number of stream reported in srs_number_streams is
consistent with the amount of data provided in the SCTP_RESET_STREAMS
socket option.
Thanks to Peter Bostroem from Google for drawing my attention to
this part of the code.
Fix EtherIP. TOS field must be initialized when the inner protocol is
PF_LINK, and multicast/broadcast flag should always be dropped because
the outer protocol uses unicast even when the inner address is not for
unicast. It had been broken since r236951 when gif_output() started to
use IFQ_HANDOFF().
The default for UDPLITE_RECV_CSCOV is zero. RFC 3828 recommend
that this means full checksum coverage for received packets.
If an application is willing to accept packets with partial
coverage, it is expected to use the socket option and provide
the minimum coverage it accepts.
If the checksum coverage field in the UDPLITE header is the length
of the complete UDPLITE packet, the packet has full checksum coverage.
So fix the condition.
Chunk IDs are 8 bit entities, not 16 bit.
Thanks to Peter Kasting from Google for drawing
my attention to it.
MFC r271665:
The MTU is handled as a 32-bit entity within the SCTP stack.
This was reported by Peter Kasting from Google.
MFC r271670:
Make a type conversion explicit. When compiling this code on
Windows as part of the SCTP userland stack, this fixes a
warning reported by Peter Kasting from Google.
MFC r271672:
Small cleanup which addresses a warning regaring the truncation
of a 64-bit entity to a 32-bit entity. This issue was reported by
Peter Kasting from Google.
MFC r271673:
Use a consistent type for the number of HMAC algorithms.
This fixes a bug which resulted in a warning on the userland
stack, when compiled on Windows.
Thanks to Peter Kasting from Google for reporting the issue and
provinding a potential fix.
MFC r271674:
Add a explict cast to silence a warning when building
the userland stack on Windows.
This issue was reported by Peter Kasting from Google.
Approved by: re (kib)
Announce SCTP support in the kern.features sysctl variables.
MFC r270859:
Enable SCTP support. It runs perfectly fine on a Wandboard quad.
MFC r271204 with manual intervention:
Fix the handling of sysctl variables when used with VIMAGE.
While there do some cleanup of the code.
MFC r271209:
Fix a leak of an address, if the address is scheduled for removal
and the stack is torn down.
Thanks to Peter Bostroem and Jiayang Liu from Google for reporting the
issue.
MFC r271219:
Use SYSCTL_PROC instead of SYSCTL_VNET_PROC.
Suggested by: glebius@
MFC r271221:
Use union sctp_sockstore instead of struct sockaddr_storage. This
eliminates some warnings when building in userland.
Thanks to Patrick Laimbock for reporting this issue.
Remove also some unnecessary casts.
There should be no functional change.
MFC r271228:
Address another warnings reported by Patrick Laimbock when compiling
in userspace. While there, improve consistency.
MFC r271230:
Address warnings generated by the clang analyzer.
Approved by: re (kib)
Destroy the "qdiffsample_zone" UMA zone on unload to avoid a use-after-unload
panic easily triggered by running "sysctl -a" after unload.
Reported and tested by: Grenville Armitage <garmitage@swin.edu.au>
Approved by: re(gjb)
Fix string length argument passed to "sysctl_handle_string()" so that
the complete string is returned by the function and not just only one
byte.
PR: 192544
Add support for the SCTP_PR_STREAM_STATUS and SCTP_PR_ASSOC_STATUS
socket options. This includes managing the correspoing stat counters.
Add the SCTP_DETAILED_STR_STATS kernel option to control per policy
counters on every stream. The default is off and only an aggregated
counter is available. This is sufficient for the RTCWeb usecase.
Add support for the SCTP_PKTDROP_SUPPORTED socket option and
the corresponding sysctl variable.
The default is off, since the specification is not an RFC yet.
Add SCTP socket option SCTP_NRSACK_SUPPORTED to control the
NRSACK extension. The default will still be off, since it
it not an RFC (yet).
Changing the sysctl name will be in a separate commit.
Cleanup the ECN configuration handling and provide an SCTP socket
option for controlling ECN on future associations and get the
status on current associations.
A simialar pattern will be used for controlling SCTP extensions in
upcoming commits.
Bugfix: When a remote address was added to an endpoint,
a source address was selected and cached, but it was not
stored that is was cached. This resulted in selecting
different source addresses for the INIT-ACK and COOKIE-ACK
when possible.
Thanks to Niu Zhixiong for reporting the issue.
Remove the prototypes for things that are no longer file local but were
moved to the header file.
Was suppoed to be MFCed with: r266596
Pointy hat to: bz
Move the tcp_fields_to_host() and tcp_fields_to_net() (inline)
functions to the tcp_var.h header file in order to avoid further
duplication with upcoming commits.
Reviewed by: np