We are inspecting PCBs of divert sockets under NET_EPOCH section,
but PCB could be already detached and we should check INP_FREED flag
when we took INP_RLOCK.
PR: 254478
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29420
In tcp_hostcache_list, the sbuf used would need a large (~2MB)
blocking allocation of memory (M_WAITOK), when listing a
full hostcache. This may stall the requestor for an indeterminate
time.
A further optimization is to return the expected userspace
buffersize right away, rather than preparing the output of
each current entry of the hostcase, provided by: @tuexen.
This makes use of the ready-made functions of sbuf to work
with sysctl, and repeatedly drain the much smaller buffer.
PR: 254333
MFC after: 2 weeks
Reviewed By: #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29471
Ensure that the stack does not generate a DSACK block for user
data received on a SYN segment in SYN-SENT state.
Reviewed by: rscheff
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29376
Sponsored by: Netflix, Inc.
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
VNET.
PR: 253998
Reported by: rashey at superbox.pl
MFC after: 3 days
Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
Test Plan:
```
set_skip:set_skip_group_lo -> passed [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```
Reviewers: #network, kp, bz
Reviewed By: kp
Subscribers: imp, ae
Differential Revision: https://reviews.freebsd.org/D29116
Allow sending user data on the SYN segment.
Reviewed by: rrs
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29082
Sponsored by: Netflix, Inc.
Introduce convenience macros to retrieve the DSCP, ECN or traffic class
bits from an IPv6 header.
Use them where appropriate.
Reviewed by: ae (previous version), rscheff, tuexen, rgrimes
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29056
the negotiation of TCP features. This affects most TCP options but
adherance to RFC7323 with the timestamp option will prevent a session
from getting established.
PR: 253576
Reviewed By: tuexen, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28652
- address second instance of cwnd potentially becoming zero
- fix sublte bug due to implicit int to uint typecase in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)
Reviewed By: #transport, kbowling
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28813
Despite the comment to the contrary neither pf nor carp use
in_addmulti(). Nothing does, so get rid of it.
Carp stopped using it in 08b68b0e4c
(2011). It's unclear when pf stopped using it, but before
d6d3f01e0a (2012).
Reviewed by: bz@, melifaro@
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28918
When tearing down vnet jails we can move an if_bridge out (as
part of the normal vnet_if_return()). This can, when it's clearing out
its list of member interfaces, change its link layer address.
That sends an iflladdr_event, but at that point we've already freed the
AF_INET/AF_INET6 if_afdata pointers.
In other words: when the iflladdr_event callbacks fire we can't assume
that ifp->if_afdata[AF_INET] will be set.
Reviewed by: donner@, melifaro@
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D28860
terminating a TCP connection.
If a TCP packet must be retransmitted and the data length has changed in the
retransmitted packet, due to the internal workings of TCP, typically when ACK
packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut()
will find the correct length, which is the last length received.
This can be explained as follows:
If a "227 Entering Passive Mode" packet must be retransmittet and the length
changes from 51 to 50 bytes, for example, then we have three cases for the
list scan in GetDeltaSeqOut(), depending on how many prior packets were
received modulus N_LINK_TCP_DATA=3:
case 1: index 0: original packet 51
index 1: retransmitted packet 50
index 2: not relevant
case 2: index 0: not relevant
index 1: original packet 51
index 2: retransmitted packet 50
case 3: index 0: retransmitted packet 50
index 1: not relevant
index 2: original packet 51
This patch simply changes the searching order for TCP packets, always starting
at the last received packet instead of any received packet, in
GetDeltaAckIn() and GetDeltaSeqOut().
Else no functional changes.
Discussed with: rscheff@
Submitted by: Andreas Longwitz <longwitz@incore.de>
PR: 230755
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Under some circumstances, PRR may end up with a fully
collapsed cwnd when finalizing the loss recovery.
Reviewed By: #transport, kbowling
Reported by: Liang Tian
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28780
TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in
exactly the same form that it appears here modulo slightly different
context. It used to be the case that there was a single pr_usrreq
method with requests dispatched to it; these exact two lines appeared in
tcp_usrreq's PRU_ATTACH handling.
The only purpose of this that I can find is to cause surprising behavior
on accepted connections. Newly-created sockets will never hit these
paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is
set on a listening socket and inherited, one would expect the timeout to
be inherited rather than changed arbitrarily like this -- noting that
SO_LINGER is nonsense on a listening socket beyond inheritance, since
they cannot be 'connected' by definition.
Neither Illumos nor Linux reset the timer like this based on testing and
inspection of Illumos, and testing of Linux.
Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D28265
a further CPU enhancements for compressed acks. These
are acks that are compressed into an mbuf. The transport
has to be aware of how to process these, and an upcoming
update to rack will do so. You need the rack changes
to actually test and validate these since if the transport
does not support mbuf compression, then the old code paths
stay in place. We do in this commit take out the concept
of logging if you don't have a lock (which was quite
dangerous and was only for some early debugging but has
been left in the code).
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28374
It fixes loopback route installation for the interfaces
in the different fibs using the same prefix.
Reviewed By: donner
PR: 189088
Differential Revision: https://reviews.freebsd.org/D28673
MFC after: 1 week
- improved pipe calculation which does not degrade under heavy loss
- engaging in Loss Recovery earlier under adverse conditions
- Rescue Retransmission in case some of the trailing packets of a request got lost
All above changes are toggled with the sysctl "rfc6675_pipe" (disabled by default).
Reviewers: #transport, tuexen, lstewart, slavash, jtl, hselasky, kib, rgrimes, chengc_netapp.com, thj, #manpages, kbowling, #netapp, rscheff
Reviewed By: #transport
Subscribers: imp, melifaro
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D18985
Currently ip6_input() calls in6ifa_ifwithaddr() for
every local packet, in order to check if the target ip
belongs to the local ifa in proper state and increase
its counters.
in6ifa_ifwithaddr() references found ifa.
With epoch changes, both `ip6_input()` and all other current callers
of `in6ifa_ifwithaddr()` do not need this reference
anymore, as epoch provides stability guarantee.
Given that, update `in6ifa_ifwithaddr()` to allow
it to return ifa without referencing it, while preserving
option for getting referenced ifa if so desired.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28648
Use ISS for SEG.SEQ when sending a SYN-ACK segment in response to
an SYN segment received in the SYN-SENT state on a socket having
the IPPROTO_TCP level socket option TCP_NOOPT enabled.
Reviewed by: rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D28656
In error case we can leave `inp' locked, also we need to free
mbuf chain `m' in the same case. Release the lock and use `badunlocked'
label to exit with freed mbuf. Also modify UDP error statistic to
match the IPv6 code.
Remove redundant INP_RUNLOCK() from the `if (last == NULL)' block,
there are no ways to reach this point with locked `inp'.
Obtained from: Yandex LLC
MFC after: 3 days
Sponsored by: Yandex LLC
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.
This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
to be a true RFC 6598 NAT444 setup, where each network segment (e.g. user,
subnet) can have their own dedicated port aliasing ranges.
Reviewed by: donner, kp
Approved by: 0mp (mentor), donner, kp
Differential Revision: https://reviews.freebsd.org/D23450
Improve the handling of INIT chunks in specific szenarios and
report and appropriate error cause.
Thanks to Anatoly Korniltsev for reporting the issue for the
userland stack.
MFC after: 3 days
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address. Then, this was extended to array of
such pointers. Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change. The new name should be generic enough to avoid further
renaming.