In case of the empty queue tp->snd_holes and tcp_sackhole_insert()
failing due to memory shortage, tp->snd_holes will be empty.
This problem was hit when stress tests where performed by pho.
PR: 215513
Reported by: pho
Tested by: pho
Sponsored by: Netflix, Inc.
This can lead to change of mbuf pointer (packet filter could do m_pullup(),
NAT, etc). Also in case of change of destination address, tryforward can
decide that packet should be handled by local system. In this case modified
mbuf can be returned to the ip[6]_input(). To handle this correctly, check
M_FASTFWD_OURS flag after return from ip[6]_tryforward. And if it is present,
update variables that depend from mbuf pointer and skip another inbound
firewall processing.
No objection from: #network
MFC after: 3 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D8764
in6p_options to check that. That is incorrect as we carry ip options in
in6p_outputopts. Also, just checking for in6p_outputopts being NULL won't
suffice as we combine ip options and ip header fields both in that one field.
The commit fixes this by using ip6_optlen() which correctly calculates length
of only ip options for IPv6.
Reviewed by: ae, bz
MFC after: 3 weeks
Sponsored by: Limelight Networks
This fixes a bug where the wrong ppid was reported, if
* I-DATA was used on the first fragement was not received first
* DATA was used and different ppids where used.
Thanks to Julian Cordes for making me aware of the issue.
MFC after: 1 week
This made a couple of bugs visible in handling SSN wrap-arounds
when using DATA chunks. Now bulk transfer seems to work fine...
This fixes the issue reported in
https://github.com/sctplab/usrsctp/issues/111
MFC after: 1 week
The tools using to generate the sources has been updated and produces
different whitespaces. Commit this seperately to avoid intermixing
these with real code changes.
MFC after: 3 days
When a TCP segment with the FIN bit set was received in the CLOSED state,
a TCP RST-ACK-segment is sent. When computing SEG.ACK for this, the
FIN counts as one byte. This accounting was missing and is fixed by this
patch.
Reviewed by: hiren
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://svn.freebsd.org/base/head
many borken middle-boxes tend to do that. But during 3whs, in syncache_expand(),
we don't do that which causes us to send a RST to such a client. Relax this
constraint by only using tsecr to compare against timestamp that we sent when it
is not 0. As a result, we'd now accept the final ACK of 3whs with tsecr of 0.
Reviewed by: jtl, gnn
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D8552
This does not cover state changes from TIME-WAIT.
Reviewed by: gnn
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D8443
either in the CLOSING or LAST-ACK state.
Reviewed by: hiren
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D8371
introduced with r261242. The useful and expected soisconnected()
call is done in tcp_do_segment().
Has been found as part of unrelated PR:212920 investigation.
Improve slightly (~2%) the maximum number of TCP accept per second.
Tested by: kevin.bowling_kev009.com, jch
Approved by: gnn, hiren
MFC after: 1 week
Sponsored by: Verisign, Inc
Differential Revision: https://reviews.freebsd.org/D8072
loss event but not use or obay the recommendations i.e. values set by it in some
cases.
Here is an attempt to solve that confusion by following relevant RFCs/drafts.
Stack only sets congestion window/slow start threshold values when there is no
CC module availalbe to take that action. All CC modules are inspected and
updated when needed to take appropriate action on loss.
tcp_stacks/fastpath module has been updated to adapt these changes.
Note: Probably, the most significant change would be to not bring congestion
window down to 1MSS on a loss signaled by 3-duplicate acks and letting
respective CC decide that value.
In collaboration with: Matt Macy <mmacy at nextbsd dot org>
Discussed on: transport@ mailing list
Reviewed by: jtl
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D8225
In r304435, ip_output() was changed to use the result of the route
lookup to decide whether the outgoing packet was a broadcast or
not. This introduced a regression on interfaces where
IFF_BROADCAST was not set (e.g. point-to-point links), as the
algorithm could incorrectly treat the destination address as a
broadcast address, and ip_output() would subsequently drop the
packet as broadcasting on a non-IFF_BROADCAST interface is not
allowed.
Differential Revision: https://reviews.freebsd.org/D8303
Reviewed by: jtl
Reported by: ambrisko
MFC after: 2 weeks
X-MFC-With: r304435
Sponsored by: Dell EMC Isilon
handling. Ensure that:
* Protocol unreachable errors are handled by indicating ECONNREFUSED
to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Communication prohibited errors are handled by indicating ECONNREFUSED
to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Hop Limited exceeded errors are handled by indicating EHOSTUNREACH
to the TCP user for both IPv4 and IPv6.
For IPv6 the TCP connected was dropped but errno wasn't set.
Reviewed by: gallatin, rrs
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: 7904
after having been dropped.
This fixes enforces in_pcbdrop() logic in tcp_input():
"in_pcbdrop() is used by TCP to mark an inpcb as unused and avoid future packet
delivery or event notification when a socket remains open but TCP has closed."
PR: 203175
Reported by: Palle Girgensohn, Slawa Olhovchenkov
Tested by: Slawa Olhovchenkov
Reviewed by: Slawa Olhovchenkov
Approved by: gnn, Slawa Olhovchenkov
Differential Revision: https://reviews.freebsd.org/D8211
MFC after: 1 week
Sponsored by: Verisign, inc
to at least 64.
This is still just a coverup to avoid kernel panic and not an actual fix.
PR: 213232
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D8272
Also renamed some tfo labels and added/reworked comments for clarity.
Based on an initial patch from jtl.
PR: 213424
Reviewed by: jtl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D8235
compile when that option is configured. In tcp_destroy(), the error
variable is now only used in code enclosed in an '#ifdef TCP_HHOOK' block.
This broke the build for VNET images.
Enclose the error variable itself in an #ifdef block.
Submitted by: Shawn Webb <shawn.webb at hardenedbsd.org>
Reported by: Shawn Webb <shawn.webb at hardenedbsd.org>
PointyHat to: jtl
received on a TCP session that has entered the ESTABLISHED state. This
results in a lot of calls to reset the keepalive timer.
This patch changes the behavior so we set the keepalive timer for the
keepalive idle time (TP_KEEPIDLE). When the keepalive timer fires, it will
first check to see if the session has been idle for TP_KEEPIDLE ticks. If
not, it will reschedule the keepalive timer for the time the session will
have been idle for TP_KEEPIDLE ticks.
For a session with regular communication, the keepalive timer should fire
approximately once every TP_KEEPIDLE ticks. For sessions with irregular
communication, the keepalive timer might fire more often. But, the
disruption from a periodic keepalive timer should be less than the regular
cost of resetting the keepalive timer on every packet.
(FWIW, this change saved approximately 1.73% of the busy CPU cycles on a
particular test system with a heavy TCP output load. Of course, the
actual impact is very specific to the particular hardware and workload.)
Reviewed by: gallatin, rrs
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D8243
the TCP_RFC7413 kernel option. This change removes those few instructions
from the packet processing path.
While not strictly necessary, for the sake of consistency, I applied the
new IS_FASTOPEN macro to all places in the packet processing path that
used the (t_flags & TF_FASTOPEN) check.
Reviewed by: hiren
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D8219
TCPCB, it checks (so->so_options & SO_ACCEPTCONN) to determine whether or
not the socket is a listening socket. However, this causes the code to
access a different cacheline. If we first check if the socket is in the
LISTEN state, we can avoid accessing so->so_options when processing packets
received for ESTABLISHED sessions.
If INVARIANTS is defined, the code still needs to access both variables to
check that so->so_options is consistent with the state.
Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Netflix
to add actions that run when a TCP frame is sent or received on a TCP
session in the ESTABLISHED state. In the base tree, this functionality is
only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd,
and cc_vegas congestion control modules.
Presently, we incur overhead to check for hooks each time a TCP frame is
sent or received on an ESTABLISHED TCP session.
This change adds a new compile-time option (TCP_HHOOK) to determine whether
to include the hhook(9) framework for TCP. To retain backwards
compatibility, I added the TCP_HHOOK option to every configuration file that
already defined "options INET". (Therefore, this patch introduces no
functional change. In order to see a functional difference, you need to
compile a custom kernel without the TCP_HHOOK option.) This change will
allow users to easily exclude this functionality from their kernel, should
they wish to do so.
Note that any users who use a custom kernel configuration and use one of the
congestion control modules listed above will need to add the TCP_HHOOK
option to their kernel configuration.
Reviewed by: rrs, lstewart, hiren (previous version), sjg (makefiles only)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D8185
This change extends the nd6 lock to protect the ND prefix list as well
as the list of advertising routers associated with each prefix. To handle
cases where the nd6 lock must be dropped while iterating over either the
prefix or default router lists, a generation counter is used to track
modifications to the lists. Additionally, a new mutex is used to serialize
prefix on-link/off-link transitions. This mutex must be acquired before
the nd6 lock and is held while updating the routing table in
nd6_prefix_onlink() and nd6_prefix_offlink().
Reviewed by: ae, tuexen (SCTP bits)
Tested by: Jason Wolfe <jason@llnw.com>,
Larry Rosenman <ler@lerctr.org>
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D8125
In the persist case, take the SYN and FIN flags into account when updating
the sequence space sent.
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: Juniper Networks, Netflix
Differential Revision: https://reviews.freebsd.org/D7075
Tested by: Limelight, Netflix
A single gratuitous ARP (GARP) is always transmitted when an IPv4
address is added to an interface, and that is usually sufficient.
However, in some circumstances, such as when a shared address is
passed between cluster nodes, this single GARP may occasionally be
dropped or lost. This can lead to neighbors on the network link
working with a stale ARP cache and sending packets destined for
that address to the node that previously owned the address, which
may not respond.
To avoid this situation, GARP retransmissions can be enabled by setting
the net.link.ether.inet.garp_rexmit_count sysctl to a value greater
than zero. The setting represents the maximum number of retransmissions.
The interval between retransmissions is calculated using an exponential
backoff algorithm, doubling each time, so the retransmission intervals
are: {1, 2, 4, 8, 16, ...} (seconds).
Due to the exponential backoff algorithm used for the interval
between GARP retransmissions, the maximum number of retransmissions
is limited to 16 for sanity. This limit corresponds to a maximum
interval between retransmissions of 2^16 seconds ~= 18 hours.
Increasing this limit is possible, but sending out GARPs spaced
days apart would be of little use.
Submitted by: David A. Bright <david.a.bright@dell.com>
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D7695
is NULL and the function jumps to the "release:" label.
For this case, the "inp" was write locked, but the code attempted to
read unlock it. This patch fixes the problem.
This case could occur for NFS over UDP mounts, where the server was
down for a few minutes under certain circumstances.
Reported by: bde
Tested by: bde
Reviewed by: gnn
MFC after: 2 weeks
during testing of network related changes where cached entries may pollute your
results, or during known congestion events where you don't want to unfairly
penalize hosts.
Prior to r232346 this would have meant you would break any connection with a sub
1500 MTU, as the hostcache was authoritative. All entries as they stand today
should simply be used to pre populate values for efficiency.
Submitted by: Jason Wolfe (j at nitrology dot com)
Reviewed by: rwatson, sbruno, rrs , bz (earlier version)
MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D6198