lookup for the inp flowid/flowtype to destination CPU.
This only modifies the case where RSS is enabled and the per-cpu tcp
timer option is enabled. Otherwise the behaviour should be the same
as before.
This is intended to be used by various places that wish to hash some
information about a TCP/UDP/IP flow but don't necessarily have a
live mbuf to do it with.
Refactor rss_m2cpuid() to use the refactored function.
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().
Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.
PR: bin/189471
Submitted by: Dennis Yusupoff <dyr@smartspb.net>
MFC after: 2 weeks
near-term future use.
These are intended to fetch the current flow id, flow hash type
(M_HASHTYPE_* from the sys/mbuf.h) and if RSS is enabled, the
RSS destined CPU ID for the receive path.
eight years. The original concept was to improve the
corner case where you run out of ephemeral ports, but it
was causing performance problems and the mechanism
of limiting the number of time_wait sockets serves
the same purpose in the end.
Reviewed by: bz
same events that tcpstat's tcps_rcvmemdrop counter counts.
- Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
description in netstat(1) output.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
mixing on stack memory and UMA memory in one linked list.
Thus, rewrite TCP reassembly code in terms of memory usage. The
algorithm remains unchanged.
We actually do not need extra memory to build a reassembly queue.
Arriving mbufs are always packet header mbufs. So we got the length
of data as pkthdr.len. We got m_nextpkt for linkage. And we need
only one pointer to point at the tcphdr, use PH_loc for that.
In tcpcb the t_segq fields becomes mbuf pointer. The t_segqlen
field now counts not packets, but bytes in the queue. This gives
us more precision when comparing to socket buffer limits.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
exists on another interface. The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The solution
is to use the interface fib in that call. For the majority of users, that will
be equivalent to the legacy behavior.
PR: kern/189089
Reported by: neel
Reviewed by: neel
MFC after: 3 weeks
X-MFC with: 264887
Sponsored by: Spectra Logic
These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.
sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those
functions will only return an address whose interface fib equals the
argument.
sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.
sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases it there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.
tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.
Clear expected failure messages for kern/187550 and kern/187552.
PR: kern/187550
PR: kern/187552
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic
sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.
tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure
sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib
sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.
PR: kern/187549
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.
To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when
closing an expired connection. This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.
Submitted by: Julien Charbon <jcharbon@verisign.com>
Reviewed by: rwatson, rrs, adrian
MFC after: 2 months
restricted to a single FIB in a multifib system.
Restricting an interface's routes to the FIB to which it is assigned (by
setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve:
can't allocate llinfo for x.x.x.x". This is due to the ARP update code hard
coding it's lookup for existing routing entries to FIB 0.
sys/netinet/in.c:
When dealing with RTM_ADD (add route) requests for an interface, use
the interface's assigned FIB instead of the default (FIB 0).
sys/netinet/if_ether.c:
In arpresolve(), enhance error message generated when an
lla_lookup() fails so that the interface causing the error is
visible in logs.
tests/sys/netinet/fibs_test.sh
Clear ATF expected error.
PR: kern/167947
Submitted by: Nikolay Denev <ndenev@gmail.com> (previous version)
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
proprietary binary format.
* Add support for a diagnostic information error cause.
The code is sysctlable and the default is 0, which
means it is not sent.
This is joint work with rrs@.
MFC after: 1 week
linking NIC Receive Side Scaling (RSS) to the network stack's
connection-group implementation. This prototype (and derived patches)
are in use at Juniper and several other FreeBSD-using companies, so
despite some reservations about its maturity, merge the patch to the
base tree so that it can be iteratively refined in collaboration rather
than maintained as a set of gradually diverging patch sets.
(1) Merge a software implementation of the Toeplitz hash specified in
RSS implemented by David Malone. This is used to allow suitable
pcbgroup placement of connections before the first packet is
received from the NIC. Software hashing is generally avoided,
however, due to high cost of the hash on general-purpose CPUs.
(2) In in_rss.c, maintain authoritative versions of RSS state intended
to be pushed to each NIC, including keying material, hash
algorithm/ configuration, and buckets. Provide software-facing
interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both
the RSS standardised Toeplitz and a 'naive' variation with a hash
efficient in software but with poor distribution properties.
Implement rss_m2cpuid()to be used by netisr and other load
balancing code to look up the CPU on which an mbuf should be
processed.
(3) In the Ethernet link layer, allow netisr distribution using RSS as
a source of policy as an alternative to source ordering; continue
to default to direct dispatch (i.e., don't try and requeue packets
for processing on the 'right' CPU if they arrive in a directly
dispatchable context).
(4) Allow RSS to control tuning of connection groups in order to align
groups with RSS buckets. If a packet arrives on a protocol using
connection groups, and contains a suitable hardware-generated
hash, use that hash value to select the connection group for pcb
lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz
hash is available, we fall back on regular PCB lookup risking
contention rather than pay the cost of Toeplitz in software --
this is a less scalable but, at my last measurement, faster
approach. As core counts go up, we may want to revise this
strategy despite CPU overhead.
Where device drivers suitably configure NICs, and connection groups /
RSS are enabled, this should avoid both lock and line contention during
connection lookup for TCP. This commit does not modify any device
drivers to tune device RSS configuration to the global RSS
configuration; patches are in circulation to do this for at least
Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device
drivers is not particularly robust, nor aware of more advanced features
such as runtime reconfiguration/rebalancing. This will hopefully prove
a useful starting point for refinement.
No MFC is scheduled as we will first want to nail down a more mature
and maintainable KPI/KBI for device drivers.
Sponsored by: Juniper Networks (original work)
Sponsored by: EMC/Isilon (patch update and merge)
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.
Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.