These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
map the bucket to an RSS queue, then map the queue to a CPU ID.
This way the bucket->queue and queue->CPU mapping can change
over time.
Introduce IP_RSSBUCKETID - which instead looks up the RSS bucket.
User applications can then map the RSS bucket to a CPU.
There's 128 indirection table entries which correspond to the
low 7 bits of the 32 bit RSS hash. Each value will correspond
to an RSS bucket. (Then each RSS bucket currently will map
to a CPU.)
This is a more explicit way of figuring out which RSS bucket
is in each RSS indirection slot. It can be inferred by the other
methods but I'd rather drivers use something more simplified and
explicit.
reporting IP-addresses to the peer during the handshake, adding
addresses to the host, reporting the addresses via the sysctl
interface (used by netstat, for example) and reporting the
addresses to the application via socket options.
This issue was reported by Bernd Walter.
MFC after: 3 days
ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.
sys/net/if_var.h
sys/net/if.c
Add legacy-compatible functions as described above. Ensure legacy
behavior when RT_ALL_FIBS is passed as fibnum.
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
Call with _fib() functions if we must use a specific fib, or the
legacy functions otherwise.
tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
Improve the udp_dontroute test. The bug that this test exercises is
that ifa_ifwithnet() will return the wrong address, if multiple
interfaces have addresses on the same subnet but with different
fibs. The previous version of the test only considered one possible
failure mode: that ifa_ifwithnet_fib() might fail to find any
suitable address at all. The new version also checks whether
ifa_ifwithnet_fib() finds the correct address by checking where the
ARP request goes.
Reported by: bz, hrs
Reviewed by: hrs
MFC after: 1 week
X-MFC-with: 264905
Sponsored by: Spectra Logic
mode.
Put the htonl(), htons(), ntohl() and ntohs() declarations under
__POSIX_VISIBLE >= 200112. POSIX.1-2001 and newer require these to be
exposed from <netinet/in.h> (as well as <arpa/inet.h>).
Note that it may be unnecessary to check __POSIX_VISIBLE >= 200112 because
older versions of POSIX and the C standard do not define this header.
However, other places in the same file already perform the check.
PR: 188316
Submitted by: Christian Neukirchen
mappings. Instead, they should be first mapping to an RSS bucket and
then querying the RSS bucket -> CPU ID mapping to figure out the target
CPU.
When (if?) RSS rebalancing is implemented or some other (non round-robin)
distribution of work from buckets to CPU IDs, various bits of code - both
userland and kernel - will need to know how this mapping works.
So, to support this:
* Add a new function rss_m2bucket() - this maps an mbuf to a given bucket.
Anything which is currently doing hash -> CPU work may instead wish to
do hash -> bucket, and then query the bucket->cpuid map for which
CPU it belongs on. Or, map it to a bucket, then re-pin that bucket ->
CPU during a rebalance operation.
* For userland applications which wish to exploit affinity to RSS buckets,
the bucket -> CPU ID mapping is now available via a sysctl.
net.inet.rss.bucket_mapping lists the bucket to CPU ID mapping via
a list of bucket:cpu pairs.
lookup for the inp flowid/flowtype to destination CPU.
This only modifies the case where RSS is enabled and the per-cpu tcp
timer option is enabled. Otherwise the behaviour should be the same
as before.
This is intended to be used by various places that wish to hash some
information about a TCP/UDP/IP flow but don't necessarily have a
live mbuf to do it with.
Refactor rss_m2cpuid() to use the refactored function.
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().
Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.
PR: bin/189471
Submitted by: Dennis Yusupoff <dyr@smartspb.net>
MFC after: 2 weeks
near-term future use.
These are intended to fetch the current flow id, flow hash type
(M_HASHTYPE_* from the sys/mbuf.h) and if RSS is enabled, the
RSS destined CPU ID for the receive path.
eight years. The original concept was to improve the
corner case where you run out of ephemeral ports, but it
was causing performance problems and the mechanism
of limiting the number of time_wait sockets serves
the same purpose in the end.
Reviewed by: bz
same events that tcpstat's tcps_rcvmemdrop counter counts.
- Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
description in netstat(1) output.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
mixing on stack memory and UMA memory in one linked list.
Thus, rewrite TCP reassembly code in terms of memory usage. The
algorithm remains unchanged.
We actually do not need extra memory to build a reassembly queue.
Arriving mbufs are always packet header mbufs. So we got the length
of data as pkthdr.len. We got m_nextpkt for linkage. And we need
only one pointer to point at the tcphdr, use PH_loc for that.
In tcpcb the t_segq fields becomes mbuf pointer. The t_segqlen
field now counts not packets, but bytes in the queue. This gives
us more precision when comparing to socket buffer limits.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
exists on another interface. The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The solution
is to use the interface fib in that call. For the majority of users, that will
be equivalent to the legacy behavior.
PR: kern/189089
Reported by: neel
Reviewed by: neel
MFC after: 3 weeks
X-MFC with: 264887
Sponsored by: Spectra Logic
These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.
sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those
functions will only return an address whose interface fib equals the
argument.
sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.
sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases it there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.
tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.
Clear expected failure messages for kern/187550 and kern/187552.
PR: kern/187550
PR: kern/187552
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic
sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.
tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure
sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib
sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.
PR: kern/187549
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation