Commit Graph

4358 Commits

Author SHA1 Message Date
Pawel Biernacki
9033ad5f15 sysctl: fix setting net.isr.dispatch during early boot
Fix another collateral damage of r357614: netisr is initialised way before
malloc() is available hence it can't use sysctl_handle_string() that
allocates temporary buffer.  Handle that internally in
sysctl_netisr_dispatch_policy().

PR:		246114
Reported by:	delphij
Reviewed by:	kib
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D24858
2020-05-16 17:05:44 +00:00
Mark Johnston
c1be839971 pf: Add a new zone for per-table entry counters.
Right now we optionally allocate 8 counters per table entry, so in
addition to memory consumed by counters, we require 8 pointers worth of
space in each entry even when counters are not allocated (the default).

Instead, define a UMA zone that returns contiguous per-CPU counter
arrays for use in table entries.  On amd64 this reduces sizeof(struct
pfr_kentry) from 216 to 160.  The smaller size also results in better
slab efficiency, so memory usage for large tables is reduced by about
28%.

Reviewed by:	kp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24843
2020-05-16 00:28:12 +00:00
Kyle Evans
c79cee7136 kernel: provide panicky version of __unreachable
__builtin_unreachable doesn't raise any compile-time warnings/errors on its
own, so problems with its usage can't be easily detected. While it would be
nice for this situation to change and compilers to at least add a warning
for trivial cases where local state means the instruction can't be reached,
this isn't the case at the moment and likely will not happen.

This commit adds an __assert_unreachable, whose intent is incredibly clear:
it asserts that this instruction is unreachable. On INVARIANTS builds, it's
a panic(), and on non-INVARIANTS it expands to  __unreachable().

Existing users of __unreachable() are converted to __assert_unreachable,
to improve debuggability if this assumption is violated.

Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D23793
2020-05-13 18:07:37 +00:00
Warner Losh
83b4342743 Kill trailing newline while I'm here... 2020-05-12 23:46:52 +00:00
Alexander V. Chernikov
4a6ee281d9 Remove unused rnh_close callback from rtable & cleanup depends.
rnh_close callbackes was used by the in[6]_clsroute() handlers,
 doing cleanup in the route cloning code. Route cloning was eliminated
 somewhere around r186119. Last callback user was eliminated in r186215,
 11 years ago.

Differential Revision:	https://reviews.freebsd.org/D24793
2020-05-11 06:09:18 +00:00
Alexander V. Chernikov
d223372545 Remove rtalloc1(_fib) KPI.
Last user of rtalloc1() KPI has been eliminated in rS360631.
As kernel is now fully switched to use new routing KPI defined in
rS359823, remove old lookup functions.

Differential Revision:	https://reviews.freebsd.org/D24776
2020-05-10 09:34:48 +00:00
Alexander V. Chernikov
656442a718 Embed dst sockaddr into rtentry and remove rte packet counter
Currently each rtentry has dst&gateway allocated separately from another zone,
 bloating cache accesses.

Current 'struct rtentry' has 12 "mandatory" radix pointers in the beginning,
 leaving 4 usable pointers/32 bytes in the first 2 cache lines (amd64).
Fields needed for the datapath are destination sockaddr and rt_nhop.

So far it doesn't look like there is other routable addressing protocol other
 than IPv4/IPv6/MPLS, which uses keys longer than 20 bytes.
With that in mind, embed dst into struct rtentry, making the first 24 bytes
 of rtentry within 128 bytes. That is enough to make IPv6 address within first
 128 bytes.

It is still pretty easy to add code for supporting separately-allocated dst,
 however it doesn't make a lot of sense in having such code without a use case.

As rS359823 moved the gateway to the nexthop structure, the dst embedding change
 removes the need for any additional allocations done by rt_setgate().

Lastly, as a part of cleanup, remove counter(9) allocation code, as this field
 is not used in packet processing anymore.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24669
2020-05-08 21:06:10 +00:00
Alexander V. Chernikov
682b902daf Add rib_lookup() sockaddr lookup wrapper and make ifa_ifwithroute use it.
Create rib_lookup() wrapper around per-af dataplane lookup functions.
This will help in the cases of having control plane af-agnostic code.

Switch ifa_ifwithroute() to use this function instead of rtalloc1().

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24731
2020-05-07 08:11:36 +00:00
Alexander V. Chernikov
d94be4ccaa Switch DDB show route to direct rnh_matchaddr() call instead of rtalloc1().
Eliminate the last rtalloc1() call to finish transition to the new routing
KPI defined in r359823.

Differential Revision:	https://reviews.freebsd.org/D24663
2020-05-04 15:07:57 +00:00
Alexander V. Chernikov
4f08f052ad Simplify address parsing in DDB show route command.
Use db_get_line() to overcome parser limitation.

Differential Revision:	https://reviews.freebsd.org/D24662
2020-05-04 15:00:19 +00:00
Alexander V. Chernikov
9e02229580 Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields.
After converting routing subsystem customers to use nexthop objects
 defined in r359823, some fields in struct rtentry became unused.

This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry
 along with the code initializing and updating these fields.

Cleanup of the remaining fields will be addressed by D24669.

This commit also changes the implementation of the RTM_CHANGE handling.
Old implementation tried to perform the whole operation under radix WLOCK,
 resulting in slow performance and hacks like using RTF_RNH_LOCKED flag.
New implementation looks up the route nexthop under radix RLOCK, creates new
 nexthop and tries to update rte nhop pointer. Only last part is done under
 WLOCK.
In the hypothetical scenarious where multiple rtsock clients
 repeatedly issue RTM_CHANGE requests for the same route, route may get
 updated between read and update operation. This is addressed by retrying
 the operation multiple (3) times before returning failure back to the
 caller.

Differential Revision:	https://reviews.freebsd.org/D24666
2020-05-04 14:31:45 +00:00
Mark Johnston
814fa34dfb Increase the iflib txq callout mutex name length to 32 bytes.
With a length of 16, the name ("<if name>:TX(<qid>):callout") typically
gets truncated.

PR:		245712
Reported by:	ghuckriede@blackberry.com
MFC after:	1 week
2020-04-30 15:39:04 +00:00
Alexander V. Chernikov
8c61eb2107 Convert more rtentry field accesses into nhop fields accesses.
Continue routing subsystem conversion to nhop objects defined in r359823.
Use fields from nhop structure instead of "struct rtentry" fields.
This is one of the last changes prior to removing rt_ifp, rt_ifa,
 rt_gateway and rt_mtu from struct rtentry.

Differential Revision:	https://reviews.freebsd.org/D24609
2020-04-29 21:54:09 +00:00
Alexander V. Chernikov
74787ef47b Add nhop to the ifa_rtrequest() callback.
With the upcoming multipath changes described in D24141,
 rt->rt_nhop can potentially point to a nexthop group instead of
 an individual nhop.
To simplify caller handling of such cases, change ifa_rtrequest() callback
 to pass changed nhop directly.

Differential Revision:	https://reviews.freebsd.org/D24604
2020-04-29 19:28:56 +00:00
Alexander V. Chernikov
fc88ecd31a Move route-specific ddb commands to route/route_ddb.c
Currently functionality resides in rtsock.c, which is a controlling
 interface, partially external to the routing subsystem.
Additionally, DDB-supporting functionality is > 100SLOC, which deserves
 a separate file.

Given that, move this functionality to a newly-created net/route/ subdir.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D24561
2020-04-28 20:00:17 +00:00
Alexander V. Chernikov
e7d8af4f65 Move route_temporal.c and route_var.h to net/route.
Nexthop objects implementation, defined in r359823,
 introduced sys/net/route directory intended to hold all
 routing-related code. Move recently-introduced route_temporal.c and
 private route_var.h header there.

Differential Revision:	https://reviews.freebsd.org/D24597
2020-04-28 19:14:09 +00:00
Alexander V. Chernikov
fe6da72759 Move struct rtentry definition to nhop_var.h.
One of the goals of the new routing KPI defined in r359823
 is to entirely hide`struct rtentry` from the consumers.
It will allow to improve routing subsystem internals and deliver
 features much faster.

This is one of the last changes, effectively moving struct rtentry
 definition to a net/route_var.h header, internal to the routing subsystem.

Differential Revision:	https://reviews.freebsd.org/D24580
2020-04-28 18:42:30 +00:00
Alexander V. Chernikov
4043ee3cd7 Convert rtalloc_mpath_fib() users to the new KPI.
New fib[46]_lookup() functions support multipath transparently.
Given that, switch the last rtalloc_mpath_fib() calls to
 dib4_lookup() and eliminate the function itself.

Note: proper flowid generation (especially for the outbound traffic) is a
 bigger topic and will be handled in a separate review.
This change leaves flowid generation intact.

Differential Revision:	https://reviews.freebsd.org/D24595
2020-04-28 08:06:56 +00:00
Alexander V. Chernikov
1b0051bada Eliminate now-unused parts of old routing KPI.
r360292 switched most of the remaining routing customers to a new KPI,
 leaving a bunch of wrappers for old routing lookup functions unused.

Remove them from the tree as a part of routing cleanup.

Differential Revision:	https://reviews.freebsd.org/D24569
2020-04-28 07:25:34 +00:00
Eric Joyner
45818bf1a0 iflib: Stop interface before (un)registering VLAN
This patch is intended to solve a specific problem that iavf(4)
encounters, but what it does can be extended to solve other issues.

To summarize the iavf(4) issue, if the PF driver configures VLAN
anti-spoof, then the VF driver needs to make sure no untagged traffic is
sent if a VLAN is configured, and vice-versa. This can be an issue when
a VLAN is being registered or unregistered, e.g. when a packet may be on
the ring with a VLAN in it, but the VLANs are being unregistered. This
can cause that tagged packet to go out and cause an MDD event.

To fix this, include a new interface-dependent function that drivers can
implement named IFDI_NEEDS_RESTART(). Right now, this function is called
in iflib_vlan_unregister/register() to determine whether the interface
needs to be stopped and started when a VLAN is registered or
unregistered. The default return value of IFDI_NEEDS_RESTART() is true,
so this fixes the MDD problem that iavf(4) encounters, since the
interface rings are flushed during a stop/init.

A future change to iavf(4) will implement that function just in case the
default value changes, and to make it explicit that this interface reset
is required when a VLAN is added or removed.

Reviewed by:	gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22086
2020-04-27 22:02:44 +00:00
Alexander V. Chernikov
55f57ca9ac Convert debugnet to the new routing KPI.
Introduce new fib[46]_lookup_debugnet() functions serving as a
special interface for the crash-time operations. Underlying
implementation will try to return lookup result if
datastructures are not corrupted, avoding locking.

Convert debugnet to use fib4_lookup_debugnet() and switch it
to use nexthops instead of rtentries.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D24555
2020-04-26 18:42:38 +00:00
Kristof Provost
fffd27e5f3 bridge: epoch-ification
Run the bridge datapath under epoch, rather than under the
BRIDGE_LOCK().

We still take the BRIDGE_LOCK() whenever we insert or delete items in
the relevant lists, but we use epoch callbacks to free items so that
it's safe to iterate the lists without the BRIDGE_LOCK.

Tests on mercat5/6 shows this increases bridge throughput significantly,
from 3.7Mpps to 18.6Mpps.

Reviewed by:	emaste, philip, melifaro
MFC after:	2 months
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24250
2020-04-26 16:22:35 +00:00
Alexander V. Chernikov
57e70e4471 Fix userland build broken by r360292. 2020-04-25 09:25:06 +00:00
Alexander V. Chernikov
983066f05b Convert route caching to nexthop caching.
This change is build on top of nexthop objects introduced in r359823.

Nexthops are separate datastructures, containing all necessary information
 to perform packet forwarding such as gateway interface and mtu. Nexthops
 are shared among the routes, providing more pre-computed cache-efficient
 data while requiring less memory. Splitting the LPM code and the attached
 data solves multiple long-standing problems in the routing layer,
 drastically reduces the coupling with outher parts of the stack and allows
 to transparently introduce faster lookup algorithms.

Route caching was (re)introduced to minimise (slow) routing lookups, allowing
 for notably better performance for large TCP senders. Caching works by
 acquiring rtentry reference, which is protected by per-rtentry mutex.
 If the routing table is changed (checked by comparing the rtable generation id)
 or link goes down, cache record gets withdrawn.

Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nextop as a
 cached object, which is mostly mechanical. Other moving parts like cache
 cleanup on rtable change remains the same.

Differential Revision:	https://reviews.freebsd.org/D24340
2020-04-25 09:06:11 +00:00
Alexander V. Chernikov
aaad3c4fca Convert rtentry field accesses into nhop field accesses.
One of the goals of the new routing KPI defined in r359823 is to entirely
 hide`struct rtentry` from the consumers. It will allow to improve routing
 subsystem internals and deliver more features much faster.

This commit is mostly mechanical change to eliminate direct struct rtentry
 field accesses.

The only notable difference is AF_LINK gateway encoding.

AF_LINK gw is used in routing stack for operations with interface routes
 and host loopback routes.
In the former case it indicates _some_ non-NULL gateway, as the interface
 is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting.
In the latter case the interface index inside gateway was used by the IPv6
 datapath to verify address scope for link-local interfaces.

Kernel uses struct sockaddr_dl for this type of gateway. This structure
 allows for specifying rich interface data, such as mac address and interface
 name. However, this results in relatively large structure size - 52 bytes.
Routing stack fils in only 2 fields - sdl_index and sdl_type, which reside
 in the first 8 bytes of the structure.

In the new KPI, struct nhop_object tries to be cache-efficient, hence
 embodies gateway address inside the structure. In the AF_LINK case it
 stores stortened version of the structure - struct sockaddr_dl_short,
 which occupies 16 bytes. After D24340 changes, the data inside AF_LINK
 gateway will not be used in the kernel at all, leaving rtsock as the only
 potential concern.

The difference in rtsock reporting:

(old)
got message of size 240 on Thu Apr 16 03:12:13 2020
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

(new)
got message of size 200 on Sun Apr 19 09:46:32 2020
RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

Note 40 bytes different (52-16 + alignment).
However, gateway is still a valid AF_LINK gateway with proper data filled in.

It is worth noting that these particular messages (interface routes) are mostly
 ignored by routing daemons:
* bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages.
* quagga/frr ignores routes without gateway

More detailed overview on how rtsock messages are used by the
 routing daemons to reconstruct the kernel view, can be found in D22974.

Differential Revision:	https://reviews.freebsd.org/D24519
2020-04-23 08:04:20 +00:00
Kristof Provost
fac24ad7f0 bridge: Simplify mac address generation
Unconditionally use ether_gen_addr() to generate bridge mac addresses.  This
function is now less likely to generate duplicate mac addresses across jails.
The old hand rolled hostid based code adds no value.

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D24432
2020-04-18 08:00:58 +00:00
Kristof Provost
3f8bc99c4c ethersubr: Make the mac address generation more robust
If we create two (vnet) jails and create a bridge interface in each we end up
with the same mac address on both bridge interfaces.
These very often conflicts, resulting in same mac address in both jails.

Mitigate this problem by including the jail name in the mac address.

Reviewed by:	kevans, melifaro
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24383
2020-04-18 07:50:30 +00:00
Alexander V. Chernikov
ae4b62595e Unbreak build by reverting if_bridge part of r360047.
Pointy hat to: melifaro
2020-04-17 18:22:37 +00:00
Alexander V. Chernikov
6745294280 Finish r191148: replace rtentry with route in if_bridge if_output() callback.
Generic if_output() callback signature was modified to use struct route
 instead of struct rtentry in r191148, back in 2009.

Quoting commit message:

 Change if_output to take a struct route as its fourth argument in order
 to allow passing a cached struct llentry * down to L2

Fix bridge_output() to match this signature and update the remaining
 comment in if_var.h.

Reviewed by:	kp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24394
2020-04-17 17:05:58 +00:00
Adrian Chadd
2538a4b1b4 Remove an duplicate definition of nhops_dump_sysctl()
One of the source files included both nhop.h and shared.h, leading to this
clashing.

Tested with: mips-gcc cross toolchain
2020-04-16 23:28:47 +00:00
Alexander V. Chernikov
5cdc484410 Fix userland build broken by r360014. 2020-04-16 17:53:23 +00:00
Alexander V. Chernikov
539642a29d Add nhop parameter to rti_filter callback.
One of the goals of the new routing KPI defined in r359823 is to
 entirely hide`struct rtentry` from the consumers. It will allow to
 improve routing subsystem internals and deliver more features much faster.
This change is one of the ongoing changes to eliminate direct
 struct rtentry field accesses.

Additionally, with the followup multipath changes, single rtentry can point
 to multiple nexthops.

With that in mind, convert rti_filter callback used when traversing the
 routing table to accept pair (rt, nhop) instead of nexthop.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24440
2020-04-16 17:20:18 +00:00
Alexander V. Chernikov
dd4776f0cc Reorganise nd6 notification code to avoid direct rtentry field access.
One of the goals of the new routing KPI defined in r359823 is to entirely hide
 `struct rtentry` from the consumers. Doing so will allow to improve routing
 subsystem internals and deliver features more easily. This change is one of
  the ongoing changes to eliminate direct struct rtentry field accesses.

It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification
 code to avoid accessing most of the rtentry fields.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24404
2020-04-14 22:48:33 +00:00
Alexander V. Chernikov
1968b7eb21 Postpone multipath seed init till SI_SUB_LAST, as it is needed only after
some useland program installs multiple paths to the same destination.

While here, make multipath init conditional.

Discussed with:	cem,ian
2020-04-14 07:38:34 +00:00
Andrew Gallatin
bd673b9942 lagg: stop double-counting output errors and counting drops as errors
Before this change, lagg double-counted errors from lagg members, and counted
every drop by a lagg member as an error.  Eg, if lagg sent a packet, and the
underlying hardware driver dropped it, a counter would be incremented by both
lagg and the underlying driver.

This change attempts to fix that by incrementing lagg's counters only for
errors that do not come from underlying drivers.

Reviewed by:	hselasky, jhb
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24331
2020-04-13 23:06:56 +00:00
Alexander V. Chernikov
a666325282 Introduce nexthop objects and new routing KPI.
This is the foundational change for the routing subsytem rearchitecture.
 More details and goals are available in https://reviews.freebsd.org/D24141 .

This patch introduces concept of nexthop objects and new nexthop-based
 routing KPI.

Nexthops are objects, containing all necessary information for performing
 the packet output decision. Output interface, mtu, flags, gw address goes
 there. For most of the cases, these objects will serve the same role as
 the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
 multiple BGP full-views, as these objects will be shared between routing
 entries. This allows to store more information in the nexthop.

New KPI:

struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);

These 2 function are intended to replace all all flavours of
 <in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions  and the previous
 fib[46]-generation functions.

Upon successful lookup, they return nexthop object which is guaranteed to
 exist within current NET_EPOCH. If longer lifetime is desired, one can
 specify NHR_REF as a flag and get a referenced version of the nexthop.
 Reference semantic closely resembles rtentry one, allowing sed-style conversion.

Additionally, another 2 functions are introduced to support uRPF functionality
 inside variety of our firewalls. Their primary goal is to hide the multipath
 implementation details inside the routing subsystem, greatly simplifying
 firewalls implementation:

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);

All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
 embedding and allowing to support IPv4 link-locals in the future.

Structure changes:
 * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
 * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.

Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
 decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
 kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.

More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232

Reviewed by:	ae,glebius(initial version)
Differential Revision:	https://reviews.freebsd.org/D24232
2020-04-12 14:30:00 +00:00
Alexander V. Chernikov
62d95afacb Fix build by adding forgotten header to radix_mpath.c after r359797. 2020-04-11 09:38:45 +00:00
Alexander V. Chernikov
4684d3cbcb Remove per-AF radix_mpath initializtion functions.
Split their functionality by moving random seed allocation
 to SYSINIT and calling (new) generic multipath function from
 standard IPv4/IPv5 RIB init handlers.

Differential Revision:	https://reviews.freebsd.org/D24356
2020-04-11 07:37:08 +00:00
Alexander V. Chernikov
aef2d5fb0e Split rtrequest1_fib() into smaller manageable chunks.
No functional changes.

* Move route addition / route deletion code from rtrequest1_fib()
  to add_route() and del_route() respectively.
* Rename rtrequest1_fib_change() to change_route() for consistency.
* Shrink the scope of ugly info #defines.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24349
2020-04-10 16:27:27 +00:00
Kristof Provost
dd00a42a6b bridge: Change lists to CK_LIST as a peparation for epochification
Prepare the ground for a rework of the bridge locking approach. We will
use an epoch-based approach in the datapath and making it safe to
iterate over the interface, span and rtnode lists without holding the
BRIDGE_LOCK. Replace the relevant lists by their ConcurrencyKit
equivalents.

No functional change in this commit.

Reviewed by:	emaste, ae, philip (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24249
2020-04-05 17:15:20 +00:00
Ed Maste
aeb665b538 remove extraneous double ;s in sys/ 2020-03-30 16:04:25 +00:00
Mark Johnston
59d50fe5ef Simplify taskqgroup inititialization.
taskqgroup initialization was broken into two steps:

1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
2. initialize taskqueues, start taskqueue threads, enqueue "binder"
   tasks to bind threads to specific CPUs, at SI_SUB_SMP.

Step 2 tries to handle the case where tasks have already been attached
to a queue, by migrating them to their intended queue.  In particular,
tasks can't be enqueued before step 2 has completed.  This breaks NFS
mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
is not defined, since mountroot happens before SI_SUB_SMP in this case.

Simplify initialization: do all initialization except for CPU binding at
SI_SUB_TASKQ.  This means that until CPU binding is completed, group
tasks may be executed on a CPU other than that to which they were bound,
but this should not be a problem for existing users of the taskqgroup
KPIs.

Reported by:	sbruno
Tested by:	bdragon, sbruno
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24188
2020-03-30 14:22:52 +00:00
Conrad Meyer
704101dd3d Fix PNP matching for iflib NIC drivers
The previous descriptor string specified that all fields were significant for
match.  However, the only significant fields for in-tree drivers are
vendor:devid, and the fictitious zero values constructed by PVID() did not
match real subvendor, subdevice, revision, and/or class values, resulting in no
automatic probe.

If a future iflib driver needs to match on other criteria, the descriptor
string can be updated accordingly.  (E.g., "V32" and ~0 for unspecified values
in PVID().)

Reported by:	mav
Sponsored by:	Dell EMC Isilon
2020-03-24 19:20:10 +00:00
Ed Maste
ed6611cc8c iflib: simplify MPASS assertion
Submitted by:	andrew
2020-03-24 17:54:34 +00:00
Ed Maste
68af0153a7 iflib: split compound assertion
ThunderX cluster systems are panicking on boot with a failed assertion
MPASS(gtask != NULL && gtask->gt_taskqueue != NULL).  Split the
assertion so that it's clear which part is failing.
2020-03-24 17:25:56 +00:00
Patrick Kelsey
876996910a Remove extraneous code from iflib
ifsd_cidx is never used, and the line removed from rxd_frag_to_sd() is
just dead code.

Reviewed by:	erj, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23951
2020-03-14 20:13:42 +00:00
Patrick Kelsey
3caff1885f Remove refill budget from iflib
Reviewed by:	gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23948
2020-03-14 19:58:50 +00:00
Patrick Kelsey
b38136097a Allow iflib drivers to specify the buffer size used for each receive queue
Reviewed by:	erj, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23947
2020-03-14 19:56:46 +00:00
Patrick Kelsey
e503049067 Remove freelist contiguous-indexes assertion from rxd_frag_to_sd()
The vmx driver is an example of an iflib driver that might report
packets using non-contiguous descriptors (with unused descriptors
either between received packets or between the fragments of a received
packet), so this assertion needs to be removed.

For such drivers, the freelist producer and consumer indexes don't
relate directly to driver ring slots (the driver deals directly with
freelist buffer indexes supplied by iflib during refill, and reports
them with each fragment during packet reception), but do continue to
be used by iflib for accounting, such as determining the number of
ring slots that are refillable.

PR:		243126, 243392, 240628
Reported by:	avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
Reviewed by:	gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23946
2020-03-14 19:55:05 +00:00
Patrick Kelsey
4f2beb721b Fix iflib zero-length fragment handling
The dmamap for zero-length fragments should not be unloaded, as doing
so breaks the the cluster-reuse logic in _iflib_fl_refill().

All zero-length fragments are now handled by the assemble_segments()
path so that the cluster-reuse logic there does not have to be
replicated in the small-single-fragment-packet path of
iflib_rxd_pkt_get().

Packets consisting entirely of zero-length fragments (which result in
a NULL mbuf pointer) are now properly tolerated.  This allows drivers
(such as the vmx driver) to pass such packets to iflib when a
descriptor error occurs during packet reception, the advantage being
that the refill of descriptors associated with the error packet are
handled via the existing iflib machinery without having to duplicate
parts of that machinery in the driver to handle that error case.

Reviewed by:	avg, erj, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23945
2020-03-14 19:51:55 +00:00