freebsd-nq

Author	SHA1	Message	Date
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framwork adds the following runtime sysctls: List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Ryan Libby	2fb4a03d55	rtsock: quiet -Wunused-variable in LINT-NOIP kernels Fixup after r368769 / `d68fb8d978`. Reported by: mjg Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27730	2020-12-24 12:34:18 -08:00
Mark Johnston	92be2847e8	rtsock: Avoid copying uninitialized padding bytes When copying sockaddrs out to userspace, we pad them to a multiple of the platform alignment (sizeof(long)). However, some sockaddr sizes, such as struct sockaddr_dl, are not an integer multiple of the alignment, so we may end up copying out uninitialized bytes. Fix this by always bouncing through a pre-zeroed sockaddr_storage. Reported by: KASAN Reviewed by: melifaro MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27729	2020-12-23 11:16:40 -05:00
Kristof Provost	1c00efe98e	pf: Use counter(9) for pf_state byte/packet tracking This improves cache behaviour by not writing to the same variable from multiple cores simultaneously. pf_state is only used in the kernel, so can be safely modified. Reviewed by: Lutz Donnerhacke, philip MFC after: 1 week Sponsed by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27661	2020-12-23 12:03:21 +01:00
Kristof Provost	c3f69af03a	pf: Fix unaligned checksum updates The algorithm we use to update checksums only works correctly if the updated data is aligned on 16-bit boundaries (relative to the start of the packet). Import the OpenBSD fix for this issue. PR: 240416 Obtained from: OpenBSD MFC after: 1 week Reviewed by: tuexen (previous version) Differential Revision: https://reviews.freebsd.org/D27696	2020-12-23 12:03:20 +01:00
Hans Petter Selasky	ddce63fcb6	Remove not needed variable initialization. And switch from int to bool while at it. Reviewed by: melifaro@ Differential Revision: https://reviews.freebsd.org/D27725 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-12-23 12:04:46 +01:00
Konstantin Belousov	994e47023a	vxlan: stop checking CSUM_ENCAP_VXLAN when converting inner CSUM flags into normal, for decapsulation. The packet, if processed at this point, was already parsed to be UDP directed to a vxlan port. Connect-X 4+ does not provide easy method to infer which parser processed the packet, so driver cannot set the flag without a lot of efforts which are only to satisfy the formal requirements. Reviewed by: bryanv, np Sponsored by: Mellanox Technologies/NVidia Networking Differential revision: https://reviews.freebsd.org/D27449 MFC after: 1 week	2020-12-23 10:54:06 +02:00
Alexander V. Chernikov	d68fb8d978	Switch direct rt fields access in rtsock.c to newly-create field acessors. rtsock code was build around the assumption that each rtentry record in the system radix tree is a ready-to-use sockaddr. This assumptions turned out to be not quite true: * masks have their length tweaked, so we have rtsock_fix_netmask() hack * IPv6 addresses have their scope embedded, so we have another explicit deembedding hack. Change the code to decouple rtentry internals from rtsock code using newly-created rtentry accessors. This will allow to eventually eliminate both of the hacks and change rtentry dst/mask format. Differential Revision: https://reviews.freebsd.org/D27451	2020-12-18 22:00:57 +00:00
Brooks Davis	f3f2ee76ad	style(9): Correct whitespace in struct definitions struct ifconf and struct ifreq use the odd style "struct<tab>foo". struct ifdrv seems to have tried to follow this but was committed with spaces in place of most tabs resulting in "struct<space><space>ifdrv". MFC after: 3 days	2020-12-11 01:00:07 +00:00
Gleb Smirnoff	5ee33a9076	Fixup r368446 with KERN_TLS.	2020-12-08 23:54:09 +00:00
Gleb Smirnoff	e1074ed6a0	The list of ports in configuration path shall be protected by locks, epoch shall be used only for fast path. Thus use LAGG_XLOCK() in lagg_[un]register_vlan. This fixes sleeping in epoch panic. PR: 240609	2020-12-08 16:46:00 +00:00
Gleb Smirnoff	87bf9b9cbe	Convert LAGG_RLOCK() to NET_EPOCH_ENTER(). No functional changes.	2020-12-08 16:36:46 +00:00
Mark Johnston	c065d4e5e9	iflib: Avoid leaking the freelist bitmaps upon driver detach Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27342	2020-12-07 14:53:14 +00:00
Mark Johnston	102540192c	iflib: Detach tasks upon device registration failure In some error paths we would fail to detach from the iflib taskqueue groups. Also move the detach code into its own subroutine instead of duplicating it. Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27342	2020-12-07 14:52:57 +00:00
Alexander V. Chernikov	df9053920f	Add IPv4/IPv6 rtentry prefix accessors. Multiple consumers like ipfw, netflow or new route lookup algorithms need to get the prefix data out of struct rtentry. Instead of providing direct access to the rtentry, create IPv4/IPv6 accessors to abstract struct rtentry internals and avoid including internal routing headers for external consumers. While here, move struct route_nhop_data to the public header, so external customers can actually use lookup functions returning rt&nhop data. Differential Revision: https://reviews.freebsd.org/D27416	2020-12-03 22:23:57 +00:00
Kristof Provost	7f883a9b5b	net: Revert vnet/epair cleanup race mitigation Revert the mitigation code for the vnet/epair cleanup race (done in r365457). r368237 introduced a more reliable fix. MFC after: 2 weeks Sponsored by: Modirum MDPay	2020-12-01 16:34:43 +00:00
Kristof Provost	e133271fc1	if: Fix panic when destroying vnet and epair simultaneously When destroying a vnet and an epair (with one end in the vnet) we often panicked. This was the result of the destruction of the epair, which destroys both ends simultaneously, happening while vnet_if_return() was moving the struct ifnet to its home vnet. This can result in a freed ifnet being re-added to the home vnet V_ifnet list. That in turn panics the next time the ifnet is used. Prevent this race by ensuring that vnet_if_return() cannot run at the same time as if_detach() or epair_clone_destroy(). PR: 238870, 234985, 244703, 250870 MFC after: 2 weeks Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D27378	2020-12-01 16:23:59 +00:00
Alexander V. Chernikov	77df2c21cb	Renumber NHR_* flags after NHR_IFAIF removal in r368127. Suggested by: rpokala	2020-11-30 21:42:55 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Matt Macy	2338da0373	Import kernel WireGuard support Data path largely shared with the OpenBSD implementation by Matt Dunwoodie <ncon@nconroy.net> Reviewed by: grehan@freebsd.org MFC after: 1 month Sponsored by: Rubicon LLC, (Netgate) Differential Revision: https://reviews.freebsd.org/D26137	2020-11-29 19:38:03 +00:00
Alexander V. Chernikov	3b1654cb14	Introduce rib_walk_ext_internal() to allow iteration with rnh pointer. This solves the case when rib is not yet attached/detached to/from the system rib array. Differential Revision: https://reviews.freebsd.org/D27406	2020-11-29 13:54:49 +00:00
Alexander V. Chernikov	f47fa26065	Add nhop_ref_any() to unify referencing nhop or nexthop group. It allows code within routing subsystem to transparently reference nexthops and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH details. Differential Revision: https://reviews.freebsd.org/D27410	2020-11-29 13:52:06 +00:00
Alexander V. Chernikov	b712e3e343	Refactor fib4/fib6 functions. No functional changes. * Make lookup path of fib<4\|6>_lookup_debugnet() separate functions (fib<46>_lookup_rt()). These will be used in the control plane code requiring unlocked radix operations and actual prefix pointer. * Make lookup part of fib<4\|6>_check_urpf() separate functions. This change simplifies the switch to alternative lookup implementations, which helps algorithmic lookups introduction. * While here, use static initializers for IPv4/IPv6 keys Differential Revision: https://reviews.freebsd.org/D27405	2020-11-29 13:41:49 +00:00
Alexander V. Chernikov	98d5c4e5c8	Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors. The resulting KPI can be used by routing table consumers to estimate the required scale for route table export. * Add tracking for rib routes * Add accessors for number of nexthops/nexthop objects * Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up again on destruction. This helps in the cases when rnh is not linked yet/already unlinked. Differential Revision: https://reviews.freebsd.org/D27404	2020-11-29 13:27:24 +00:00
Alexander V. Chernikov	ef6ef7e5da	Add nhgrp_get_idx() as a counterpart for nhop_get_idx(). It allows the routing-related code to reference nexthop groups by index instead of storing a pointer.	2020-11-28 15:46:40 +00:00
Alexander V. Chernikov	7a6dc73c98	Cleanup nexthops request flags: * remove NHR_IFAIF as it was used by previous version of nexthop KPI * update NHR_REF description	2020-11-28 15:11:59 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Kristof Provost	bca0e1d2ac	if: Fix non-VIMAGE build if_link_ifnet() and if_unlink_ifnet() are needed even when VIMAGE is not enabled. MFC after: 2 weeks Sponsored by: Modirum MDPay	2020-11-25 17:15:24 +00:00
Kristof Provost	a779388f8b	if: Protect V_ifnet in vnet_if_return() When we terminate a vnet (i.e. jail) we move interfaces back to their home vnet. We need to protect our access to the V_ifnet CK_LIST. We could enter NET_EPOCH, but if_detach_internal() (called from if_vmove()) waits for net epoch callback completion. That's not possible from NET_EPOCH. Instead, we take the IFNET_WLOCK, build a list of the interfaces that need to move and, once we've released the lock, move them back to their home vnet. We cannot hold the IFNET_WLOCK() during if_vmove(), because that results in a LOR between ifnet_sx, in_multi_sx and iflib ctx lock. Separate out moving the ifp into or out of V_ifnet, so we can hold the lock as we do the list manipulation, but do not hold it as we if_vmove(). Reviewed by: melifaro MFC after: 2 weeks Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D27279	2020-11-25 15:07:22 +00:00
Kristof Provost	a60100fdfc	if: Remove ifnet_rwlock It no longer serves any purpose, as evidenced by the fact that we never take it without ifnet_sxlock. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D27278	2020-11-25 10:56:38 +00:00
Alexander V. Chernikov	7511a63825	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_forach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219	2020-11-22 20:21:10 +00:00
Mitchell Horne	70af7ce99a	Make net/ifq.h C++ friendly Don't use "new" as an identifier, and add explicit casts from void *. As a general policy, FreeBSD doesn't make any C++ compatibility guarantees for kernel headers like it does for userland, but it is a small effort to do so in this case, to the benefit of a downstream consumer (NetApp). Reviewed by: rscheff Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27286	2020-11-20 14:45:45 +00:00
Andrew Gallatin	8732245d29	LACP: When suppressing distributing, return ENOBUFS When links come and go, lacp goes into a "suppress distributing" mode where it drops traffic for 3 seconds. When in this mode, lagg/lacp historiclally drops traffic with ENETDOWN. That return value causes TCP to close any connection where it gets that value back from the lower parts of the stack. This means that any TCP connection with active traffic during a 3-second windown when an LACP link comes or goes would get closed. TCP treats return values of ENOBUFS as transient errors, and re-schedules transmission later. So rather than returning ENETDOWN, lets return ENOBUFS instead. This allows TCP connections to be preserved. I've tested this by repeatedly bouncing links on a Netlfix CDN server under a moderate (20Gb/s) load and overved ENOBUFS reported back to the TCP stack (as reported by a RACK TCP sysctl). Reviewed by: jhb, jtl, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27188	2020-11-18 14:55:49 +00:00
Mark Johnston	54bf96fb4f	iflib: Free full mbuf chains when draining transmit queues Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> Reviewed by: gallatin, hselasky MFC after: 1 week Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27179	2020-11-11 18:00:06 +00:00
Andrey V. Elsukov	2f4ffa9f72	Fix possible NULL pointer dereference. lagg(4) replaces if_output method of its child interfaces and expects that this method can be called only by child interfaces. But it is possible that lagg_port_output() could be called by children of child interfaces. In this case ifnet's if_lagg field is NULL. Add check that lp is not NULL. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2020-11-11 15:53:36 +00:00
Mitchell Horne	4a3fc6e22e	Fix definition of rn_addmask() Add the missing static keyword present in the declaration. Reviewed by: melifaro Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27024	2020-11-08 19:02:22 +00:00
Alexander V. Chernikov	2d39824195	Switch net.add_addr_allfibs default to 0. The goal of the fib support is to provide multiple independent routing tables, isolated from each other. net.add_addr_allfibs default tries to shift gears in the opposite direction, unconditionally inserting all addresses to all of the fibs. There are use cases when this is necessary, however this is not a default expected behaviour, especially compared to other implementations. Provide WARNING message for the setups with multiple fibs to notify potential users of the feature. Differential Revision: https://reviews.freebsd.org/D26076	2020-11-08 18:27:49 +00:00
Alexander V. Chernikov	76e6b37f6b	Temporarily revert setting net.add_addr_allfibs to 0. It accidentally sweeped in r367486. Revert to allow for proper commit message & warning.	2020-11-08 18:11:12 +00:00
Alexander V. Chernikov	770495f4c0	Fix build broken by r367484: add route_ifaddrs.c. Pointy hat to: melifaro Reported by: jenkins	2020-11-08 13:30:44 +00:00
Alexander V. Chernikov	bad6b23606	Move all ifaddr route creation business logic to net/route/route_ifaddr.c Differential Revision: https://reviews.freebsd.org/D26318	2020-11-08 11:12:00 +00:00
Konstantin Belousov	80ba361b2f	if_media.c SIOCGMEDIAX handler: improve loop Stop advancing counter past the current iteration number at the start of iteration. This removes the need of subtracting one when calculating index for copyout, and arguably fixes off-by-one reporting of copied out elements when copyout failed. Reviewed by: hselasky Sponsored by: Mellanox Technologies / NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27073	2020-11-03 14:33:04 +00:00
Konstantin Belousov	1fbbe9dbf5	net/if_media.c: improve IFMEDIA_DEBUG output. Use consistent output format for hex. Print both media and mask where relevant. Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:38:30 +00:00
Konstantin Belousov	e399f19dba	Cleanup of net/if_media.c: simplify cleanup loop in ifmedia_removeall(). Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:36:21 +00:00
Konstantin Belousov	899322fdfa	Cleanup of net/if_media.c: some style. Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:30:17 +00:00
Konstantin Belousov	2193fb16b5	Cleanup of net/if_media.c: switch to ANSI C function definitions. Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:25:35 +00:00
Mitchell Horne	ced0f52457	net: add ETHER_IS_IPV6_MULTICAST This can be used to detect if an ethernet address is specifically an IPv6 multicast address, defined in accordance to RFC 2464. ETHER_IS_MULTICAST is still preferred in the general case. Reviewed by: ae Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26611	2020-10-30 13:32:58 +00:00
John Baldwin	36e0a362ac	Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc(). This gives a more uniform API for send tag life cycle management. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27000	2020-10-29 23:28:39 +00:00
John Baldwin	521eac97f3	Support hardware rate limiting (pacing) with TLS offload. - Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type). - When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag. - When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable. - When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone. - Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock. - Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46]. Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691	2020-10-29 00:23:16 +00:00
Vincenzo Maffione	be7a6b3d84	iflib: fix typo bug introduced by r367093 Code was supposed to call callout_reset_sbt_on() rather than callout_reset_sbt(). This resulted into passing a "cpu" value to a "flag" argument. A recipe for subtle errors. PR: 248652 Reported by: sg@efficientip.com MFC with: r367093	2020-10-28 21:06:17 +00:00
Vincenzo Maffione	17cec474c0	iflib: add per-tx-queue netmap timer The way netmap TX is handled in iflib when TX interrupts are not used (IFC_NETMAP_TX_IRQ not set) has some issues: - The netmap_tx_irq() function gets called by iflib_timer(), which gets scheduled with tick granularity (hz). This is not frequent enough for 10Gbps NICs and beyond (e.g., ixgbe or ixl). The end result is that the transmitting netmap application is not woken up fast enough to saturate the link with small packets. - The iflib_timer() functions also calls isc_txd_credits_update() to ask for more TX completion updates. However, this violates the netmap requirement that only txsync can access the TX queue for datapath operations. Only netmap_tx_irq() may be called out of the txsync context. This change introduces per-tx-queue netmap timers, using microsecond granularity to ensure that netmap_tx_irq() can be called often enough to allow for maximum packet rate. The timer routine simply calls netmap_tx_irq() to wake up the netmap application. The latter will wake up and call txsync to collect TX completion updates. This change brings back line rate speed with small packets for ixgbe. For the time being, timer expiration is hardcoded to 90 microseconds, in order to avoid introducing a new sysctl. We may eventually implement an adaptive expiration period or use another deferred work mechanism in place of timers. Also, fix the timers usage to make sure that each queue is serviced by a different CPU. PR: 248652 Reported by: sg@efficientip.com MFC after: 2 weeks	2020-10-27 21:53:33 +00:00

1 2 3 4 5 ...

4510 Commits