freebsd-dev

Author	SHA1	Message	Date
Navdeep Parhar	610d345953	if_vxlan(4): csum_flags_to_inner_flags takes the tunnel protocol as a parameter. No functional change.	2020-10-22 17:05:55 +00:00
Hans Petter Selasky	ce329aa256	Compile fix for MIPS, MIPS64, POWERPC and POWERPC64. Add missing include files. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-22 12:22:08 +00:00
Hans Petter Selasky	a92c4bb62a	Add support for IP over infiniband, IPoIB, to lagg(4). Currently only the failover protocol is supported due to limitations in the IPoIB architecture. Refer to the lagg(4) manual page for how to configure and use this new feature. A new network interface type, IFT_INFINIBANDLAG, has been added, similar to the existing IFT_IEEE8023ADLAG . ifconfig(8) has been updated to accept a new laggtype argument when creating lagg(4) network interfaces. This new argument is used to distinguish between ethernet and infiniband type of lagg(4) network interface. The laggtype argument is optional and defaults to ethernet. The lagg(4) command line syntax is backwards compatible. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-22 09:47:12 +00:00
Hans Petter Selasky	9d40cf60d6	Factor out generic IP over infiniband, IPoIB, definitions and code into net/if_infiniband.c and net/infiniband.h . No functional change intended. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-22 09:09:53 +00:00
Alexander V. Chernikov	c7cffd65c5	Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q). 802.1ad interfaces are created with ifconfig using the "vlanproto" parameter. Eg., the following creates a 802.1Q VLAN (id #42) over a 802.1ad S-VLAN (id #5) over a physical Ethernet interface (em0). ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24 VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly supported. VLAN_HWTAGGING is only partially supported, as there is currently no IFCAP_VLAN_* denoting the possibility to set the VLAN EtherType to anything else than 0x8100 (802.1ad uses 0x88A8). Submitted by: Olivier Piras Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D26436	2020-10-21 21:28:20 +00:00
Alexander V. Chernikov	0c325f53f1	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523	2020-10-18 17:15:47 +00:00
Marcin Wojtas	1148702e43	Add SADB_SAFLAGS_ESN flag This flag is going to be used by IKE daemon to signal if Extended Sequence Number feature is going to be used. Value for this flag was taken from OpenBSD source code `6b4cbaf181` Submitted by: Patryk Duda <pdk@semihalf.com> Reviewed by: ae Differential revision: https://reviews.freebsd.org/D22366 Obtained from: Semihalf Sponsored by: Stormshield	2020-10-16 11:22:29 +00:00
Richard Scheffenegger	868aabb470	Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 801.1Q vlan header with vlan ID = 0 to carry the priority setting Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409	2020-10-09 12:06:43 +00:00
Konstantin Belousov	cefdb89514	Fix typo. Sponsored by: Mellanox Technologies/NVIDIA Networking MFC after: 3 days	2020-10-07 10:58:56 +00:00
Kristof Provost	4af1bd8157	bridge: call member interface ioctl() without NET_EPOCH We're not allowed to hold NET_EPOCH while sleeping, so when we call ioctl() handlers for member interfaces we cannot be in NET_EPOCH. We still need some protection of our CK_LISTs, so hold BRIDGE_LOCK instead. That requires changing BRIDGE_LOCK into a sleepable lock, and separating the BRIDGE_RT_LOCK, to protect bridge_rtnode lists. That lock is taken in the data path (while in NET_EPOCH), so it cannot be a sleepable lock. While here document the locking strategy. MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D26418	2020-10-06 19:19:56 +00:00
John Baldwin	56fb710f1b	Store the send tag type in the common send tag header. Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag). In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query. Reviewed by: gallatin, hselasky, np, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26689	2020-10-06 17:58:56 +00:00
Alexander V. Chernikov	1b95005e95	Fix route flags update during RTM_CHANGE. Nexthop lookup was not consireding rt_flags when doing structure comparison, which lead to an original nexthop selection when changing flags. Fix the case by adding rt_flags field into comparison and rearranging nhop_priv fields to allow for efficient matching. Fix `route change X/Y flags` case - recent changes disallowed specifying RTF_GATEWAY flag without actual gateway. It turns out, route(8) fills in RTF_GATEWAY by default, unless -interface flag is specified. Fix regression by clearing RTF_GATEWAY flag instead of failing. Fix route flag reporting in RTM_CHANGE messages by explicitly updating rtm_flags after operation competion. Add IPv4/IPv6 tests for flag-only route changes.	2020-10-04 13:24:58 +00:00
Alexander V. Chernikov	9c584fa4bc	Remove ROUTE_MPATH-related warnings introduced in r366390. Reported by: mjg	2020-10-03 14:37:54 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00
Vincenzo Maffione	adf41f0788	netmap: fix constness warnings generated by "-Wcast-qual" Submitted by: milosz.kaniewski@gmail.com MFC after: 3 days	2020-10-03 09:33:29 +00:00
Ed Maste	c1aedfcbd9	add SIOCGIFDATA ioctl For interfaces that do not support SIOCGIFMEDIA (for which there are quite a few) the only fallback is to query the interface for if_data->ifi_link_state. While it's possible to get at if_data for an interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms. SIOCGIFDATA is a simple ioctl to retrieve this fast with very little resource use in comparison. This implementation mirrors that of other similar ioctls in FreeBSD. Submitted by: Roy Marples <roy@marples.name> Reviewed by: markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D26538	2020-09-28 16:54:39 +00:00
Alexander V. Chernikov	2259a03020	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497	2020-09-21 20:02:26 +00:00
Alexander V. Chernikov	1440f62266	Remove unused nhop_ref_any() function. Remove "opt_mpath.h" header where not needed. No functional changes.	2020-09-20 21:32:52 +00:00
Alexander V. Chernikov	c4bcfe98e2	Fix gw updates / flag updates during route changes. * Zero gw_sdl if switching to interface route - the assumption that underlying storage is zeroed is incorrect with route changes. * Apply proper flag mask to rte. Reported by: vangyzen	2020-09-20 12:31:48 +00:00
Navdeep Parhar	b092fd6c97	if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS. This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic. A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities. On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers. An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port. On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already. Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead. Rx: LRO. depends on what happens with the previous item. There will have to to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state. Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:37:57 +00:00
Navdeep Parhar	830edb4561	Add two new ifnet capabilities for hw checksumming and TSO for VXLAN traffic. These are similar to the existing VLAN capabilities. Reviewed by: kib@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:10:28 +00:00
Mitchell Horne	ceff9b9d25	if_media: definitions for 40GE LM4 ethernet media type Reviewed by: erj Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26276	2020-09-16 14:45:16 +00:00
Alexander V. Chernikov	2b32d93e55	Fix RADIX_MPATH build broken by r365521. Reported by: jenkins, Hartmann, O. <ohartmann at walstatt.org>	2020-09-10 07:05:31 +00:00
Alexander V. Chernikov	aa8f9f90ff	Update nexthop handling for route addition/deletion in preparation for mpath. Currently kernel requests deletion for the certain routes with specified gateway, but this gateway is not actually checked. With multipath routes, internal gateway checking becomes mandatory. Add the logic performing this check. Generalise RTF_PINNED routes to the generic route priorities, simplifying the logic. Add lookup_prefix() function to perform exact match search based on data in @info. Differential Revision: https://reviews.freebsd.org/D26356	2020-09-09 22:07:54 +00:00
Alexander V. Chernikov	cd6298d5c5	Retain marking net.fibs sysctl as a tunable. Suggested by: avg	2020-09-09 21:45:18 +00:00
Alexander V. Chernikov	4a8201c13a	Fix panic with net.fibs tunable set in loader.conf. Fix by removing forgotten CTLFLAG_RWTUN flag from the sysctl, loader variable will be read later in vnet_rtables_init(). Reported by: mav	2020-09-08 21:39:34 +00:00
Kristof Provost	a969635b83	net: mitigate vnet / epair cleanup races There's a race where dying vnets move their interfaces back to their original vnet, and if_epair cleanup (where deleting one interface also deletes the other end of the epair). This is commonly triggered by the pf tests, but also by cleanup of vnet jails. As we've not yet been able to fix the root cause of the issue work around the panic by not dereferencing a NULL softc in epair_qflush() and by not re-attaching DYING interfaces. This isn't a full fix, but makes a very common panic far less likely. PR: 244703, 238870 Reviewed by: lutz_donnerhacke.de MFC after: 4 days Differential Revision: https://reviews.freebsd.org/D26324	2020-09-08 14:54:10 +00:00
Alexander V. Chernikov	05aca418f4	Consistently use the same gateway when adding/deleting interface routes. Use the same link-level gateway when adding or deleting interface routes. This helps nexthop checking in the upcoming multipath changes. Differential Revision: https://reviews.freebsd.org/D26317	2020-09-07 10:13:54 +00:00
Ed Maste	4fa9815a3d	rtsock.c: remove extraneous space Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D26249	2020-09-05 16:13:36 +00:00
Alexander V. Chernikov	8f07963360	Fix regression for IPv6 loopback routes. After nexthop introduction, loopback routes for the interface addresses were created without embedding actual interface index in the gateway. The latter is needed to pass the IPv6 scope during transmission via loopback.. Fix the regression by actually using passed gateway data with interface index. Differential Revision: https://reviews.freebsd.org/D26306	2020-09-03 22:24:52 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Vincenzo Maffione	35d8a463e8	iflib: leave only 1 receive descriptor unused The pidx argument of isc_rxd_flush() indicates which is the last valid receive descriptor to be used by the NIC. However, current code has multiple issues: - Intel drivers write pidx to their RDT register, which means that NICs will only use the descriptors up to pidx-1 (modulo ring size N), and won't actually use the one pointed by pidx. This does not break reception, but it is anyway confusing and suboptimal (the NIC will actually see only N-2 descriptors as available, rather than N-1). Other drivers (if_vmx, if_bnxt, if_mgb) adhere to this semantic). - The semantic used by Intel (RDT is one descriptor past the last valid one) is used by most (if not all) NICs, and it is also used on the TX side (also in iflib). Since iflib is not currently using this semantic for RX, it must decrement fl->ifl_pidx (modulo N) before calling isc_rxd_flush(), and then the per-driver callback implementation must increment the index again (to match the real semantic). This is confusing and suboptimal. - The iflib refill function is also called at initialization. However, in case the ring size is smaller than 128 (e.g. if_mgb), the refill function will actually prepare all the receive descriptors (N), without leaving one unused, as most of NICs assume (e.g. to avoid RDT to overrun RDH). I can speculate that the code looks like this right now because this issue showed up during testing (e.g. with if_mgb), and it was easy to workaround by decrementing pidx before isc_rxd_flush(). The goal of this change is to simplify the code (removing a bunch of instructions from the RX fast path), and to make the semantic of isc_rxd_flush() consistent across drivers. To achieve this, we: - change the semantics of the pidx argument to the usual one (that is the index one past the last valid one), so that both iflib and drivers avoid the decrement/increment dance. - fix the initialization code to prepare at most N-1 descriptors. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26191	2020-09-01 20:41:47 +00:00
Alexander V. Chernikov	b8d2d479cd	Revert uma zone alignemnt cache unadvertenly committed in r364950.	2020-08-29 12:04:13 +00:00
Alexander V. Chernikov	6498f66f7c	Fix build with RADIX_MPATH. Reported by: Hartmann, O <ohartmann@walstatt.org>	2020-08-29 11:04:24 +00:00
Alexander V. Chernikov	7c89a3b63f	Move fib_rte_to_nh_flags() from net/route_var.h to net/route/nhop_ctl.c. No functional changes. Initially this function was created to perform runtime flag conversions for the previous incarnation of fib lookup functions. As these functions got deprecated, move the function to the file with the only remaining caller. Lastly, rename it to convert_rt_to_nh_flags() to follow the naming notation.	2020-08-28 23:01:56 +00:00
Alexander V. Chernikov	a624ca3dff	Move net/route/shared.h definitions to net/route/route_var.h. No functional changes. net/route/shared.h was created in the inital phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largerly defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former.	2020-08-28 22:50:20 +00:00
Alexander V. Chernikov	b122304f6a	Further split nhop creation and rtable operations. As nexthops are immutable, some operations such as route attribute changes require nexthop fetching, forking, modification and route switching. These operations are not atomic, so they may need to be retried multiple times in presence of multiple speakers changing the same route. This change introduces "synchronisation" primitive: route_update_conditional(), simplifying logic for route changes and upcoming multipath operations. Differential Revision: https://reviews.freebsd.org/D26216	2020-08-28 21:59:10 +00:00
Vincenzo Maffione	ae750d5cdf	iflib: netmap: publish all the receive buffer At initialization time, the netmap RX refill function used to prepare the NIC RX ring with N-1 buffers rather than N (with N equal to the number of descriptors in the NIC RX ring). This is not how netmap is supposed to work, as it would keep kring->nr_hwcur not in sync with the NIC "next index to refill" (i.e., fl->ifl_pidx). Instead we prepare N buffers, although we still publish (with isc_rxd_flush()) only the first N-1 buffers, to avoid the NIC producer pointer to overrun the NIC consumer pointer (for NICs where this is a real issue, e.g. Intel ones). MFC after: 2 weeks	2020-08-25 15:19:45 +00:00
Alexander V. Chernikov	592d300e34	Remove RT_LOCK mutex from rte. rtentry lock traditionally served 2 purposed: first was protecting refcounts, the second was assuring consistent field access/changes. Since route nexthop introduction, the need for the former disappeared and the need for the latter reduced. To be more precise, the following rte field are mutable: rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info) rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal) rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info) rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK) rt_chain (deletion chain pointer, updated with RIB_WLOCK) All of them are updated under RIB_WLOCK, so the only remaining concern is the reading. rt_nhop and rt_weight (addressed in this review) are read under rib lock and stored in the rib_cmd_info, so the caller has no problem with consitency. rte_flags is currently read unlocked in rtsock reporting (however the scope is only RTF_UP flag, which is pretty static). rt_expire is currently read unlocked in rtsock reporting. rt_chain accesses are safe, as this is only used at route deletion. rt_expire and rte_flags reads will be dealt in a separate reviews soon. Differential Revision: https://reviews.freebsd.org/D26162	2020-08-24 20:23:34 +00:00
Vincenzo Maffione	de5b46107c	iflib: fix isc_rxd_flush call in netmap_fl_refill() The semantic of the pidx argument of isc_rxd_flush() is the last valid index of in the free list, rather than the next index to be published. However, netmap was still using the old convention. While there, also refactor the netmap_fl_refill() to simplify a little bit and add an assertion. MFC after: 2 weeks	2020-08-24 11:44:20 +00:00
Alexander V. Chernikov	eb1c7adb70	Finish r364492 by renaming rt_flags to rte_flags for multipath code.	2020-08-22 20:02:40 +00:00
Alexander V. Chernikov	93bfd365d2	Rename rt_flags to rte_flags && reduce number of rt_nhop accesses. No functional changes. Most of the routing flags are stored in the netxtop instead of rtentry. Rename rt->rt_flags to rt->rte_flags to simplify reading/modifying code checking routing flags. In the new multipath code, rt->rt_nhop may actually point to nexthop group instead of nhop. To ease transition, reduce the amount of rt->rt_nhop->... accesses. Differential Revision: https://reviews.freebsd.org/D26156	2020-08-22 19:30:56 +00:00
Mateusz Guzik	c93d310f87	Fix tinderbox build after r364465	2020-08-22 07:43:38 +00:00
Alexander V. Chernikov	f5247a232a	Make net.fibs growable. Allow to dynamically grow the amount of fibs in each vnet. This change alters current behavior. Currently, if one defines ROUTETABLES > 1 in the kernel config, each vnet will be created with the number of fibs defined in the kernel config. After this commit vnets will be created with fibs=1. Dynamic net.fibs is not compatible with net.add_addr_allfibs. The plan is to deprecate the latter and make net.add_addr_allfibs=0 default behaviour. Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26062	2020-08-21 21:34:52 +00:00
Warner Losh	773e541e8d	Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@	2020-08-21 00:03:24 +00:00
Bjoern A. Zeeb	2ae32ca2c8	For consistency and to avoid any problems getting past the 31bit boundry change the last two IF_Mbps(2500) and additionally one IF_Mbps(5000) to ULL as well. MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-08-17 13:51:25 +00:00
Qing Li	a5154bb2e5	Correct the mask byte order when checking for reserved bits. Reviewed by: gnn Approved by: gnn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26071	2020-08-15 16:48:58 +00:00
Alexander V. Chernikov	bec053ffe0	Make net.inet6.ip6.deembed_scopeid behaviour default & remove sysctl. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25637	2020-08-15 11:37:44 +00:00
Alexander V. Chernikov	2f23f45b20	Simplify dom_<rtattach\|rtdetach>. Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053	2020-08-14 21:29:56 +00:00
Bryan Drewery	3869d41465	lagg: Avoid adding a port to a lagg device being destroyed. The lagg_clone_destroy() handles detach and waiting for ifconfig callers to drain already. This narrows the race for 2 panics that the tests triggered. Both were a consequence of adding a port to the lagg device after it had already detached from all of its ports. The link state task would run after lagg_clone_destroy() free'd the lagg softc. kernel:trap_fatal+0xa4 kernel:trap_pfault+0x61 kernel:trap+0x316 kernel:witness_checkorder+0x6d kernel:_sx_xlock+0x72 if_lagg.ko:lagg_port_state+0x3b kernel:if_down+0x144 kernel:if_detach+0x659 if_tap.ko:tap_destroy+0x46 kernel:if_clone_destroyif+0x1b7 kernel:if_clone_destroy+0x8d kernel:ifioctl+0x29c kernel:kern_ioctl+0x2bd kernel:sys_ioctl+0x16d kernel:amd64_syscall+0x337 kernel:trap_fatal+0xa4 kernel:trap_pfault+0x61 kernel:trap+0x316 kernel:witness_checkorder+0x6d kernel:_sx_xlock+0x72 if_lagg.ko:lagg_port_state+0x3b kernel:do_link_state_change+0x9b kernel:taskqueue_run_locked+0x10b kernel:taskqueue_run+0x49 kernel:ithread_loop+0x19c kernel:fork_exit+0x83 PR: 244168 Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D25284	2020-08-13 22:06:27 +00:00
Alexander V. Chernikov	6cbadc4234	Move rtzone handling code to net/route_ctl.c After moving the route control plane code from net/route.c, all rtzone users ended up being in net/route_ctl.c. Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well to have everything in a single place. While here, remove custom initializers from the zone. It was added originally to avoid setup/teardown of costy per-cpu couters. With these counters removed, the only remaining job was avoiding rte mutex setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex will soon be removed. With that in mind, there is no sense in keeping custom zone callbacks. Differential Revision: https://reviews.freebsd.org/D26051	2020-08-13 18:35:29 +00:00
Mitchell Horne	f7d79f6c6d	Correctly set error in rt_mpath_unlink It is possible for rn_delete() to return NULL. If this happens, then set *perror to ESRCH, as is done in the rest of the function. Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D25871	2020-08-12 16:43:20 +00:00
Vincenzo Maffione	6d84e76a25	iflib: netmap: improve rxsync to support IFLIB_HAS_RXCQ For drivers with IFLIB_HAS_RXCQ set, there is a separate completion queue. In this case, the netmap rxsync routine needs to update rxq->ifr_cq_cidx in the same way it is updated by iflib_rxeof(). This improves the situation for vmx(4) and bnxt(4) drivers, which use iflib and have the IFLIB_HAS_RXCQ bit set. PR: 248494 MFC after: 3 weeks	2020-08-12 14:45:31 +00:00
Vincenzo Maffione	530960be8d	iflib: refactor netmap_fl_refill and fix off-by-one issue First, fix the initialization of the fl->ifl_rxd_idxs array, which was affected by an off-by-one bug. Once there, refactor the function to use better names for local variables, optimize the variable assignments, and merge the bus_dmamap_sync() inner loop with the outer one. PR: 248494 MFC after: 3 weeks	2020-08-12 14:17:38 +00:00
Alexander V. Chernikov	8a0917c35b	Do not enter epoch in add_route(), as it is already called in epoch. Reviewed by: glebius	2020-08-11 07:23:07 +00:00
Alexander V. Chernikov	136a1f8da8	Make <add\|del\|change>_route() static to finish the transition to the new kpi. Discussed with: glebius	2020-08-11 07:21:32 +00:00
Alexander V. Chernikov	9a00f6d067	Fix rib_subscribe() waitok flag by performing allocation outside epoch. Make in6_inithead() use rib_subscribe with waitok to achieve reliable subscription allocation. Reviewed by: glebius	2020-08-11 07:05:30 +00:00
Vincenzo Maffione	c9d886cd7f	iflib: netmap: drop redundant check The validity of head is already checked by nm_rxsync_prologue(). MFC after: 2 weeks	2020-08-06 21:37:38 +00:00
Vincenzo Maffione	ee07345d20	iflib: netmap: don't increment ifl_cidx on the wrong free list Netmap only uses free list 0 to keep it consistent with its one-to-one mapping between each netmap ring and a device RX (or TX) queue. However, the current iflib_netmap_rxsync() routine was mistakenly updating the ifl_cidx field of both free lists. PR: 248494 MFC after: 2 weeks	2020-08-06 21:32:25 +00:00
Mark Johnston	96ad26eefb	Remove free_domain() and uma_zfree_domain(). These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplyify the kernel memory allocator interfaces a bit. Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937	2020-08-04 13:58:36 +00:00
Matt Macy	0ae0e8d2bd	iflib: fix LOR with bpf detach Reported by: grehan@ Approved by: grehan@ MFC after: 1 week Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D25530	2020-07-27 01:17:59 +00:00
Alexander V. Chernikov	e1c05fd290	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-21 19:56:13 +00:00
Vincenzo Maffione	ac11d85740	iflib: initialize netmap with the correct number of descriptors In case the network device has a RX or TX control queue, the correct number of TX/RX descriptors is contained in the second entry of the isc_ntxd (or isc_nrxd) array, rather than in the first entry. This case is correctly handled by iflib_device_register() and iflib_pseudo_register(), but not by iflib_netmap_attach(). If the first entry is larger than the second, this can result in a panic. This change fixes the bug by introducing two helper functions that also lead to some code simplification. PR: 247647 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D25541	2020-07-20 21:08:56 +00:00
Alexander V. Chernikov	725871230d	Temporarly revert r363319 to unbreak the build. Reported by: CI Pointy hat to: melifaro	2020-07-19 10:53:15 +00:00
Alexander V. Chernikov	8cee15d9e4	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-19 09:29:27 +00:00
Kristof Provost	93ed6ade6e	bridge: Don't sleep during epoch While it doesn't trigger INVARIANTS or WITNESS on head it does in stable/12. There's also no reason for it, as we can easily report the out of memory error to the caller (i.e. userspace). All of these can already fail. PR: 248046 MFC after: 3 days	2020-07-18 12:43:11 +00:00
Kyle Evans	cef5fc74c2	tuntap: drop redundant if_mtu assignment in tuncreate ether_ifattach will immediately clobber if_mtu with ETHERMTU anyways, just let it happen. MFC after: 1 week	2020-07-16 15:02:11 +00:00
Kyle Evans	fa2ab81e32	ether_ifattach: set mtu before calling if_attach() if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time an ethernet interface is setup after SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu before ifp->if_mtu has been properly set in some scenarios, e.g., USB ethernet adapter plugged in later on. For interfaces that are created in early boot, we don't have this issue as domains aren't constructed enough for them to attach and thus it gets deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND after the mtu has been set earlier in ether_ifattach(). PR: 248005 Submitted by: Mathew <mjanelle blackberry com> MFC after: 1 week	2020-07-16 13:37:32 +00:00
Alexander V. Chernikov	4c7ba83f9d	Switch inet6 default route subscription to the new rib subscription api. Old subscription model allowed only single customer. Switch inet6 to the new subscription api and eliminate the old model. Differential Revision: https://reviews.freebsd.org/D25615	2020-07-12 11:24:23 +00:00
Alexander V. Chernikov	edc37a66e3	Add destructor for the rib subscription system to simplify users code. Subscriptions are planned to be used by modules such as route lookup engines. In that case that's the module task to properly unsibscribe before detach. However, the in-kernel customer - inet6 wants to track default route changes. To avoid having inet6 store per-fib subscriptions, handle automatic destruction internally. Differential Revision: https://reviews.freebsd.org/D25614	2020-07-12 11:18:09 +00:00
Mark Johnston	26dd427800	Split nhop_ref_object(). Now nhop_ref_object() unconditionally acquires a reference, and the new nhop_try_ref_object() uses refcount_acquire_if_not_zero() to conditionally acquire a reference. Since the former is cheaper, use it when we know that the initial counter value is non-zero. No functional change intended. Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D25535	2020-07-06 21:20:57 +00:00
Mark Johnston	b256d25c50	iflib: Fix some nits in the rx refill code. - Get rid of the ifl_vm_addrs array. It is not used by any existing consumer, so we are just dirtying a couple of cache lines for no reason. - Use uma_zalloc(fl->ifl_zone) instead of m_cljget(). Otherwise m_cljget() is doing unnecessary work to look up the correct zone, when iflib already knows what that zone is. - ifl_gen is only used when INVARIANTS is on, so make that more clear. - Fix some style nits and inconsistencies. Reviewed by: gallatin Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25490	2020-07-06 14:52:21 +00:00
Mark Johnston	a363e1d4d0	iflib: Fix handling of mbuf cluster allocation failures. When refilling an rx freelist, make sure we only update the hardware producer index if at least one cluster was allocated. Otherwise the NIC is programmed to write a previously used cluster, typically resulting in a use-after-free when packet data is written by the hardware. Also make sure that we don't update the fragment index cursor if the last allocation attempt didn't succeed. For at least Intel drivers, iflib assumes that the consumer index and fragment index cursor stay in lockstep, but this assumption was violated in the face of cluster allocation failures. Reported and tested by: pho Reviewed by: gallatin, hselasky MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25489	2020-07-06 14:52:09 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4\|6>_lookup_nh_<basic\|ext> to fib<4\|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Ryan Moeller	e5539fb618	libifconfig: Add function to get bridge status The new function operates similarly to ifconfig_lagg_get_lagg_status and likewise is accompanied by a function to free the bridge status data structure. I have included in this patch the relocation of some strings describing STP parameters and the PV2ID macro from ifconfig into net/if_bridgevar.h as they are useful for consumers of libifconfig. Reviewed by: kp, melifaro, mmacy Approved by: mmacy (mentor) MFC after: 1 week Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25460	2020-07-01 02:32:41 +00:00
Vincenzo Maffione	9503233f87	iflib: fix compilation issue introduced in r362621 The ifp local variable is useful even without netmap and altq, as it is used to check for IFF_DRV_RUNNING. MFC after: 2 weeks	2020-06-25 20:43:21 +00:00
Vincenzo Maffione	d8b2d26b15	iflib: netmap: add support for partial ring openings Reviewed by: gallatin MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25254	2020-06-25 19:44:24 +00:00
Vincenzo Maffione	88a688663a	iflib: netmap: add per-tx-queue netmap support Reviewed by: gallatin MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25253	2020-06-25 19:35:43 +00:00
Vincenzo Maffione	0ff2126795	iflib: netmap: fix rsync index overrun In the current iflib_netmap_rxsync, there is nothing that prevents kring->nr_hwtail to overrun kring->nr_hwcur during the descriptor import phase. This may cause errors in netmap applications, such as: em1 RX0: fail 'head < kring->nr_hwcur \|\| head > kring->nr_hwtail' h 795 c 795 t 282 rh 795 rc 795 rt 282 hc 282 ht 282 Reviewed by: gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25252	2020-06-23 20:23:56 +00:00
Tycho Nightingale	c774294c57	To avoid a startup script race change net.bpf.optimize_writers from CTLFLAG_RW to CTLFLAG_RWTUN to allow it to be modified by a loader tunable. Sponsored by: Dell EMC Isilon	2020-06-23 13:57:53 +00:00
Matt Macy	9aeca21324	iflib: fix cloneattach fail and generalize pseudo device handling - a cloneattach failure will not currently be handled correctly, jump to the right target - pseudo devices are all treat as if they're ethernet devices - this often doesn't make sense MFC after: 1 week Sponsored by: Netgate, Inc. Differential Revision: https://reviews.freebsd.org/D25083	2020-06-21 22:02:49 +00:00
Pawel Biernacki	9daf71541c	net.link.generic.ifdata.<ifindex>.linkspecific: rework handler This OID was added in r17352 but the write path of IFDATA_LINKSPECIFIC seems unused as there are no in-base writers, and as far as I can tell we had issues with this code before, see PR 219472. Drop the write path to make the handler read-only as described in comments and man-pages. It can be marked as MPSAFE now. Reviewed by: bdragon, kib, melifaro, wollman Approved by: kib (mentor) Sponsored by: Mysterious Code Ltd. Differential Revision: https://reviews.freebsd.org/D25348	2020-06-21 18:40:17 +00:00
Vincenzo Maffione	0a182b4c63	iflib: netmap: enter/exit netmap mode after device stops Avoid possible race conditions by calling nm_set_native_flags() and nm_clear_native_flags() only after the device has been stopped. MFC after: 1 week	2020-06-14 21:07:12 +00:00
Ravi Pokala	2a73c8f5e1	Decode the "LACP Fast Timeout" LAGG option flag r286700 added the "lacp_fast_timeout" option to `ifconfig', but we forgot to include the new option in the string used to decode the option bits. Add "LACP_FAST_TIMO" to LAGG_OPT_BITS. Also, s/LAGG_OPT_LACP_TIMEOUT/LAGG_OPT_LACP_FAST_TIMO/g , to be clearer that the flag indicates "Fast Timeout" mode. Reported by: Greg Foster <gfoster at panasas dot com> Reviewed by: jpaetzel MFC after: 1 week Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D25239	2020-06-11 22:46:08 +00:00
Alexander V. Chernikov	a287a973e3	Switch rtsock code to using newly-create rib_action() KPI call. This simplifies the code and allows to further split rtentry and nexthop, removing one of the blockers for multipath code introduction, described in D24141. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25192	2020-06-10 07:46:22 +00:00
Vincenzo Maffione	e136e9c88f	iflib: netmap: honor netmap_irx_irq return values In the receive interrupt routine, always call netmap_rx_irq(). The latter function will return != NM_IRQ_PASS if netmap is not active on that specific receive queue, so that the driver can go on with iflib_rxeof(). Note that netmap supports partial opening, where only a subset of the RX or TX rings can be open in netmap mode. Checking the IFCAP_NETMAP flag is not enough to make sure that the queue is indeed in netmap mode. Moreover, in case netmap_rx_irq() returns NM_IRQ_RESCHED, it means that netmap expects the driver to call netmap_rx_irq() again as soon as possible. Currently, this may happen when the device is attached to a VALE switch. Reviewed by: gallatin MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25167	2020-06-09 19:15:43 +00:00
John Baldwin	00a4311adc	Refer to AES-CBC as "aes-cbc" rather than "rijndael-cbc" for IPsec. At this point, AES is the more common name for Rijndael128. setkey(8) will still accept the old name, and old constants remain for compatiblity. Reviewed by: cem, bcr (manpages) MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24964	2020-06-04 22:58:37 +00:00
Andrey V. Elsukov	dd4490fdab	Add if_reassing method to all tunneling interfaces. After r339550 tunneling interfaces have started handle appearing and disappearing of ingress IP address on the host system. When such interfaces are moving into VNET jail, they lose ability to properly handle ifaddr_event_ext event. And this leads to need to reconfigure tunnel to make it working again. Since moving an interface into VNET jail leads to removing of all IP addresses, it looks consistent, that tunnel configuration should also be cleared. This is what will do if_reassing method. Reported by: John W. O'Brien <john saltant com> MFC after: 1 week	2020-06-03 13:02:31 +00:00
Alexander V. Chernikov	41e66f4eca	Add rib subscription API. Currently there is no easy way of subscribing for the routing table changes. The only existing way is to set ifa_rtrequest callback in the each protocol ifaddr, which is not convenient or extandable. This change provides generic notification subscription mechanism, that will replace current ifa_rtrequest one and allow other applications such as accelerated routing lookup modules subscribe for the changes. In particular, this change provides 2 hooks: 1) synchronous one (RIB_NOTIFY_IMMEDIATE), called under RIB_WLOCK, which ensures exact ordering of the changes and 2) async one, (RIB_NOTIFY_DELAYED) that is called after the change w/o holding locks. The latter one does not provide any notification ordering guarantee. Differential Revision: https://reviews.freebsd.org/D25070	2020-06-01 21:52:24 +00:00
Alexander V. Chernikov	46cc6153d4	Finish r361706: add sys/net/route/route_ctl.h, missed in previous commit.	2020-06-01 21:51:20 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Matt Macy	1f93e931d9	Fix panics when using iflib pseudo device support Reviewed by: gallatin@, hselasky@ MFC after: 1 week Sponsored by: Netgate, Inc. Differential Revision: https://reviews.freebsd.org/D23710	2020-05-31 18:42:00 +00:00
John Baldwin	28d2a72bbf	Consistently include opt_ipsec.h for consumers of <netipsec/ipsec.h>. This fixes ipsec.ko to include all of IPSEC_DEBUG. Reviewed by: imp MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25046	2020-05-29 19:22:40 +00:00
Alexander V. Chernikov	cb86ca48bf	Unlock rtentry before calling for epoch(9) destruction as the destruction may happen immediately, leading to panic. Reported by: bdragon	2020-05-28 07:23:27 +00:00
Alexander V. Chernikov	4d2c2509f2	Move <add\|del\|change>_route() functions to route_ctl.c in preparation of multipath control plane changed described in D24141. Currently route.c contains core routing init/teardown functions, route table manipulation functions and various helper functions, resulting in >2KLOC file in total. This change moves most of the route table manipulation parts to a dedicated file, simplifying planned multipath changes and making route.c more manageable. Differential Revision: https://reviews.freebsd.org/D24870	2020-05-23 19:06:57 +00:00
Alexander V. Chernikov	a82f62ec2d	Remove refcounting from rtentry. After making rtentry reclamation backed by epoch(9) in r361409, there is no reason in keeping reference counting code. Differential Revision: https://reviews.freebsd.org/D24867	2020-05-23 12:15:47 +00:00
Alexander V. Chernikov	2bbab0af6d	Use epoch(9) for rtentries to simplify control plane operations. Currently the only reason of refcounting rtentries is the need to report the rtable operation details immediately after the execution. Delaying rtentry reclamation allows to stop refcounting and simplify the code. Additionally, this change allows to reimplement rib_lookup_info(), which is used by some of the customers to get the matching prefix along with nexthops, in more efficient way. The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to nhop_priv to be able to reliably set curvnet even during vnet teardown. Rest of the reference counting code will be removed in the D24867 . Differential Revision: https://reviews.freebsd.org/D24866	2020-05-23 10:21:02 +00:00
Pawel Biernacki	9033ad5f15	sysctl: fix setting net.isr.dispatch during early boot Fix another collateral damage of r357614: netisr is initialised way before malloc() is available hence it can't use sysctl_handle_string() that allocates temporary buffer. Handle that internally in sysctl_netisr_dispatch_policy(). PR: 246114 Reported by: delphij Reviewed by: kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D24858	2020-05-16 17:05:44 +00:00
Mark Johnston	c1be839971	pf: Add a new zone for per-table entry counters. Right now we optionally allocate 8 counters per table entry, so in addition to memory consumed by counters, we require 8 pointers worth of space in each entry even when counters are not allocated (the default). Instead, define a UMA zone that returns contiguous per-CPU counter arrays for use in table entries. On amd64 this reduces sizeof(struct pfr_kentry) from 216 to 160. The smaller size also results in better slab efficiency, so memory usage for large tables is reduced by about 28%. Reviewed by: kp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24843	2020-05-16 00:28:12 +00:00
Kyle Evans	c79cee7136	kernel: provide panicky version of __unreachable __builtin_unreachable doesn't raise any compile-time warnings/errors on its own, so problems with its usage can't be easily detected. While it would be nice for this situation to change and compilers to at least add a warning for trivial cases where local state means the instruction can't be reached, this isn't the case at the moment and likely will not happen. This commit adds an __assert_unreachable, whose intent is incredibly clear: it asserts that this instruction is unreachable. On INVARIANTS builds, it's a panic(), and on non-INVARIANTS it expands to __unreachable(). Existing users of __unreachable() are converted to __assert_unreachable, to improve debuggability if this assumption is violated. Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D23793	2020-05-13 18:07:37 +00:00
Warner Losh	83b4342743	Kill trailing newline while I'm here...	2020-05-12 23:46:52 +00:00
Alexander V. Chernikov	4a6ee281d9	Remove unused rnh_close callback from rtable & cleanup depends. rnh_close callbackes was used by the in[6]_clsroute() handlers, doing cleanup in the route cloning code. Route cloning was eliminated somewhere around r186119. Last callback user was eliminated in r186215, 11 years ago. Differential Revision: https://reviews.freebsd.org/D24793	2020-05-11 06:09:18 +00:00
Alexander V. Chernikov	d223372545	Remove rtalloc1(_fib) KPI. Last user of rtalloc1() KPI has been eliminated in rS360631. As kernel is now fully switched to use new routing KPI defined in rS359823, remove old lookup functions. Differential Revision: https://reviews.freebsd.org/D24776	2020-05-10 09:34:48 +00:00
Alexander V. Chernikov	656442a718	Embed dst sockaddr into rtentry and remove rte packet counter Currently each rtentry has dst&gateway allocated separately from another zone, bloating cache accesses. Current 'struct rtentry' has 12 "mandatory" radix pointers in the beginning, leaving 4 usable pointers/32 bytes in the first 2 cache lines (amd64). Fields needed for the datapath are destination sockaddr and rt_nhop. So far it doesn't look like there is other routable addressing protocol other than IPv4/IPv6/MPLS, which uses keys longer than 20 bytes. With that in mind, embed dst into struct rtentry, making the first 24 bytes of rtentry within 128 bytes. That is enough to make IPv6 address within first 128 bytes. It is still pretty easy to add code for supporting separately-allocated dst, however it doesn't make a lot of sense in having such code without a use case. As rS359823 moved the gateway to the nexthop structure, the dst embedding change removes the need for any additional allocations done by rt_setgate(). Lastly, as a part of cleanup, remove counter(9) allocation code, as this field is not used in packet processing anymore. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24669	2020-05-08 21:06:10 +00:00
Alexander V. Chernikov	682b902daf	Add rib_lookup() sockaddr lookup wrapper and make ifa_ifwithroute use it. Create rib_lookup() wrapper around per-af dataplane lookup functions. This will help in the cases of having control plane af-agnostic code. Switch ifa_ifwithroute() to use this function instead of rtalloc1(). Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24731	2020-05-07 08:11:36 +00:00
Alexander V. Chernikov	d94be4ccaa	Switch DDB show route to direct rnh_matchaddr() call instead of rtalloc1(). Eliminate the last rtalloc1() call to finish transition to the new routing KPI defined in r359823. Differential Revision: https://reviews.freebsd.org/D24663	2020-05-04 15:07:57 +00:00
Alexander V. Chernikov	4f08f052ad	Simplify address parsing in DDB show route command. Use db_get_line() to overcome parser limitation. Differential Revision: https://reviews.freebsd.org/D24662	2020-05-04 15:00:19 +00:00
Alexander V. Chernikov	9e02229580	Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields. After converting routing subsystem customers to use nexthop objects defined in r359823, some fields in struct rtentry became unused. This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry along with the code initializing and updating these fields. Cleanup of the remaining fields will be addressed by D24669. This commit also changes the implementation of the RTM_CHANGE handling. Old implementation tried to perform the whole operation under radix WLOCK, resulting in slow performance and hacks like using RTF_RNH_LOCKED flag. New implementation looks up the route nexthop under radix RLOCK, creates new nexthop and tries to update rte nhop pointer. Only last part is done under WLOCK. In the hypothetical scenarious where multiple rtsock clients repeatedly issue RTM_CHANGE requests for the same route, route may get updated between read and update operation. This is addressed by retrying the operation multiple (3) times before returning failure back to the caller. Differential Revision: https://reviews.freebsd.org/D24666	2020-05-04 14:31:45 +00:00
Mark Johnston	814fa34dfb	Increase the iflib txq callout mutex name length to 32 bytes. With a length of 16, the name ("<if name>:TX(<qid>):callout") typically gets truncated. PR: 245712 Reported by: ghuckriede@blackberry.com MFC after: 1 week	2020-04-30 15:39:04 +00:00
Alexander V. Chernikov	8c61eb2107	Convert more rtentry field accesses into nhop fields accesses. Continue routing subsystem conversion to nhop objects defined in r359823. Use fields from nhop structure instead of "struct rtentry" fields. This is one of the last changes prior to removing rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry. Differential Revision: https://reviews.freebsd.org/D24609	2020-04-29 21:54:09 +00:00
Alexander V. Chernikov	74787ef47b	Add nhop to the ifa_rtrequest() callback. With the upcoming multipath changes described in D24141, rt->rt_nhop can potentially point to a nexthop group instead of an individual nhop. To simplify caller handling of such cases, change ifa_rtrequest() callback to pass changed nhop directly. Differential Revision: https://reviews.freebsd.org/D24604	2020-04-29 19:28:56 +00:00
Alexander V. Chernikov	fc88ecd31a	Move route-specific ddb commands to route/route_ddb.c Currently functionality resides in rtsock.c, which is a controlling interface, partially external to the routing subsystem. Additionally, DDB-supporting functionality is > 100SLOC, which deserves a separate file. Given that, move this functionality to a newly-created net/route/ subdir. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D24561	2020-04-28 20:00:17 +00:00
Alexander V. Chernikov	e7d8af4f65	Move route_temporal.c and route_var.h to net/route. Nexthop objects implementation, defined in r359823, introduced sys/net/route directory intended to hold all routing-related code. Move recently-introduced route_temporal.c and private route_var.h header there. Differential Revision: https://reviews.freebsd.org/D24597	2020-04-28 19:14:09 +00:00
Alexander V. Chernikov	fe6da72759	Move struct rtentry definition to nhop_var.h. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver features much faster. This is one of the last changes, effectively moving struct rtentry definition to a net/route_var.h header, internal to the routing subsystem. Differential Revision: https://reviews.freebsd.org/D24580	2020-04-28 18:42:30 +00:00
Alexander V. Chernikov	4043ee3cd7	Convert rtalloc_mpath_fib() users to the new KPI. New fib[46]_lookup() functions support multipath transparently. Given that, switch the last rtalloc_mpath_fib() calls to dib4_lookup() and eliminate the function itself. Note: proper flowid generation (especially for the outbound traffic) is a bigger topic and will be handled in a separate review. This change leaves flowid generation intact. Differential Revision: https://reviews.freebsd.org/D24595	2020-04-28 08:06:56 +00:00
Alexander V. Chernikov	1b0051bada	Eliminate now-unused parts of old routing KPI. r360292 switched most of the remaining routing customers to a new KPI, leaving a bunch of wrappers for old routing lookup functions unused. Remove them from the tree as a part of routing cleanup. Differential Revision: https://reviews.freebsd.org/D24569	2020-04-28 07:25:34 +00:00
Eric Joyner	45818bf1a0	iflib: Stop interface before (un)registering VLAN This patch is intended to solve a specific problem that iavf(4) encounters, but what it does can be extended to solve other issues. To summarize the iavf(4) issue, if the PF driver configures VLAN anti-spoof, then the VF driver needs to make sure no untagged traffic is sent if a VLAN is configured, and vice-versa. This can be an issue when a VLAN is being registered or unregistered, e.g. when a packet may be on the ring with a VLAN in it, but the VLANs are being unregistered. This can cause that tagged packet to go out and cause an MDD event. To fix this, include a new interface-dependent function that drivers can implement named IFDI_NEEDS_RESTART(). Right now, this function is called in iflib_vlan_unregister/register() to determine whether the interface needs to be stopped and started when a VLAN is registered or unregistered. The default return value of IFDI_NEEDS_RESTART() is true, so this fixes the MDD problem that iavf(4) encounters, since the interface rings are flushed during a stop/init. A future change to iavf(4) will implement that function just in case the default value changes, and to make it explicit that this interface reset is required when a VLAN is added or removed. Reviewed by: gallatin@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D22086	2020-04-27 22:02:44 +00:00
Alexander V. Chernikov	55f57ca9ac	Convert debugnet to the new routing KPI. Introduce new fib[46]_lookup_debugnet() functions serving as a special interface for the crash-time operations. Underlying implementation will try to return lookup result if datastructures are not corrupted, avoding locking. Convert debugnet to use fib4_lookup_debugnet() and switch it to use nexthops instead of rtentries. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D24555	2020-04-26 18:42:38 +00:00
Kristof Provost	fffd27e5f3	bridge: epoch-ification Run the bridge datapath under epoch, rather than under the BRIDGE_LOCK(). We still take the BRIDGE_LOCK() whenever we insert or delete items in the relevant lists, but we use epoch callbacks to free items so that it's safe to iterate the lists without the BRIDGE_LOCK. Tests on mercat5/6 shows this increases bridge throughput significantly, from 3.7Mpps to 18.6Mpps. Reviewed by: emaste, philip, melifaro MFC after: 2 months Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24250	2020-04-26 16:22:35 +00:00
Alexander V. Chernikov	57e70e4471	Fix userland build broken by r360292.	2020-04-25 09:25:06 +00:00
Alexander V. Chernikov	983066f05b	Convert route caching to nexthop caching. This change is build on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same. Differential Revision: https://reviews.freebsd.org/D24340	2020-04-25 09:06:11 +00:00
Alexander V. Chernikov	aaad3c4fca	Convert rtentry field accesses into nhop field accesses. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This commit is mostly mechanical change to eliminate direct struct rtentry field accesses. The only notable difference is AF_LINK gateway encoding. AF_LINK gw is used in routing stack for operations with interface routes and host loopback routes. In the former case it indicates _some_ non-NULL gateway, as the interface is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting. In the latter case the interface index inside gateway was used by the IPv6 datapath to verify address scope for link-local interfaces. Kernel uses struct sockaddr_dl for this type of gateway. This structure allows for specifying rich interface data, such as mac address and interface name. However, this results in relatively large structure size - 52 bytes. Routing stack fils in only 2 fields - sdl_index and sdl_type, which reside in the first 8 bytes of the structure. In the new KPI, struct nhop_object tries to be cache-efficient, hence embodies gateway address inside the structure. In the AF_LINK case it stores stortened version of the structure - struct sockaddr_dl_short, which occupies 16 bytes. After D24340 changes, the data inside AF_LINK gateway will not be used in the kernel at all, leaving rtsock as the only potential concern. The difference in rtsock reporting: (old) got message of size 240 on Thu Apr 16 03:12:13 2020 RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 link#5 255.255.255.0 (new) got message of size 200 on Sun Apr 19 09:46:32 2020 RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 link#5 255.255.255.0 Note 40 bytes different (52-16 + alignment). However, gateway is still a valid AF_LINK gateway with proper data filled in. It is worth noting that these particular messages (interface routes) are mostly ignored by routing daemons: * bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages. * quagga/frr ignores routes without gateway More detailed overview on how rtsock messages are used by the routing daemons to reconstruct the kernel view, can be found in D22974. Differential Revision: https://reviews.freebsd.org/D24519	2020-04-23 08:04:20 +00:00
Kristof Provost	fac24ad7f0	bridge: Simplify mac address generation Unconditionally use ether_gen_addr() to generate bridge mac addresses. This function is now less likely to generate duplicate mac addresses across jails. The old hand rolled hostid based code adds no value. Reviewed by: bz Differential Revision: https://reviews.freebsd.org/D24432	2020-04-18 08:00:58 +00:00
Kristof Provost	3f8bc99c4c	ethersubr: Make the mac address generation more robust If we create two (vnet) jails and create a bridge interface in each we end up with the same mac address on both bridge interfaces. These very often conflicts, resulting in same mac address in both jails. Mitigate this problem by including the jail name in the mac address. Reviewed by: kevans, melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24383	2020-04-18 07:50:30 +00:00
Alexander V. Chernikov	ae4b62595e	Unbreak build by reverting if_bridge part of r360047. Pointy hat to: melifaro	2020-04-17 18:22:37 +00:00
Alexander V. Chernikov	6745294280	Finish r191148: replace rtentry with route in if_bridge if_output() callback. Generic if_output() callback signature was modified to use struct route instead of struct rtentry in r191148, back in 2009. Quoting commit message: Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Fix bridge_output() to match this signature and update the remaining comment in if_var.h. Reviewed by: kp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24394	2020-04-17 17:05:58 +00:00
Adrian Chadd	2538a4b1b4	Remove an duplicate definition of nhops_dump_sysctl() One of the source files included both nhop.h and shared.h, leading to this clashing. Tested with: mips-gcc cross toolchain	2020-04-16 23:28:47 +00:00
Alexander V. Chernikov	5cdc484410	Fix userland build broken by r360014.	2020-04-16 17:53:23 +00:00
Alexander V. Chernikov	539642a29d	Add nhop parameter to rti_filter callback. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. Additionally, with the followup multipath changes, single rtentry can point to multiple nexthops. With that in mind, convert rti_filter callback used when traversing the routing table to accept pair (rt, nhop) instead of nexthop. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24440	2020-04-16 17:20:18 +00:00
Alexander V. Chernikov	dd4776f0cc	Reorganise nd6 notification code to avoid direct rtentry field access. One of the goals of the new routing KPI defined in r359823 is to entirely hide `struct rtentry` from the consumers. Doing so will allow to improve routing subsystem internals and deliver features more easily. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification code to avoid accessing most of the rtentry fields. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24404	2020-04-14 22:48:33 +00:00
Alexander V. Chernikov	1968b7eb21	Postpone multipath seed init till SI_SUB_LAST, as it is needed only after some useland program installs multiple paths to the same destination. While here, make multipath init conditional. Discussed with: cem,ian	2020-04-14 07:38:34 +00:00
Andrew Gallatin	bd673b9942	lagg: stop double-counting output errors and counting drops as errors Before this change, lagg double-counted errors from lagg members, and counted every drop by a lagg member as an error. Eg, if lagg sent a packet, and the underlying hardware driver dropped it, a counter would be incremented by both lagg and the underlying driver. This change attempts to fix that by incrementing lagg's counters only for errors that do not come from underlying drivers. Reviewed by: hselasky, jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24331	2020-04-13 23:06:56 +00:00
Alexander V. Chernikov	a666325282	Introduce nexthop objects and new routing KPI. This is the foundational change for the routing subsytem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object fib6_lookup(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 function are intended to replace all all flavours of <in_\|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexists. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) has to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232	2020-04-12 14:30:00 +00:00
Alexander V. Chernikov	62d95afacb	Fix build by adding forgotten header to radix_mpath.c after r359797.	2020-04-11 09:38:45 +00:00
Alexander V. Chernikov	4684d3cbcb	Remove per-AF radix_mpath initializtion functions. Split their functionality by moving random seed allocation to SYSINIT and calling (new) generic multipath function from standard IPv4/IPv5 RIB init handlers. Differential Revision: https://reviews.freebsd.org/D24356	2020-04-11 07:37:08 +00:00
Alexander V. Chernikov	aef2d5fb0e	Split rtrequest1_fib() into smaller manageable chunks. No functional changes. * Move route addition / route deletion code from rtrequest1_fib() to add_route() and del_route() respectively. * Rename rtrequest1_fib_change() to change_route() for consistency. * Shrink the scope of ugly info #defines. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24349	2020-04-10 16:27:27 +00:00
Kristof Provost	dd00a42a6b	bridge: Change lists to CK_LIST as a peparation for epochification Prepare the ground for a rework of the bridge locking approach. We will use an epoch-based approach in the datapath and making it safe to iterate over the interface, span and rtnode lists without holding the BRIDGE_LOCK. Replace the relevant lists by their ConcurrencyKit equivalents. No functional change in this commit. Reviewed by: emaste, ae, philip (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24249	2020-04-05 17:15:20 +00:00
Ed Maste	aeb665b538	remove extraneous double ;s in sys/	2020-03-30 16:04:25 +00:00
Mark Johnston	59d50fe5ef	Simplify taskqgroup inititialization. taskqgroup initialization was broken into two steps: 1. allocate the taskqgroup structure, at SI_SUB_TASKQ; 2. initialize taskqueues, start taskqueue threads, enqueue "binder" tasks to bind threads to specific CPUs, at SI_SUB_SMP. Step 2 tries to handle the case where tasks have already been attached to a queue, by migrating them to their intended queue. In particular, tasks can't be enqueued before step 2 has completed. This breaks NFS mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP is not defined, since mountroot happens before SI_SUB_SMP in this case. Simplify initialization: do all initialization except for CPU binding at SI_SUB_TASKQ. This means that until CPU binding is completed, group tasks may be executed on a CPU other than that to which they were bound, but this should not be a problem for existing users of the taskqgroup KPIs. Reported by: sbruno Tested by: bdragon, sbruno MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24188	2020-03-30 14:22:52 +00:00
Conrad Meyer	704101dd3d	Fix PNP matching for iflib NIC drivers The previous descriptor string specified that all fields were significant for match. However, the only significant fields for in-tree drivers are vendor:devid, and the fictitious zero values constructed by PVID() did not match real subvendor, subdevice, revision, and/or class values, resulting in no automatic probe. If a future iflib driver needs to match on other criteria, the descriptor string can be updated accordingly. (E.g., "V32" and ~0 for unspecified values in PVID().) Reported by: mav Sponsored by: Dell EMC Isilon	2020-03-24 19:20:10 +00:00
Ed Maste	ed6611cc8c	iflib: simplify MPASS assertion Submitted by: andrew	2020-03-24 17:54:34 +00:00
Ed Maste	68af0153a7	iflib: split compound assertion ThunderX cluster systems are panicking on boot with a failed assertion MPASS(gtask != NULL && gtask->gt_taskqueue != NULL). Split the assertion so that it's clear which part is failing.	2020-03-24 17:25:56 +00:00
Patrick Kelsey	876996910a	Remove extraneous code from iflib ifsd_cidx is never used, and the line removed from rxd_frag_to_sd() is just dead code. Reviewed by: erj, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23951	2020-03-14 20:13:42 +00:00
Patrick Kelsey	3caff1885f	Remove refill budget from iflib Reviewed by: gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23948	2020-03-14 19:58:50 +00:00
Patrick Kelsey	b38136097a	Allow iflib drivers to specify the buffer size used for each receive queue Reviewed by: erj, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23947	2020-03-14 19:56:46 +00:00
Patrick Kelsey	e503049067	Remove freelist contiguous-indexes assertion from rxd_frag_to_sd() The vmx driver is an example of an iflib driver that might report packets using non-contiguous descriptors (with unused descriptors either between received packets or between the fragments of a received packet), so this assertion needs to be removed. For such drivers, the freelist producer and consumer indexes don't relate directly to driver ring slots (the driver deals directly with freelist buffer indexes supplied by iflib during refill, and reports them with each fragment during packet reception), but do continue to be used by iflib for accounting, such as determining the number of ring slots that are refillable. PR: 243126, 243392, 240628 Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer Reviewed by: gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23946	2020-03-14 19:55:05 +00:00
Patrick Kelsey	4f2beb721b	Fix iflib zero-length fragment handling The dmamap for zero-length fragments should not be unloaded, as doing so breaks the the cluster-reuse logic in _iflib_fl_refill(). All zero-length fragments are now handled by the assemble_segments() path so that the cluster-reuse logic there does not have to be replicated in the small-single-fragment-packet path of iflib_rxd_pkt_get(). Packets consisting entirely of zero-length fragments (which result in a NULL mbuf pointer) are now properly tolerated. This allows drivers (such as the vmx driver) to pass such packets to iflib when a descriptor error occurs during packet reception, the advantage being that the refill of descriptors associated with the error packet are handled via the existing iflib machinery without having to duplicate parts of that machinery in the driver to handle that error case. Reviewed by: avg, erj, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23945	2020-03-14 19:51:55 +00:00
Patrick Kelsey	9e9b738ac5	Fix iflib freelist state corruption This fixes a bug in iflib freelist management that breaks the required correspondence between freelist indexes and driver ring slots. PR: 243126, 243392, 240628 Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer Reviewed by: avg, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23943	2020-03-14 19:43:44 +00:00

1 2 3 4 5 ...

4557 Commits