freebsd-dev

Author	SHA1	Message	Date
Eric Joyner	db8e8f1ede	iflib: properly release memory allocated for DMA DMA memory allocations using the bus_dma.h interface are not properly released in all cases for both Tx and Rx. This causes ~448 bytes of M_DEVBUF allocations to be leaked. First, the DMA maps for Rx are not properly destroyed. A slight attempt is made in iflib_fl_bufs_free to destroy the maps if we're detaching. However, this function may not be reliably called during detach. Indeed, there is a comment "asking" if this should be moved out. Fix this by moving the bus_dmamap_destroy call into iflib_rx_sds_free, where we already sync and unload the DMA. Second, the DMA tag associated with the ifr_ifdi descriptor DMA is not released properly anywhere. Add a call to iflib_dma_free in iflib_rx_structures_free. Third, use of NULL as a canary value on the map pointer returned by bus_dmamap_create is not valid. On some platforms, notably x86, this value may be NULL. In this case, we fail to properly release the related resources. Remove the NULL checks on map values in both iflib_fl_bufs_free and iflib_txsd_destroy. With all of these fixes applied, the leaks to M_DEVBUF are squelched, and iflib drivers now seem to properly cleanup when detaching. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D22203	2019-11-04 23:06:57 +00:00
Eric Joyner	244e7cffa5	iflib: cleanup memory leaks on driver detach From Jake: The iflib stack failed to release all of the memory allocated under M_IFLIB during device detach. Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx DMA info were not being released. Release this memory so that iflib won't leak memory when a device detaches. Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we need to call this only after iflib_dma_free in iflib_tx_structures_free. Additionally, also ensure that we destroy the callout mutex associated with each Tx queue when we free it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D22157	2019-10-30 20:45:12 +00:00
Eric Joyner	1558015e3e	iflib: call ether_ifdetach and netmap_detach before stop From Jake: Calling ether_ifdetach after iflib_stop leads to a potential race where a stale ifp pointer can remain in the route entry list for IPv6 traffic. This will potentially cause a page fault or other system instability if the ifp pointer is accessed. Move both iflib_netmap_detach and ether_ifdetach to be called prior to iflib_stop. This avoids the race above, and helps ensure that other ifp references are removed before stopping the interface. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@, jhb@ MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D22071	2019-10-23 23:20:49 +00:00
Conrad Meyer	7790c8c199	Split out a more generic debugnet(4) from netdump(4) Debugnet is a simplistic and specialized panic- or debug-time reliable datagram transport. It can drive a single connection at a time and is currently unidirectional (debug/panic machine transmit to remote server only). It is mostly a verbatim code lift from netdump(4). Netdump(4) remains the only consumer (until the rest of this patch series lands). The INET-specific logic has been extracted somewhat more thoroughly than previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as much as possible as is protocol-independent, remains in debugnet.c. The separation is not perfect and future improvement is welcome. Supporting INET6 is a long-term goal. Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to 'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the generic module would be more confusing than the refactoring. The only functional change here is the mbuf allocation / tracking. Instead of initiating solely on netdump-configured interface(s) at dumpon(8) configuration time, we watch for any debugnet-enabled NIC for link activation and query it for mbuf parameters at that time. If they exceed the existing high-water mark allocation, we re-allocate and track the new high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone. In a future patch in this series, this will allow initiating netdump from panic ddb(4) without pre-panic configuration. No other functional change intended. Reviewed by: markj (earlier version) Some discussion with: emaste, jhb Objection from: marius Differential Revision: https://reviews.freebsd.org/D21421	2019-10-17 16:23:03 +00:00
Mark Johnston	4166913371	Add IFLIB_SINGLE_IRQ_RX_ONLY. As of r347221 the iflib legacy interrupt mode setup assumes that drivers perform both receive and transmit processing from the interrupt handler. This assumption is invalid in the vmxnet3 driver, so introduce the IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the interrupt handler. PR: 239118 Reported and tested by: Juraj Lutter <otis@sk.freebsd.org> Obtained from: marius Reviewed by: gallatin MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D21831	2019-09-30 15:59:07 +00:00
Andrew Gallatin	6554362c66	kTLS support for TLS 1.3 TLS 1.3 requires a few changes because 1.3 pretends to be 1.2 with a record type of application data. The "real" record type is then included at the end of the user-supplied plaintext data. This required adding a field to the mbuf_ext_pgs struct to save the record type, and passing the real record type to the sw_encrypt() ktls backend functions. Reviewed by: jhb, hselasky Sponsored by: Netflix Differential Revision: D21801	2019-09-27 19:17:40 +00:00
Eric Joyner	53b5b9b049	iflib: Remove redundant VLAN events deregistration From Piotr: r351152 introduced iflib_deregister() function calling EVENTHANDLER_DEREGISTER() to unregister VLAN events. This patch removes duplicate of EVENTHANDLER_DEREGISTER() calls placed in iflib_device_deregister() as this function is now calling iflib_deregister(). This is to avoid deregistering same event twice. This patch also adds check in iflib_vlan_register() to prevent registering VLAN while being in detach. Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>, erj <erj@FreeBSD.org> and Jacob Keller <jacob.e.keller@intel.com>. Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com> Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com> Reviewed by: gallatin@, erj@ MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D21711	2019-09-24 17:03:31 +00:00
Eric Joyner	566144142e	iflib: add iflib_deregister to help cleanup on exit Commit message by Jake: The iflib_register function exists to allocate and setup some common structures used by both iflib_device_register and iflib_pseudo_register. There is no associated cleanup function used to undo the steps taken in this function. Both iflib_device_deregister and iflib_pseudo_deregister have some of the necessary steps scattered in their flow. However, most of the necessary cleanup is not done during the error path of iflib_device_register and iflib_pseudo_register. Some examples of missed cleanup include: the ifp pointer is not free'd during error cleanup the STATE and CTX locks are not destroyed during error cleanup the vlan event handlers are not removed during error cleanup media added to the ifmedia structure is not removed the kobject reference is never deleted Additionally, when initializing the kobject class reference counter is increased even though kobj_init already increases it. This results in the class never being free'd again because the reference count would never hit zero even after all driver instances are unloaded. To aid in proper cleanup, implement an iflib_deregister function that goes through the reverse steps taken by iflib_register. Call this function during the error cleanup for iflib_device_register and iflib_pseudo_register. Additionally call the function in the iflib_device_deregister and iflib_pseudo_deregister functions near the end of their flow. This helps reduce code duplication and ensures that proper steps are taken to cleanup allocations and references in both the regular and error cleanup flows. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: shurd@, erj@ MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D21005	2019-08-16 23:33:44 +00:00
Eric Joyner	197c679824	iflib: Prevent kernel panic caused by loading driver with a specific interrupt configuration If a device has only 1 MSI-X interrupt available and does not support either MSI or legacy interrupts, iflib_device_register() will fail, leak memory and MSI resources, and the driver will not load. Worse, if another iflib-using driver tries to unload afterwards, a kernel panic will occur because the previous failed iflib driver loead did not properly call "taskqgroup_detach()" during it's cleanup. This patch is band-aid for this situation -- don't try allocating MSI or legacy interrupts if a single MSI-X interrupt was allocated, but fail to load instead. As well, during the cleanup, properly call taskqgroup_detach() on the admin task to prevent panics when other iflib drivers unload. This whole interrupt allocation process actually needs re-doing to properly support devices with only a single MSI-X interrupt, devices that only support MSI-X, non-PCI devices, and multiple non-MSIX interrupts, as well. Signed-off-by: Eric Joyner <erj@freebsd.org> Reviewed by: marius@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D20747	2019-08-01 17:37:25 +00:00
Eric Joyner	6a3f243b04	iflib: remove kobject class reference increment Commit message from Jake: In iflib_register, the context is initialized as a kobject using the device driver's "driver" kobject class. As part of this, the function mistakenly increments the ref counter. The ref counter is incremented twice, once in the code directly, and once again by kobj_class_compile. However, there is no associated decrement in the detach path. Because of this, the ref counter will never go back down to zero, and thus the kobject method table will never be released. Remove this unnecessary reference count increment. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: jhb@, erj@ MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D21125	2019-08-01 17:28:36 +00:00
Eric Joyner	7f3f6aad3e	iflib: fix dangling device softc pointer Commit text by Jake: If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register function will free the ctx pointer. However, it does not reset the device softc pointer to NULL. This will result in memory corruption as a future access to the now invalid pointer will corrupt memory that is later allocated on top of the same memory location. The iflib_device_deregister function correctly resets the softc pointer by using device_set_softc(). This clears up the invalid dangling pointer and prevents memory corruption that could lead to a panic or undefined behavior if the device's driver failed to attach. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D21003	2019-07-24 21:43:41 +00:00
Marius Strobl	c2c5d1e787	o In iflib_txq_drain(): - Remove desc_used, which is only ever written to. - Remove a dead store to reclaimed. - Don't recycle avail. - Sort variables according to style(9). These changes will make a subsequent commit easier to read. o In iflib_tx_credits_update(), don't bother checking whether the ift_txd_credits_update method pointer is NULL; _iflib_pre_assert() asserts upfront that this method has been assigned and functions like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}() and _task_fn_tx() were already unconditionally relying on the method being callable.	2019-06-26 15:28:21 +00:00
Marko Zec	188adcb7e4	V_ip6_forwarding and V_ipforwarding have been defined in ip6_var.h / ip_var.h since at least 2008, so make use of those definitions here. MFC after: 3 days	2019-06-19 08:49:24 +00:00
Marko Zec	6aee0bfa85	Evaluating htons() at compile time is more efficient than doing ntohs() at runtime. This change removes a dependency on a barrel shifter pass before branch resolution, while reducing the instruction stream size by 9 bytes on amd64. MFC after: 3 days	2019-06-19 08:39:19 +00:00
Marius Strobl	d49e83eac3	- Replace unused and only ever written to members of public iflib(9) structs with placeholders (in the latter case, IFLIB_MAX_TX_BYTES etc. are also only ever used for these write-only members if at all, so both these macros and members can just go). Using these spares may render it possible to merge certain iflib(9) fixes to stable/12. Otherwise, changes extending struct if_irq or struct if_shared_ctx in any way would break KBI as instances of these are allocated by the driver front-ends (by contrast, struct if_pkt_info as well as struct if_softc_ctx instances are provided by iflib(9) and, thus, may grow at least at the end without breaking KBI). - Make the pvi_name in struct pci_vendor_info const char * as device identifiers in hardware lookup tables aren't to be expected to ever change at runtime. - Similarly, make the pci_vendor_info_t of struct if_shared_ctx which is used to point to the struct pci_vendor_info arrays provided by the driver front-ends const. - Remove the ETH_ADDR_LEN macro from iflib.h; this was duplicating ETHER_ADDR_LEN of <net/ethernet.h> with iflib(9) actually only consuming the latter macro. - Make the name argument of iflib_io_tqg_attach(9) const, matching the taskqgroup_attach_cpu(9) this function wraps as well as e. g. iflib_config_gtask_init(9). - Remove the orphaned iflib_qset_lock_get() prototype. - Remove some extraneous empty lines.	2019-06-15 11:07:41 +00:00
Eric Joyner	668d6dbb4c	iflib: provide probe wrapper for vendor drivers From Jake: Vendor drivers that exist out-of-tree generally should return BUS_PROBE_VENDOR from their device probe functions. This helps ensure that a vendor replacement driver will supersede the in-kernel driver for a given device. Currently, if a vendor wants to implement a driver based on iflib, it will always report BUS_PROBE_DEFAULT. Add a wrapper function, iflib_device_probe_vendor() which can be used in place of iflib_device_probe(). This function will just return BUS_PROBE_VENDOR whenever iflib_device_probe() would return BUS_PROBE_DEFAULT. While vendor drivers can already implement such a wrapper themselves, providing it in the iflib.h header makes it easier for the vendor driver to do the right thing. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@, marius@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D20221	2019-05-29 22:24:10 +00:00
Eric Joyner	afb7737237	iflib: use default ntxd and nrxd when user value is not power of 2 From Jake: A user may set a sysctl to override the default number of Tx or Rx descriptors. However, certain calculations in the iflib core expect the number of descriptors to be a power of 2. Update _iflib_assert to verify that all of the shared context parameters for the number of descriptors are powers of 2. Modify iflib_reset_qvalues to check that the provided isc_nrxd value is a power of 2. If it's not, print a warning message and then use the default value. An alternative might be to try rounding the number down instead. However, this creates problems in case the rounded down value is below the minimum value that the driver would support. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: marius@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19880	2019-05-10 00:41:42 +00:00
Marius Strobl	007b804fc7	Allow to build without INET and INET6 again after r347221. Submitted by: cam	2019-05-08 09:03:43 +00:00
Marius Strobl	3d10e9ed62	o Use iflib_fast_intr_rxtx() also for "legacy" interrupts, i. e. INTx and MSI. Unlike as with iflib_fast_intr_ctx(), the former will also enqueue _task_fn_tx() in addition to _task_fn_rx() if appropriate, bringing TCP TX throughput of EM-class devices on par with the MSI-X case and, thus, close to wirespeed/pre-iflib(4) times again. [1] Note that independently of the interrupt type, the UDP performance with these MACs still is abysmal and nowhere near to where it was before the conversion of em(4) to iflib(4). o In iflib_init_locked(), announce which free list failed to set up. o In _task_fn_tx() when running netmap(4), issue ifdi_intr_enable instead of the ifdi_tx_queue_intr_enable method in case of a "legacy" interrupt as the latter is valid with MSI-X only. o Instead of adding the missing - and apparently convoluted enough that a DBG_COUNTER_INC was put into a wrong spot in _task_fn_rx() - checks for ifdi_{r,t}x_queue_intr_enable being available in the MSI-X case also to iflib_fast_intr_rxtx(), factor these out to iflib_device_register() and make the checks fail gracefully rather than panic. This avoids invoking the checks at runtime over and over again in iflib_fast_intr_rxtx() and _task_fn_{r,t}x() - even if it's just in case of INVARIANTS - and makes these functions more readable. o In iflib_rx_structures_setup(), only initialize LRO resources if device and driver have LRO capability in order to not waste memory. Also, free the LRO resources again if setting them up fails for one of the queues. However, don't bother invoking iflib_rx_sds_free() in that case because iflib_rx_structures_setup() doesn't call iflib_rxsd_alloc() either (and iflib_{device,pseudo}_register() will issue iflib_rx_sds_free() in case of failure via iflib_rx_structures_free(), but there definitely is some asymmetry left to be fixed, though). o Similarly, free LRO resources again in iflib_rx_structures_free(). o In iflib_irq_set_affinity(), handle get_core_offset() errors gracefully instead of panicing (but only in case of INVARIANTS). This is a follow- up to r344132, as such driver bugs shouldn't be fatal. o Likewise, handle unknown iflib_intr_type_t in iflib_irq_alloc_generic() gracefully, too. o Bring yet more sanity to iflib_msix_init(): - If the device doesn't provide enough MSI-X vectors or not all vectors can be allocate so the expected number of queues in addition to admin interrupts can't be supported, try MSI next (and then INTx) as proper MSI-X vector distribution can't be assured in such cases. In essence, this change brings r254008 forward to iflib(4). Also, this is the fix alluded to in the commit message of r343934. - If the MSI-X allocation has failed, don't prematurely announce MSI is going to be used as the latter in fact may not be available either. - When falling back to MSI, only release the MSI-X table resource again if it was allocated in iflib_msix_init(), i. e. isn't supplied by the driver, in the first place. o In mp_ndesc_handler(), handle unknown type arguments gracefully, too. PR: 235031 (likely) [1] Reviewed by: shurd Differential Revision: https://reviews.freebsd.org/D20175	2019-05-07 08:28:35 +00:00
Marius Strobl	1722eeac95	- Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx. - Remove the only ever written to ift_db_mtx_name member of struct iflib_txq. - Remove the unused or only ever written to ifr_size, ifr_cq_pidx, ifr_cq_gen and ifr_lro_enabled members of struct iflib_rxq. - Consistently spell DMA, RX and TX uppercase in comments, messages etc. instead of mixing with some lowercase variants. - Consistently use if_t instead of a mix of if_t and struct ifnet pointers. - Bring the function comments of _iflib_fl_refill(), iflib_rx_sds_free() and iflib_fl_setup() in line with reality. - Judging problem reports, people are wondering what on earth messages like: "TX(0) desc avail = 1024, pidx = 0" are trying to indicate. Thus, extend this string to be more like that of non-iflib(4) Ethernet MAC drivers, notifying about a watchdog timeout due to which the interface will be reset. - Take advantage of the M_HAS_VLANTAG macro. - Use false/true rather than FALSE/TRUE for variables of type bool. - Use FALLTHROUGH as advocated by style(9).	2019-05-06 20:56:41 +00:00
Matt Macy	e2621d9657	Allow iflib drivers to pass a pointer to their own ifmedia structure. Tested by: emaste@ Differential Revision: https://reviews.freebsd.org/D19946	2019-05-03 20:05:31 +00:00
Ed Maste	ce3da455e9	iflib: remove assertion that isc_capabilities is nonzero It's atypical, but not invalid, for a driver to pass no capabilities. Submitted by: Gerald Aryeetey <aryeeteygerald_rogers.com> Reviewed by: shurd MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20142	2019-05-02 19:13:31 +00:00
Stephen Hurd	f154ece02e	iflib: Better control over queue core assignment By default, cores are now assigned to queues in a sequential manner rather than all NICs starting at the first core. On a four-core system with two NICs each using two queue pairs, the nic:queue -> core mapping has changed from this: 0:0 -> 0, 0:1 -> 1 1:0 -> 0, 1:1 -> 1 To this: 0:0 -> 0, 0:1 -> 1 1:0 -> 2, 1:1 -> 3 Additionally, a device can now be configured to use separate cores for TX and RX queues. Two new tunables have been added, dev.X.Y.iflib.separate_txrx and dev.X.Y.iflib.core_offset. If core_offset is set, the NIC is not part of the auto-assigned sequence. Reviewed by: marius MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D20029	2019-04-25 21:24:56 +00:00
Andrew Gallatin	6d49b41ee8	iflib: Add pfil hooks As with mlx5en, the idea is to drop unwanted traffic as early in receive as possible, before mbufs are allocated and anything is passed up the stack. This can save considerable CPU time when a machine is under a flooding style DOS attack. The major change here is to remove the unneeded abstraction where callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and are responsible for NULL'ing that mbuf themselves. Now this happens directly in rxd_frag_to_sd(), and it returns an mbuf. This allows us to use the decision (and potentially mbuf) returned by the pfil hooks. The driver can now recycle mbufs to avoid re-allocation when packets are dropped. Reviewed by: marius (shurd and erj also provided feedback) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19645	2019-04-24 13:32:04 +00:00
Kyle Evans	1fd8c72c0a	iflib: Use new ether_gen_addr, restricting addresses to that subset Differential Revision: https://reviews.freebsd.org/D19587	2019-04-17 17:19:54 +00:00
Eric Joyner	225eae1bb7	iflib: return ENETDOWN when the network device is down From Jake: iflib_if_transmit returns ENOBUFS when the device is down, or when the link isn't active. This was changed in r308792 from return (0), so that the function correctly reports an error that it was unable to transmit. However, using ENOBUFS can cause some network applications to produce the following or similar errors: "ping: sendto: No buffer space available" This is a bit confusing as the real cause of the issue is that the network device is down. Replace the ENOBUFS return with ENETDOWN to indicate more clearly that the reason for the failure to send is due to the network device is offline. This will cause the error message to be reported as "ping: sendto: Network is down" Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: shurd@, sbruno@, bz@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19652	2019-03-28 20:46:45 +00:00
Eric Joyner	aac9c817af	iflib: hold the CTX lock in iflib_pseudo_register From Jake: The iflib_device_register function takes the CTX lock before calling IFDI_ATTACH_PRE, and releases it upon finishing the registration. Mirror this process in iflib_pseudo_register, so that we always hold the CTX lock during the attach process when registering a pseudo interface or a regular interface. This was caught by code inspection while attempting to analyze where the CTX lock was held. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: shurd@, erj@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19604	2019-03-28 20:43:47 +00:00
Eric Joyner	10a1e981d4	iflib: mark isc_driver_version as constant From Jake: The iflib core never modifies the isc_driver_version string. Allow drivers to safely assign pointers to constant buffers by marking this parameter const. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@, jhb@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19577	2019-03-19 23:44:26 +00:00
Eric Joyner	1b9d93948a	iflib: expose the Rx mbuf buffer size to drivers From Jake: iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based on the isc_max_frame_size value that drivers setup. This calculation is repeated by drivers when programming their hardware with the size of each Rx buffer. This can lead to a mismatch where the iflib mbuf size is different from the expected size of the buffer as programmed by the hardware. This can lead to unexpected results. If iflib ever wants to support mbuf sizes larger than one page, every driver must be updated to account for the new possible buffer sizes. Fix this by calculating the mbuf size prior to calling IFDI_INIT, and adding the iflib_get_rx_mbuf_sz function which will expose this value to drivers, so that they do not repeat the same calculation. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: shurd@, erj@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19489	2019-03-19 17:59:56 +00:00
Eric Joyner	3e8d1bae5f	iflib: prevent possible infinite loop in iflib_encap From Jake: iflib_encap calls bus_dmamap_load_mbuf_sg. Upon it returning EFBIG, an m_collapse and an m_defrag are attempted to shrink the mbuf cluster to fit within the DMA segment limitations. However, if we call m_defrag, and then bus_dmamap_load_mbuf_sg returns EFBIG on the now defragmented mbuf, we will continuously re-call bus_dmamap_load_mbuf_sg over and over. This happens because m_head isn't NULL, and remap is >1, so we don't try to m_collapse or m_defrag again. The only way we exit the loop is if m_head is NULL. However, m_head can't be modified by the call to bus_dmamap_load_mbuf_sg, because we don't pass it as a double pointer. I believe this will be an incredibly rare occurrence, because it is unlikely that bus_dmamap_load_mbuf_sg will actually fail on the second defragment with an EFBIG error. However, it still seems like a possibility that we should account for. Fix the exit check to ensure that if remap is >1, we will also exit, even if m_head is not NULL. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: shurd@, gallatin@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19468	2019-03-19 17:49:03 +00:00
Eric Joyner	bc408c7d61	Remove references to CONTIGMALLOC_WORKS in iflib and em From Jake: "The iflib_fl_setup() function tries to pick various buffer sizes based on the max_frame_size value defined by the parent driver. However, this code was wrapped under CONTIGMALLOC_WORKS, which was never actually defined anywhere. This same code pattern was used in if_em.c, likely trying to match what iflib uses. Since CONTIGMALLOC_WORKS is not defined, remove this dead code from iflib_fl_setup and if_em.c Given that various iflib drivers appear to be using a similar calculation, it might be worth making this buffer size a value that the driver can peek at in the future." Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: shurd@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19199	2019-03-05 19:12:51 +00:00
Stephen Hurd	ca62461bc6	iflib: Improve return values of interrupt handlers. iflib was returning FILTER_HANDLED, in cases where FILTER_STRAY was more correct. This potentially caused issues with shared legacy interrupts. Driver filters returning FILTER_STRAY are now properly handled. Submitted by: Augustin Cavalier <waddlesplash@gmail.com> Reviewed by: marius, gallatin Obtained from: Haiku (a84bb9, 4947d1) MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D19201	2019-02-15 18:51:43 +00:00
Marius Strobl	a6611c938b	Fix the build with ALTQ after r344060.	2019-02-12 22:33:17 +00:00
Marius Strobl	f855ec814d	Make taskqgroup_attach{,_cpu}(9) work across architectures So far, intr_{g,s}etaffinity(9) take a single int for identifying a device interrupt. This approach doesn't work on all architectures supported, as a single int isn't sufficient to globally specify a device interrupt. In particular, with multiple interrupt controllers in one system as found on e. g. arm and arm64 machines, an interrupt number as returned by rman_get_start(9) may be only unique relative to the bus and, thus, interrupt controller, a certain device hangs off from. In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}() not work across architectures. Yet in turn, iflib(4) as gtaskqueue consumer so far doesn't fit architectures where interrupt numbers aren't globally unique. However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as employed by the gtaskqueue implementation to bind an interrupt to a particular CPU, using bus_bind_intr(9) instead is equivalent from a functional point of view, with bus_bind_intr(9) taking the device and interrupt resource arguments required for uniquely specifying a device interrupt. Thus, change the gtaskqueue implementation to employ bus_bind_intr(9) instead and intr_{g,s}etaffinity(9) to take the device and interrupt resource arguments required respectively. This change also moves struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL as userland likes to include <sys/_task.h> or indirectly drags it in - for better or worse also with _KERNEL defined -, which with device_t and struct resource dependencies otherwise is no longer as easily possible now. The userland inclusion problem probably can be improved a bit by introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the existing _WANT_PRISON etc., which is orthogonal to this change, though, and likely needs an exp-run. While at it: - Change the gt_cpu member in the grouptask structure to be of type int as used elswhere for specifying CPUs (an int16_t may be too narrow sooner or later), - move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to the gtaskqueue implementation as it's only used and needed there, - change the GTASK_INIT macro to use "gtask" rather than "task" as argument given that it actually operates on a struct gtask rather than a struct task, and - let subr_gtaskqueue.c consistently use __func__ to print functions names. Reported by: mmel Reviewed by: mmel Differential Revision: https://reviews.freebsd.org/D19139	2019-02-12 21:23:59 +00:00
Marius Strobl	95dcf343b7	Further correct and optimize the bus_dma(9) usage of iflib(4): o Correct the obvious bugs in the netmap(4) parts: - No longer check for the existence of DMA maps as bus_dma(9) is used unconditionally in iflib(4) since r341095. - Supply the correct DMA tag and map pairs to bus_dma(9) functions (see also the commit message of r343753). - In iflib_netmap_timer_adjust(), add synchronization of the TX descriptors before calling the ift_txd_credits_update method as the latter evaluates the TX descriptors possibly updated by the MAC. - In _task_fn_tx(), wrap the netmap(4)-specific bits in #ifdef DEV_NETMAP just as done in _task_fn_admin() and _task_fn_rx() respectively. o In iflib_fast_intr_rxtx(), synchronize the TX rather than the RX descriptors before calling the ift_txd_credits_update method (see also above). o There's no need to synchronize an RX buffer that is going to be recycled in iflib_rxd_pkt_get(), yet; it's sufficient to do that as late as passing RX buffers to the MAC via the ift_rxd_refill method. Hence, combine that synchronization with the synchronization of new buffers into a common spot in _iflib_fl_refill(). o There's no need to synchronize the RX descriptors of a free list in preparation of the MAC updating their statuses with every invocation of rxd_frag_to_sd(); it's enough to do this once before handing control over to the MAC, i. e. before calling ift_rxd_flush method in _iflib_fl_refill(), which already performs the necessary synchronization. o Given that the ift_rxd_available method evaluates the RX descriptors which possibly have been altered by the MAC, synchronize as appropriate beforehand. Most notably this is now done in iflib_rxd_avail(), which in turn means that we don't need to issue the same synchronization yet again before calling the ift_rxd_pkt_get method in iflib_rxeof(). o In iflib_txd_db_check(), synchronize the TX descriptors before handing them over to the MAC for transmission via the ift_txd_flush method. o In iflib_encap(), move the TX buffer synchronization after the invocation of the ift_txd_encap() method. If the MAC driver fails to encapsulate the packet and we retry with a defragmented mbuf chain or finally fail, the cycles for TX buffer synchronization have been wasted. Synchronizing afterwards matches what non-iflib(4) drivers typically do and is sufficient as the MAC will not actually start with the transmission before - in this case - the ift_txd_flush method is called. Moreover, for the latter reason the synchronization of the TX descriptors in iflib_encap() can go as it's enough to synchronize them before passing control over to the MAC by issuing the ift_txd_flush() method (see above). o In iflib_txq_can_drain(), only synchronize TX descriptors if the ift_txd_credits_update method accessing these is actually called. Differential Revision: https://reviews.freebsd.org/D19081	2019-02-12 21:08:44 +00:00
Marius Strobl	bfce461ee9	o As illustrated by e. g. figure 7-14 of the Intel 82599 10 GbE controller datasheet revision 3.3, in the context of Ethernet MACs the control data describing the packet buffers typically are named "descriptors". Each of these descriptors references one buffer, multiple of which a packet can be composed of. By contrast, in comments, messages and the names of structure members, iflib(4) refers to DMA resources employed for RX and TX buffers (rather than control data) as "desc(riptors)". This odd naming convention of iflib(4) made reviewing r343085 and identifying wrong and missing bus_dmamap_sync(9) calls in particular way harder than it already is. This convention may also explain why the netmap(4) part of iflib(4) pairs the DMA tags for control data with DMA maps of buffers and vice versa in calls to bus_dma(9) functions. Therefore, change iflib(4) to refer to buf(fers) when buffers and not the usual understanding of descriptors is meant. This change does not include corrections to the DMA resources used in the netmap(4) parts. However, it revises error messages to state which kind of allocation/creation failed. Specifically, the "Unable to allocate tx_buffer (map) memory" copy & pasted inappropriately on several occasions was replaced with proper messages. o Enhance some other error messages to indicate which half - RX or TX - they apply to instead of using identical text in both cases and generally canonicalize them. o Correct the descriptions of iflib_{r,t}xsd_alloc() to reflect reality; current code doesn't use {r,t}x_buffer structures. o In iflib_queues_alloc(): - Remove redundant BUS_DMA_NOWAIT of iflib_dma_alloc() calls, - change the M_WAITOK from malloc(9) calls into M_NOWAIT. The return values are already checked, deferred DMA allocations not being an option at this point, BUS_DMA_NOWAIT has to be used anyway and prior malloc(9) calls in this function also specify M_NOWAIT. Reviewed by: shurd Differential Revision: https://reviews.freebsd.org/D19067	2019-02-04 20:46:57 +00:00
Marius Strobl	b97de13ae0	- Stop iflib(4) from leaking MSI messages on detachment by calling bus_teardown_intr(9) before pci_release_msi(9). - Ensure that iflib(4) and associated drivers pass correct RIDs to bus_release_resource(9) by obtaining the RIDs via rman_get_rid(9) on the corresponding resources instead of using the RIDs initially passed to bus_alloc_resource_any(9) as the latter function may change those RIDs. Solely em(4) for the ioport resource (but not others) and bnxt(4) were using the correct RIDs by caching the ones returned by bus_alloc_resource_any(9). - Change the logic of iflib_msix_init() around to only map the MSI-X BAR if MSI-X is actually supported, i. e. pci_msix_count(9) returns > 0. Otherwise the "Unable to map MSIX table " message triggers for devices that simply don't support MSI-X and the user may think that something is wrong while in fact everything works as expected. - Put some (mostly redundant) debug messages emitted by iflib(4) and em(4) during attachment under bootverbose. The non-verbose output of em(4) seen during attachment now is close to the one prior to the conversion to iflib(4). - Replace various variants of spelling "MSI-X" (several in messages) with "MSI-X" as used in the PCI specifications. - Remove some trailing whitespace from messages emitted by iflib(4) and change them to consistently start with uppercase. - Remove some obsolete comments about releasing interrupts from drivers and correct a few others. Reviewed by: erj, Jacob Keller, shurd Differential Revision: https://reviews.freebsd.org/D18980	2019-01-30 13:21:26 +00:00
Marius Strobl	3db348b54a	- In _iflib_fl_refill(), don't mark an RX buffer as available in the corresponding bitmap before adding an mbuf has actually succeeded. Previously, m_gethdr(M_NOWAIT, ...) failing caused a "hole" in the RX ring but not in its bitmap. One implication of such a hole was that in a subsequent call to _iflib_fl_refill() with the RX buffer accounting still indicating another reclaimable buffer, bit_ffc(3) nevertheless returned -1 in frag_idx which in turn caused havoc when used as an index. Thus, additionally assert that frag_idx is 0 or greater. Another possible consequence of a hole in the RX ring was a NULL- dereference when trying to use the unallocated mbuf, for example in iflib_rxd_pkt_get(). While at it, make the variable declarations in _iflib_fl_refill() conform to style(9) and remove redundant checks already performed by bit_ffc{,_at}(3). - In iflib_queues_alloc(), don't pass redundant M_ZERO to bit_alloc(3). Reported and tested by: pho	2019-01-26 21:35:51 +00:00
Andrew Gallatin	77102fd6a2	Fix an iflib driver unload panic introduced in r343085 The new loop to sync and unload descriptors was indexed by "i", rather than "j". The panic was caused by "i" being advanced rather than "j", and eventually becoming out of bounds. Reviewed by: kib MFC after: 3 days Sponsored by: Netflix	2019-01-25 15:02:18 +00:00
Patrick Kelsey	8f82136aec	onvert vmx(4) to being an iflib driver. Also, expose IFLIB_MAX_RX_SEGS to iflib drivers and add iflib_dma_alloc_align() to the iflib API. Performance is generally better with the tunable/sysctl dev.vmx.<index>.iflib.tx_abdicate=1. Reviewed by: shurd MFC after: 1 week Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D18761	2019-01-22 01:11:17 +00:00
Patrick Kelsey	7f3eb9dab3	Fix various resource leaks that can occur in the error paths of iflib_device_register() and iflib_pseudo_register(). Reviewed by: shurd MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D18760	2019-01-22 00:56:44 +00:00
Konstantin Belousov	8a04b53dce	Improve iflib busdma(9) KPI use. - Specify BUS_DMA_NOWAIT for bus_dmamap_load() on rx refill, since callbacks are not supposed to be used. - Match tso/non-tso tags to corresponding tx map operations. Create separate tso maps for tx descriptors. In particular, do not use non-tso tag to load, unload, or destroy a map created with tso tag. - Add missed bus_dmamap_sync() calls. Submitted by: marius. Reported and tested by: pho Reviewed by: marius Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-01-16 05:44:14 +00:00
Stephen Hurd	cd28ea929a	Use iflib_if_init_locked() during resume instead of iflib_init_locked(). iflib_init_locked() assumes that iflib_stop() has been called, however, it is not called for suspend. iflib_if_init_locked() calls stop then init, so fixes the problem. This was causing errors after a resume from suspend. PR: 224059 Reported by: zeising MFC after: 1 week Sponsored by: Limelight Networks	2019-01-07 23:46:54 +00:00
Konstantin Belousov	85f3b801e9	Fix typo, use boolean operator instead of bit-wise. Reviewed by: marius, shurd MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-01-03 01:01:03 +00:00
Stephen Hurd	7124b5ba04	Fix !tx_abdicate error from r336560 r336560 was supposed to restore pre-r323954 behaviour when tx_abdicate is not set (the default case). However, it appears that rather than the drainage check being made conditional on tx_abdicate being set, it was duplicated so it occured twice if tx_abdicate was set and once if it was not. Now when !tx_abdicate, drainage is only checked if the doorbell isn't pending. Reported by: lev MFC after: 1 week Sponsored by: Limelight Networks	2018-12-11 17:46:01 +00:00
Andrew Gallatin	fbec776de0	Use busdma unconditionally in iflib - Remove the complex mechanism to choose between using busdma and raw pmap_kextract at runtime. The reduced complexity makes the code easier to read and maintain. - Fix a bug in the small packet receive path where clusters were repeatedly mapped but never unmapped. We now store the cluster's bus address and avoid re-mapping the cluster each time a small packet is received. This patch fixes bugs I've seen where ixl(4) will not even respond to ping without seeing DMAR faults. I see a small improvement (14%) on packet forwarding tests using a Haswell based Xeon E5-2697 v3. Olivier sees a small regression (-3% to -6%) with lower end hardware. Reviewed by: mmacy Not objected to by: sbruno MFC after: 8 weeks Sponsored by: Netflix, Inc Differential Revision: https://reviews.freebsd.org/D17901	2018-11-27 20:01:05 +00:00
Stephen Hurd	0efb1a464f	Clear RX completion queue state veriables in iflib_stop() iflib_stop() was not resetting the rxq completion queue state variables. This meant that for any driver that has receive completion queues, after a reinit, iflib would start asking what's available on the rx side starting at whatever the completion queue index was prior to the stop, instead of at 0. Submitted by: pkelsey Reported by: pkelsey MFC after: 3 days Sponsored by: Limelight Networks	2018-11-14 20:36:18 +00:00
Stephen Hurd	8d4ceb9cc4	Prevent POLA violation with TSO/CSUM offload Ensure that any time CSUM_IP_TSO or CSUM_IP6_TSO is set that the corresponding CSUM_IP6?_TCP / CSUM_IP flags are also set. Rather than requireing drivers to bake-in an understanding that TSO implies checksum offloads, make it explicit. This change requires us to move the IFLIB_NEED_ZERO_CSUM implementation to ensure it's zeroed for TSO. Reported by: Jacob Keller <jacob.e.keller@intel.com> MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D17801	2018-11-14 15:23:39 +00:00
Stephen Hurd	4d261ce278	Fix leaks caused by ifc_nhwtxqs never being initialized r333502 removed initialization of ifc_nhwtxqs, and it's not clear there's a need to copy it into the struct iflib_ctx at all. Use ctx->ifc_sctx->isc_ntxqs instead. Further, iflib_stop() did not clear the last ring in the case where isc_nfl != isc_nrxqs (such as when IFLIB_HAS_RXCQ is set). Use ctx->ifc_sctx->isc_nrxqs here instead of isc_nfl. Reported by: pkelsey Reviewed by: pkelsey MFC after: 3 days Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D17979	2018-11-14 15:16:45 +00:00
Stephen Hurd	a42546df88	Fix rxcsum issue introduced in r338838 r338838 attempted to fix issues with rxcsum and rxcsum6. However, the rxcsum bits were set as though if_setcapenablebit() was being called, not if_togglecapenable() which is in use. As a result, it was not possible to disable rxcsum when rxcsum6 was supported. PR: 233004 Reported by: lev Reviewed by: lev MFC after: 3 days Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D17881	2018-11-07 19:31:48 +00:00

1 2 3 4

187 Commits