freebsd-dev

Author	SHA1	Message	Date
Randall Stewart	ded5ea6a25	This fixes a out-of-order problem with several of the newer drivers. The basic problem was that the driver was pulling the mbuf off the drbr ring and then when sending with xmit(), encounting a full transmit ring. Thus the lower layer xmit() function would return an error, and the drivers would then append the data back on to the ring. For TCP this is a horrible scenario sure to bring on a fast-retransmit. The fix is to use drbr_peek() to pull the data pointer but not remove it from the ring. If it fails then we either call the new drbr_putback or drbr_advance method. Advance moves it forward (we do this sometimes when the xmit() function frees the mbuf). When we succeed we always call advance. The putback will always copy the mbuf back to the top of the ring. Note that the putback cannot be used with a drbr_dequeue() only with drbr_peek(). We most of the time, in putback, would not need to copy it back since most likey the mbuf is still the same, but sometimes xmit() functions will change the mbuf via a pullup or other call. So the optimial case for the single consumer is to always copy it back. If we ever do a multiple_consumer (for lagg?) we will need a test and atomic in the put back possibly a seperate putback_mc() in the ring buf. Reviewed by: jhb@freebsd.org, jlv@freebsd.org	2013-02-07 15:20:54 +00:00
Sofian Brabez	61bfd86762	Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays Reviewed by: cognet Approved by: cognet	2013-01-30 18:01:20 +00:00
Steven Hartland	31e85bd9cd	Fixed mbuf free when receive structures fail to allocate. This prevents quad igb card on high core machines, without any nmbcluster or igb queue tuning wedging the boot process if all nics are configured. Reviewed by: jfv Approved by: pjd (mentor) MFC after: 1 week	2013-01-12 16:05:55 +00:00
Gleb Smirnoff	c6499eccad	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags in sys/dev.	2012-12-04 09:32:43 +00:00
Gleb Smirnoff	9c402aeb41	drbr_enqueue() awlays consumes mbuf, no matter did it fail or not. The mbuf pointer is no longer valid, so can't be reused after. Fix igb_mq_start() where mbuf pointer was used after drbr_enqueue(). This eventually leads us to all invocations of igb_mq_start_locked() called with third argument as NULL. This allows us to simplify this function. Submitted by: Karim Fodil-Lemelin <fodillemlinkarim gmail.com> Reviewed by: jfv	2012-11-26 20:03:57 +00:00
Eitan Adler	a8de37b024	This isn't functionally identical. In some cases a hint to disable unit 0 would in fact disable all units. This reverts r241856 Approved by: cperciva (implicit)	2012-10-22 13:06:09 +00:00
Eitan Adler	76b7512247	Now that device disabling is generic, remove extraneous code from the device drivers that used to provide this feature. Reviewed by: des Approved by: cperciva MFC after: 1 week	2012-10-22 03:41:14 +00:00
Gleb Smirnoff	063efed28c	The drbr(9) API appeared to be so unclear, that most drivers in tree used it incorrectly, which lead to inaccurate overrated if_obytes accounting. The drbr(9) used to update ifnet stats on drbr_enqueue(), which is not accurate since enqueuing doesn't imply successful processing by driver. Dequeuing neither mean that. Most drivers also called drbr_stats_update() which did accounting again, leading to doubled if_obytes statistics. And in case of severe transmitting, when a packet could be several times enqueued and dequeued it could have been accounted several times. o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between ALTQ queueing or buf_ring(9) queueing. - It doesn't touch the buf_ring stats any more. - It doesn't touch ifnet stats anymore. - drbr_stats_update() no longer exists. o buf_ring(9) handles its stats itself: - It handles br_drops itself. - br_prod_bytes stats are dropped. Rationale: no one ever reads them but update of a common counter on every packet negatively affects performance due to excessive cache invalidation. - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since we no longer account bytes. o Drivers handle their stats theirselves: if_obytes, if_omcasts. o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer use drbr_stats_update(), and update ifnet stats theirselves. o bxe(4) was the most correct driver, it didn't call drbr_stats_update(), thus it was the only driver accurate under moderate load. Now it also maintains stats itself. o ixgbe(4) had already taken stats from hardware, so just - drop software stats updating. - take multicast packet count from hardware as well. o mxge(4) just no longer needs NO_SLOW_STATS define. o cxgb(4), cxgbe(4) need no change, since they obtain stats from hardware. Reviewed by: jfv, gnn	2012-09-28 18:28:27 +00:00
Sean Bruno	126a39ce60	This patch fixes a nit in the em, lem, and igb driver statistics. Increment adapter->dropped_pkts instead of if_ierrors because if_ierrors is overwritten by hw stats collection. Submitted by: Andrew Boyer <aboyer@averesystems.com> Reviewed by: Jack F Vogel <jfv@freebsd.org> MFC after: 2 weeks	2012-09-23 22:53:39 +00:00
Jack F Vogel	724f79462b	Make the polling interface in igb able to handle multiqueue, and correct the rxdone handling. Update the polling man page to include igb as well. Thanks to Mark Johnston for these changes.	2012-08-06 22:43:49 +00:00
Jack F Vogel	6aa4d618ca	Correct the mq_start routine to avoid out-of-order packet delivery, always enqueue when possible. Also correct the DEPLETED test as multiple bits might be set. Thanks to Randall Stewart for the changes!	2012-08-06 20:44:05 +00:00
Sean Bruno	8844c80848	CPU_NEXT() already handles wrapping around to the beginning. Also, in a system with sparse CPU IDs, you can have a valid CPU ID > mp_ncpus (e.g. if you have two CPUs 0 and 4, with mp_maxid == 4 and mp_ncpus == 2). Introduced at svn r235210 Submitted by: jhb@ Reviewed by: jfv@	2012-08-02 00:00:34 +00:00
Jack F Vogel	fcc144ad4e	Change the interface to the Energy Efficient Ethernet (EEE) setting in the igb and em driver. This was necessitated by a shared code change that I was given late in the game, a data type changed from bool to int, in the last update I dealt with it by a cast, but it was pointed out (thanks jhb) that there was a potential problem with this. John suggested this safer approach, and it is fine with me... MFC after:2 days (to catch the 9.1 update)	2012-07-07 20:21:05 +00:00
Jack F Vogel	996922aeee	Correct small regressions pointed out by jhb, thanks John. MFC after:5 days	2012-07-05 23:36:17 +00:00
Jack F Vogel	ab5d036272	Sync with Intel internal source: shared code update and small changes in core required Add support for new i210/i211 devices Improve queue calculation based on mac type MFC after:5 days	2012-07-05 20:26:57 +00:00
John Baldwin	03b0ca8b28	Commit a portion of 233708 I missed earlier and don't include the definition of igb_start() and igb_start_locked() (nor set if_start in the ifnet) when igb(4) uses if_transmit.	2012-06-01 15:52:41 +00:00
Sean Bruno	daf8162d1f	Modify the binding of queues to attach to as many CPUs as possible when using more than one igb(4) adapter. This means that queues will not be bound to the same CPUs if there are more CPUs availble. This is only applicable to a system that has multiple interfaces. Obtained from: Yahoo! Inc. MFC after: 3 days	2012-05-10 00:00:28 +00:00
John Baldwin	8546e82467	Reapply r223198 which was reverted in the previous vendor import. Some portions were already reapplied in r233708: - Use a dedicated task to handle deferred transmits from the if_transmit method instead of reusing the existing per-queue interrupt task. Reusing the per-queue interrupt task could result in both an interrupt thread and the taskqueue thread trying to handle received packets on a single queue resulting in out-of-order packet processing. - Call ether_ifdetach() earlier in igb_detach(). - Drain tasks and free taskqueues during igb_detach(). MFC after: 1 week	2012-04-11 21:33:45 +00:00
John Baldwin	d8a8648379	Fix a few issues with transmit handling in em(4) and igb(4): - Do not define the foo_start() methods or set if_start in the ifnet if multiq transmit is enabled. Also, set if_transmit and if_qflush before ether_ifattach rather than after when multiq transmit is enabled. This helps to ensure that the drivers never try to mix different transmit methods. - Properly restart transmit during resume. igb(4) was not restarting it at all, and em(4) was restarting even if the link was down and was calling the wrong method if multiq transmit was enabled. - Remove all the 'more' handling for transmit completions. Transmit completion processing does not have a processing limit, so it always runs to completion and never has more work to do when it returns. Instead, the previous code was returning 'true' anytime there were packets in the queue that weren't still in the process of being transmitted. The effect was that the driver would continuously reschedule a task to process TX completions in effect running at 100% CPU polling the hardware until it finished transmitting all of the packets in the ring. Now it will just wait for the next TX completion interrupt. - Restart packet transmission when the link becomes active. - Fix the MSI-X queue interrupt handlers to restart packet transmission if there are pending packets in the relevant software queue (IFQ or buf_ring) after processing TX completions. This is the root cause for the OACTIVE hangs as if the MSI-X queue handler drained all the pending packets from the TX ring, nothing would ever restart it. As such, remove some previously-added workarounds to reschedule a task to poll the TX ring anytime OACTIVE was set. Tested by: sbruno Reviewed by: jfv MFC after: 1 week	2012-03-30 19:54:48 +00:00
John Baldwin	c3173381be	Properly handle failures in igb_setup_msix() by returning 0 if MSI or MSI-X allocation fails. Reviewed by: jfv MFC after: 2 weeks	2012-03-01 22:13:10 +00:00
Luigi Rizzo	64ae02c365	A bunch of netmap fixes: USERSPACE: 1. add support for devices with different number of rx and tx queues; 2. add better support for zero-copy operation, adding an extra field to the netmap ring to indicate how many buffers we have already processed but not yet released (with help from Eddie Kohler); 3. The two changes above unfortunately require an API change, so while at it add a version field and some spares to the ioctl() argument to help detect mismatches. 4. update the manual page for the two changes above; 5. update sample applications in tools/tools/netmap KERNEL: 1. simplify the internal structures moving the global wait queues to the 'struct netmap_adapter'; 2. simplify the functions that map kring<->nic ring indexes 3. normalize device-specific code, helps mainteinance; 4. start exploring the impact of micro-optimizations (prefetch etc.) in the ixgbe driver. Use 'legacy' descriptors on the tx ring and prefetch slots gives about 20% speedup at 900 MHz. Another 7-10% would come from removing the explict calls to bus_dmamap* in the core (they are effectively NOPs in this case, but it takes expensive load of the per-buffer dma maps to figure out that they are all NULL. Rx performance not investigated. I am postponing the MFC so i can import a few more improvements before merging.	2012-02-27 19:05:01 +00:00
Luigi Rizzo	5644ccec61	(This commit only touches code within the DEV_NETMAP blocks) Introduce some functions to map NIC ring indexes into netmap ring indexes and vice versa. This way we can implement the bound checks only in one place (and hopefully in a correct way). On passing, make the code and comments more uniform across the various drivers.	2012-02-15 23:13:29 +00:00
Luigi Rizzo	6e10c8b8c5	small code cleanup in preparation for future modifications in the memory allocator used by netmap. No functional change, two small bug fixes: - in if_re.c add a missing bus_dmamap_sync() - in netmap.c comment out a spurious free() in an error handling block	2012-01-10 19:57:23 +00:00
Kevin Lo	5bbe0c5357	ether_ifattach() sets if_mtu to ETHERMTU, don't bother set it again Reviewed by: yongari	2012-01-07 09:41:57 +00:00
Luigi Rizzo	2d0d326d91	put back netmap support, deleted by mistake in a previous commit	2011-12-22 15:33:41 +00:00
John Baldwin	ef93f57495	Restore the sysctl changes from 223676 and 227309 lost in the previous import: - Add read-only sysctls for all of the tunables supported by the igb and em drivers. - Make the per-instance 'enable_aim' sysctl truly per-instance by having it change a per-instance variable (which is used to control AIM) rather than having all of the per-instance sysctls operate on a single global variable. While here, restore the previously existing hw.igb.rx_processing_limit tunable as it is very useful to be able to set a default tunable that applies to all adapters in the system.	2011-12-21 20:10:11 +00:00
Matthew D Fleming	30a497c860	Consistently use types in e1000 driver code: - Two struct members eee_disable are used in a function that expects an int *, so declare them int, not bool. - igb_tx_ctx_setup() returns a boolean value, so declare it bool, not int. - igb_header_split is passed to TUNABLE_INT, so delcare it int, not bool. - igb_tso_setup() returns a bool, so declare it bool, not boolean_t. - Do not re-define bool/true/false if the symbols already exist. MFC after: 2 weeks Sponsored by: Isilon Systems, LLC	2011-12-12 18:27:34 +00:00
Jack F Vogel	62aca36544	Last change still had an issue, one more time...	2011-12-11 18:46:14 +00:00
Jack F Vogel	133f283b45	Correct LINT build issues in the ioctl code.	2011-12-11 09:37:25 +00:00
Jack F Vogel	fd33ce416e	Part 2 of 2 New deltas for the 1G drivers. There have still been intermittent problems with apparent TX hangs for some customers. These have been problematic to reproduce but I believe these changes will address them. Testing on a number of fronts have been positive. EM: there is an important 'chicken bit' fix for 82574 in the shared code this is supported in the core here. - The TX path has been tightened up to improve performance. In particular UDP with jumbo frames was having problems, and the changes here have improved that. - OACTIVE has been used more carefully on the theory that some hangs may be due to a problem in this interaction - Problems with the RX init code, the "lazy" allocation and ring initialization has been found to cause problems in some newer client systems, and as it really is not that big a win (its not in a hot path) it seems best to remove it. - HWTSO was broken when VLAN HWTAGGING or HWFILTER is used, I found this was due to an error in setting up the descriptors in em_xmit. IGB: - TX is also improved here. With multiqueue I realized its very important to handle OACTIVE only under the CORE lock so there are no races between the queues. - Flow Control handling was broken in a couple ways, I have changed and I hope improved that in this delta. - UDP also had a problem in the TX path here, it was change to improve that. - On some hardware, with the driver static, a weird stray interrupt seems to sometimes fire and cause a panic in the RX mbuf refresh code. This is addressed by setting interrupts late in the init path, and also to set all interrupts bits off at the start of that.	2011-12-10 07:08:52 +00:00
Luigi Rizzo	579a6e3c4e	add netmap support for "em", "lem", "igb" and "re". On my hardware, "em" in netmap mode does about 1.388 Mpps on one card (on an Asus motherboard), and 1.1 Mpps on another card (PCIe bus). Both seem to be NIC-limited, because i have the same rate even with the CPU running at 150 MHz. On the "re" driver the tx throughput is around 420-450 Kpps on various (8111C and the like) chipsets. On the Rx side performance seems much better, and i can receive the full load generated by the "em" cards. "igb" is untested as i don't have the hardware.	2011-12-05 15:33:13 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Jack F Vogel	980729b244	A fix to make the LINT-NOINET build happy, if this works out the ixgbe driver should be changed as well.	2011-07-07 00:46:50 +00:00
John Baldwin	b37e0f6e9a	- Add read-only sysctls for all of the tunables supported by the igb and em drivers. - Make the per-instance 'enable_aim' sysctl truly per-instance by having it change a per-instance variable (which is used to control AIM) rather than having all of the per-instance sysctls operate on a single global variable. Reviewed by: jfv (earlier version) MFC after: 1 week	2011-06-29 16:20:52 +00:00
Jack F Vogel	61cf9896b6	Put back the global for rx processing due to popular demand.	2011-06-23 17:42:27 +00:00
Jack F Vogel	1c1010187a	Eliminate some global tuneables in favor of adapter-specific, particular flow control and dma coalesce. Also improve the sysctl operation on those too. Add IPv6 detection in the ioctl code, this was done for ixgbe first, carrying that over. Add resource ability to disable particular adapter. Add HW TSO capability so vlans can make use of TSO	2011-06-20 22:59:29 +00:00
John Baldwin	1c2ed38455	- Use a dedicated task to handle deferred transmits from the if_transmit method instead of reusing the existing per-queue interrupt task. Reusing the per-queue interrupt task could result in both an interrupt thread and the taskqueue thread trying to handle received packets on a single queue resulting in out-of-order packet processing. - Don't define igb_start() at all on 8.0 and where if_transmit is used. Replace last remaining call to igb_start() with a loop to kick off transmit on each queue instead. - Call ether_ifdetach() earlier in igb_detach(). - Drain tasks and free taskqueues during igb_detach(). Reviewed by: jfv MFC after: 1 week	2011-06-17 20:06:52 +00:00
Jack F Vogel	cf696f26c9	Important update for the igb driver: - Add the change made in em to the actual unrefreshed number of descriptors is used as a basis in rxeof on the way out to determine if more refresh is needed. NOTE: there is a difference in the ring setup in igb, this is not accidental, it is necessitated by hardware behavior, when you reset the newer adapters it will not let you write RDH, it ALWAYS sets it to 0. Thus the way em does it is not possible. - Change the sysctl handling of flow control, it will now make the change dynamically when the variable setting changes rather than requiring a reset. - Change the eee sysctl naming, validation found the old unintuitive :) - Last but not least, some important performance tweaks in the TX path, I found that UDP behavior could be drastically hindered or improved with just small changes in the start loop. What I have here is what testing has shown to be the best overall. Its interesting to note that changing the clean threshold to start at a full half of the ring, made a BIG difference in performance. I hope that this will prove to be advantageous for most workloads. MFC in a week.	2011-04-05 21:55:43 +00:00
Jack F Vogel	1fd3c44f77	This delta updates the em driver to version 7.2.2 which has been undergoing test for some weeks. This improves the RX mbuf handling to avoid system hang due to depletion. Thanks to all those who have been testing the code, and to Beezar Liu for the design changes. Next the igb driver is updated for similar RX changes, but also to add new features support for our upcoming i350 family of adapters. MFC after a week	2011-03-18 18:54:00 +00:00
Jack F Vogel	15cb0f401a	Somehow the RX ring depletion fix got partially removed, replace the missing pieces.	2011-02-11 19:49:07 +00:00
Jack F Vogel	f0ecc46d04	Add support for the new I350 family of 1G interfaces. - this also includes virtualization support on these devices Correct some vlan issues we were seeing in test, jumbo frames on vlans did not work correctly, this was all due to confused logic around HW filters, the new code should now work for all uses. Important fix: when mbuf resources are depeleted, it was possible to completely empty the RX ring, and then the RX engine would stall forever. This is fixed by a flag being set whenever the refresh code fails due to an mbuf shortage, also the local timer now makes sure that all queues get an interrupt when it runs, the interrupt code will then always call rxeof, and in that routine the first thing done is now to check the refresh flag and call refresh_mbufs. This has been verified to fix this type 'hang'. Similar code will follow in the other drivers. Finally, sync up shared code for the I350 support. Thanks to everyone that has been reporting issues, and helping in the debug/test process!!	2011-02-11 01:00:26 +00:00
Matthew D Fleming	5bc0787f29	Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string.	2011-01-18 21:14:23 +00:00
Matthew D Fleming	8c49f18771	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the Intel drivers.	2011-01-12 19:53:23 +00:00
Jack F Vogel	aa8dea58c6	Remove the bogus test in the TX context setup for IPV6, the size can be smaller than the constant when you are doing HW TAGGING, and you still need to process this packet in a normal way. I'm not sure where the notion to just return came from, but its wrong. MFC after: 3 days	2010-12-04 02:04:02 +00:00
Jack F Vogel	46b9f1ff9f	- New 82580 devices supported - Fixes from John Baldwin: vlan shadow tables made per/interface, make vlan hw setup only happen when capability enabled, and finally, make a tuneable interrupt rate. Thanks John! - Tweaked watchdog handling to avoid any false positives, now detection is in the TX clean path, with only the final check and init happening in the local timer. - limit queues to 8 for all devices, with 82576 or 82580 on larger machines it can get greater than this, and it seems mostly a resource waste to do so. Even 8 might be high but it can be manually reduced. - use 2k, 4k and now 9k clusters based on the MTU size. - rework the igb_refresh_mbuf() code, its important to make sure the descriptor is rewritten even when reusing mbufs since writeback clobbers things. MFC: in a few days, this delta needs to get to 8.2	2010-11-23 22:12:02 +00:00
Jack F Vogel	7d9119bdc4	Update code from Intel: - Sync shared code with Intel internal - New client chipset support added - em driver - fixes to 82574, limit queues to 1 but use MSIX - em driver - large changes in TX checksum offload and tso code, thanks to yongari. - some small changes for watchdog issues. - igb driver - local timer watchdog code was missing locking this and a couple other watchdog related fixes. - bug in rx discard found by Andrew Boyer, check for null pointer MFC: a week	2010-09-28 00:13:15 +00:00
John Baldwin	8385f4cf94	Tweak the stats exported by the e1000 drivers: - Add a single sysctl procedure to all three drivers to read an arbitrary register (the register is passed as arg2). Use it to replace existing routines in igb(4) that used a separate routine for each register, and to add support for missing stats in em(4) and lem(4). - Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats section as they are driver stats, not MAC counters. - Simplify the code that creates per-queue stats in igb(4) to use a single loop and remove duplicated code. - Properly read all 64 bits of the 'good octets received/transmitted' in em(4) and lem(4). - Actually read the interrupt count registers in em(4), and drop the 'host to card' sysctl stats from em(4) as they are not implemented in any of the hardware this driver supports. - Restore several stats to em(4) that were lost in the earlier stats conversion including per-queue stats. - Export several MAC stats in em(4) that were exported in igb(4) but not in em(4). - Export stats in lem(4) using individual sysctls as in em(4) and igb(4). Reviewed by: jfv MFC after: 1 week	2010-09-20 16:04:44 +00:00
Pyun YongHyeon	dd20cce19a	Do not allocate multicast array memory in multicast filter configuration function. For failed memory allocations, em(4)/lem(4) called panic(9) which is not acceptable on production box. igb(4)/ixgb(4)/ix(4) allocated the required memory in stack which consumed 768 bytes of stack memory which looks too big. To address these issues, allocate multicast array memory in device attach time and make multicast configuration success under any conditions. This change also removes the excessive use of memory in stack. Reviewed by: jfv	2010-08-28 00:34:22 +00:00
Pyun YongHyeon	ad1917be37	Do not call voluntary panic(9) in case of if_alloc() failure. Reviewed by: jfv	2010-08-28 00:09:19 +00:00
Pyun YongHyeon	5c8080f0d9	Make sure not to access unallocated stats memory. Reviewed by: jfv	2010-08-27 23:50:13 +00:00

1 2 3

105 Commits