Commit Graph

85 Commits

Author SHA1 Message Date
Luigi Rizzo
64ae02c365 A bunch of netmap fixes:
USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
   to the netmap ring to indicate how many buffers we have already processed
   but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
   at it add a version field and some spares to the ioctl() argument
   to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
   to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
   in the ixgbe driver.
   Use 'legacy' descriptors on the tx ring and prefetch slots gives
   about 20% speedup at 900 MHz. Another 7-10% would come from removing
   the explict calls to bus_dmamap* in the core (they are effectively
   NOPs in this case, but it takes expensive load of the per-buffer
   dma maps to figure out that they are all NULL.

   Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.
2012-02-27 19:05:01 +00:00
Luigi Rizzo
5644ccec61 (This commit only touches code within the DEV_NETMAP blocks)
Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).

On passing, make the code and comments more uniform across the
various drivers.
2012-02-15 23:13:29 +00:00
Luigi Rizzo
6e10c8b8c5 small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block
2012-01-10 19:57:23 +00:00
Kevin Lo
5bbe0c5357 ether_ifattach() sets if_mtu to ETHERMTU, don't bother set it again
Reviewed by:	yongari
2012-01-07 09:41:57 +00:00
Luigi Rizzo
2d0d326d91 put back netmap support, deleted by mistake in a previous commit 2011-12-22 15:33:41 +00:00
John Baldwin
ef93f57495 Restore the sysctl changes from 223676 and 227309 lost in the previous
import:
- Add read-only sysctls for all of the tunables supported by the igb and
  em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
  change a per-instance variable (which is used to control AIM) rather
  than having all of the per-instance sysctls operate on a single global
  variable.

While here, restore the previously existing hw.igb.rx_processing_limit
tunable as it is very useful to be able to set a default tunable that
applies to all adapters in the system.
2011-12-21 20:10:11 +00:00
Matthew D Fleming
30a497c860 Consistently use types in e1000 driver code:
- Two struct members eee_disable are used in a function that expects
   an int *, so declare them int, not bool.
 - igb_tx_ctx_setup() returns a boolean value, so declare it bool, not int.
 - igb_header_split is passed to TUNABLE_INT, so delcare it int, not bool.
 - igb_tso_setup() returns a bool, so declare it bool, not boolean_t.
 - Do not re-define bool/true/false if the symbols already exist.

MFC after:	2 weeks
Sponsored by:	Isilon Systems, LLC
2011-12-12 18:27:34 +00:00
Jack F Vogel
62aca36544 Last change still had an issue, one more time... 2011-12-11 18:46:14 +00:00
Jack F Vogel
133f283b45 Correct LINT build issues in the ioctl code. 2011-12-11 09:37:25 +00:00
Jack F Vogel
fd33ce416e Part 2 of 2 New deltas for the 1G drivers.
There have still been intermittent problems with apparent TX
hangs for some customers. These have been problematic to reproduce
but I believe these changes will address them. Testing on a number
of fronts have been positive.

EM: there is an important 'chicken bit' fix for 82574 in the shared
code this is supported in the core here.
    - The TX path has been tightened up to improve performance. In
      particular UDP with jumbo frames was having problems, and the
      changes here have improved that.
    - OACTIVE has been used more carefully on the theory that some
      hangs may be due to a problem in this interaction
    - Problems with the RX init code, the "lazy" allocation and
      ring initialization has been found to cause problems in some
      newer client systems, and as it really is not that big a win
      (its not in a hot path) it seems best to remove it.
    - HWTSO was broken when VLAN HWTAGGING or HWFILTER is used, I
      found this was due to an error in setting up the descriptors
      in em_xmit.

IGB:
    - TX is also improved here. With multiqueue I realized its very
      important to handle OACTIVE only under the CORE lock so there
      are no races between the queues.
    - Flow Control handling was broken in a couple ways, I have changed
      and I hope improved that in this delta.
    - UDP also had a problem in the TX path here, it was change to
      improve that.
    - On some hardware, with the driver static, a weird stray interrupt
      seems to sometimes fire and cause a panic in the RX mbuf refresh
      code. This is addressed by setting interrupts late in the init
      path, and also to set all interrupts bits off at the start of that.
2011-12-10 07:08:52 +00:00
Luigi Rizzo
579a6e3c4e add netmap support for "em", "lem", "igb" and "re".
On my hardware, "em" in netmap mode does about 1.388 Mpps
on one card (on an Asus motherboard), and 1.1 Mpps on another
card (PCIe bus). Both seem to be NIC-limited, because
i have the same rate even with the CPU running at 150 MHz.

On the "re" driver the tx throughput is around 420-450 Kpps
on various (8111C and the like) chipsets. On the Rx side
performance seems much better, and i can receive the full
load generated by the "em" cards.

"igb" is untested as i don't have the hardware.
2011-12-05 15:33:13 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Jack F Vogel
980729b244 A fix to make the LINT-NOINET build happy, if this
works out the ixgbe driver should be changed as well.
2011-07-07 00:46:50 +00:00
John Baldwin
b37e0f6e9a - Add read-only sysctls for all of the tunables supported by the igb and
em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
  change a per-instance variable (which is used to control AIM) rather
  than having all of the per-instance sysctls operate on a single global
  variable.

Reviewed by:	jfv (earlier version)
MFC after:	1 week
2011-06-29 16:20:52 +00:00
Jack F Vogel
61cf9896b6 Put back the global for rx processing due to popular demand. 2011-06-23 17:42:27 +00:00
Jack F Vogel
1c1010187a Eliminate some global tuneables in favor of adapter-specific,
particular flow control and dma coalesce. Also improve the
sysctl operation on those too.

Add IPv6 detection in the ioctl code, this was done for
ixgbe first, carrying that over.

Add resource ability to disable particular adapter.

Add HW TSO capability so vlans can make use of TSO
2011-06-20 22:59:29 +00:00
John Baldwin
1c2ed38455 - Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
  Reusing the per-queue interrupt task could result in both an interrupt
  thread and the taskqueue thread trying to handle received packets on a
  single queue resulting in out-of-order packet processing.
- Don't define igb_start() at all on 8.0 and where if_transmit is used.
  Replace last remaining call to igb_start() with a loop to kick off
  transmit on each queue instead.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().

Reviewed by:	jfv
MFC after:	1 week
2011-06-17 20:06:52 +00:00
Jack F Vogel
cf696f26c9 Important update for the igb driver:
- Add the change made in em to the actual unrefreshed number
    of descriptors is used as a basis in rxeof on the way out
    to determine if more refresh is needed. NOTE: there is a
    difference in the ring setup in igb, this is not accidental,
    it is necessitated by hardware behavior, when you reset the
    newer adapters it will not let you write RDH, it ALWAYS sets
    it to 0. Thus the way em does it is not possible.
  - Change the sysctl handling of flow control, it will now make
    the change dynamically when the variable setting changes rather
    than requiring a reset.
  - Change the eee sysctl naming, validation found the old unintuitive :)
  - Last but not least, some important performance tweaks in the TX
    path, I found that UDP behavior could be drastically hindered or
    improved with just small changes in the start loop. What I have
    here is what testing has shown to be the best overall. Its interesting
    to note that changing the clean threshold to start at a full half of
    the ring, made a BIG difference in performance.  I hope that this
    will prove to be advantageous for most workloads.

MFC in a week.
2011-04-05 21:55:43 +00:00
Jack F Vogel
1fd3c44f77 This delta updates the em driver to version 7.2.2 which has
been undergoing test for some weeks. This improves the RX
mbuf handling to avoid system hang due to depletion. Thanks
to all those who have been testing the code, and to Beezar
Liu for the design changes.

Next the igb driver is updated for similar RX changes, but
also to add new features support for our upcoming i350 family
of adapters.

MFC after a week
2011-03-18 18:54:00 +00:00
Jack F Vogel
15cb0f401a Somehow the RX ring depletion fix got partially removed,
replace the missing pieces.
2011-02-11 19:49:07 +00:00
Jack F Vogel
f0ecc46d04 Add support for the new I350 family of 1G interfaces.
- this also includes virtualization support on these devices

Correct some vlan issues we were seeing in test, jumbo frames on vlans
did not work correctly, this was all due to confused logic around HW
filters, the new code should now work for all uses.

Important fix: when mbuf resources are depeleted, it was possible to
completely empty the RX ring, and then the RX engine would stall
forever. This is fixed by a flag being set whenever the refresh code
fails due to an mbuf shortage, also the local timer now makes sure
that all queues get an interrupt when it runs, the interrupt code
will then always call rxeof, and in that routine the first thing done
is now to check the refresh flag and call refresh_mbufs. This has been
verified to fix this type 'hang'. Similar code will follow in the other
drivers.

Finally, sync up shared code for the I350 support.

Thanks to everyone that has been reporting issues, and helping in the
debug/test process!!
2011-02-11 01:00:26 +00:00
Matthew D Fleming
5bc0787f29 Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.
2011-01-18 21:14:23 +00:00
Matthew D Fleming
8c49f18771 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the Intel drivers.
2011-01-12 19:53:23 +00:00
Jack F Vogel
aa8dea58c6 Remove the bogus test in the TX context setup for IPV6,
the size can be smaller than the constant when you are
doing HW TAGGING, and you still need to process this
packet in a normal way. I'm not sure where the notion
to just return came from, but its wrong.

MFC after: 3 days
2010-12-04 02:04:02 +00:00
Jack F Vogel
46b9f1ff9f - New 82580 devices supported
- Fixes from John Baldwin: vlan shadow tables made per/interface,
  make vlan hw setup only happen when capability enabled, and
  finally, make a tuneable interrupt rate. Thanks John!
- Tweaked watchdog handling to avoid any false positives, now
  detection is in the TX clean path, with only the final check
  and init happening in the local timer.
- limit queues to 8 for all devices, with 82576 or 82580 on
  larger machines it can get greater than this, and it seems
  mostly a resource waste to do so. Even 8 might be high but
  it can be manually reduced.
- use 2k, 4k and now 9k clusters based on the MTU size.
- rework the igb_refresh_mbuf() code, its important to
  make sure the descriptor is rewritten even when reusing
  mbufs since writeback clobbers things.

MFC: in a few days, this delta needs to get to 8.2
2010-11-23 22:12:02 +00:00
Jack F Vogel
7d9119bdc4 Update code from Intel:
- Sync shared code with Intel internal
	- New client chipset support added
	- em driver - fixes to 82574, limit queues to 1 but use MSIX
	- em driver - large changes in TX checksum offload and tso
	  code, thanks to yongari.
	- some small changes for watchdog issues.
	- igb driver - local timer watchdog code was missing locking
	  this and a couple other watchdog related fixes.
	- bug in rx discard found by Andrew Boyer, check for null pointer

MFC: a week
2010-09-28 00:13:15 +00:00
John Baldwin
8385f4cf94 Tweak the stats exported by the e1000 drivers:
- Add a single sysctl procedure to all three drivers to read an arbitrary
  register (the register is passed as arg2).  Use it to replace existing
  routines in igb(4) that used a separate routine for each register, and
  to add support for missing stats in em(4) and lem(4).
- Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats
  section as they are driver stats, not MAC counters.
- Simplify the code that creates per-queue stats in igb(4) to use a single
  loop and remove duplicated code.
- Properly read all 64 bits of the 'good octets received/transmitted' in
  em(4) and lem(4).
- Actually read the interrupt count registers in em(4), and drop the
  'host to card' sysctl stats from em(4) as they are not implemented in
  any of the hardware this driver supports.
- Restore several stats to em(4) that were lost in the earlier stats
  conversion including per-queue stats.
- Export several MAC stats in em(4) that were exported in igb(4) but not
  in em(4).
- Export stats in lem(4) using individual sysctls as in em(4) and igb(4).

Reviewed by:	jfv
MFC after:	1 week
2010-09-20 16:04:44 +00:00
Pyun YongHyeon
dd20cce19a Do not allocate multicast array memory in multicast filter
configuration function. For failed memory allocations, em(4)/lem(4)
called panic(9) which is not acceptable on production box.
igb(4)/ixgb(4)/ix(4) allocated the required memory in stack which
consumed 768 bytes of stack memory which looks too big.

To address these issues, allocate multicast array memory in device
attach time and make multicast configuration success under any
conditions. This change also removes the excessive use of memory in
stack.

Reviewed by:	jfv
2010-08-28 00:34:22 +00:00
Pyun YongHyeon
ad1917be37 Do not call voluntary panic(9) in case of if_alloc() failure.
Reviewed by:	jfv
2010-08-28 00:09:19 +00:00
Pyun YongHyeon
5c8080f0d9 Make sure not to access unallocated stats memory.
Reviewed by:	jfv
2010-08-27 23:50:13 +00:00
Jack F Vogel
1ed622958d Eliminate the ambiguous queue setting logic for
the VF, it made it possible to have 2 queues which
we don't want, the HOST is unable to handle it.
2010-08-19 17:00:33 +00:00
Jack F Vogel
3b9b3fc3bf Put the early setting of the MAC type back, its
removal resulted in broken code in MSIX setup.
2010-08-06 20:55:49 +00:00
George V. Neville-Neil
6de27dc09f style(9) fix
MFC after:	1 week
2010-07-24 18:53:46 +00:00
George V. Neville-Neil
dc1ccafedd Fix a bug in the statistics code for tracking the head and
tail pointers of the tx and rx queues.   We needed a SYSCTL_PROC
to correctly get the values at run time.

Submitted by:	Andrew Boyer aboyer at averesystems.com
MFC after:	1 week
2010-07-23 17:53:39 +00:00
Jack F Vogel
d6ff3d4d13 Fix of a VLAN problem by jhb, the checksum capability
got lost along the way.

MFC: asap
2010-07-09 17:11:29 +00:00
Jack F Vogel
b827058579 OK, I was a bit sleep this morning and checked in
the core changes but left out the shared code, lol.
Well, and a couple fixes to the core... hopefully
this will all be complete now.

Happy happy joy joy :)
2010-06-30 21:05:51 +00:00
Jack F Vogel
911fddbcb5 SR-IOV support added to igb
What this provides is support for the 'virtual function'
interface that a FreeBSD VM may be assigned from a host
like KVM on Linux, or newer versions of Xen with such
support.

When the guest is set up with the capability, a special
limited function 82576 PCI device is present in its virtual
PCI space, so with this driver installed in the guest that
device will be detected and function nearly like the bare
metal, as it were.

The interface is only allowed a single queue in this configuration
however initial performance tests have looked very good.

Enjoy!!
2010-06-30 17:26:47 +00:00
George V. Neville-Neil
5ea4d522bb Make sure that all the exposed counters and variables are actually
being updated.

Pointed out by:	jfv
2010-06-24 21:17:58 +00:00
George V. Neville-Neil
9dc69a1589 Move statistics into the sysctl tree making it easier to find
and use them.
Add previously hidden statistics, some of which include interrupt
and host/card communication counters.
2010-06-16 17:36:53 +00:00
Jack F Vogel
dfc14ce06e Changes from John Baldwin adding to last commit,
change rxeof api for poll friendliness, and
eliminate unnecessary link tasklet use. Thanks John!
2010-06-16 16:37:36 +00:00
Jack F Vogel
0dee704f01 Change to have legacy interrupts use the same
handler had a flaw, thanks to John Baldwin for
finding it. Change which queue legacy tasks are
enqueued on.

MFC: soonest
2010-06-15 21:11:51 +00:00
Jack F Vogel
e6ec2d1fa6 Put back the lost bus_describe_intr() calls. 2010-06-11 21:35:19 +00:00
Jack F Vogel
2e6bd30b46 Add a couple fixes from Michael Tuexen.
Remove unneeded rxtx handler, make que handler generic.
Do not allocate header mbufs in rx ring if not doing hdr split.
Release the lock in rxeof call to stack.

MFC for 8.1 asap
2010-06-11 20:54:27 +00:00
John Baldwin
c29ae5520a Restore part of 200671 which was lost in previous driver changes:
- Add interrupt descriptions when using mulitple MSI-X interrupts.
2010-05-20 20:01:54 +00:00
Jack F Vogel
46168c5453 Small changes preparing for MFC, need to conditionalize
the buf_ring_free call, and lem is missing the WOL change
put into em.
2010-05-14 22:18:34 +00:00
Jack F Vogel
d43a118797 Add a missing fragment in the tx msix handler to invoke
another if all work is not done.

Sync the igb driver with changes suggested by yongari and
made in em, these made sense to be in both drivers.
2010-04-14 20:55:33 +00:00
Jack F Vogel
99f8e467e8 DUH, must be tired, I missed the second instance...
time for the weekend :)
2010-04-09 21:18:46 +00:00
Jack F Vogel
af8a24c8d0 Thanks to Michael Tuexen for catching this, bit set that
keeps the clock from being reset when writing to EITR was
incorrect, also there is a shared code #define for it anyway.
2010-04-09 21:16:45 +00:00
Jack F Vogel
500e8f26d7 Important fix got clobbered in the em driver, keeping
VLAN HWFILTER from being used by default, this breaks
stacked pseudo devices, and as it turns out, also breaks
virtual machines that happen to use VLANS (didn't know that
before :). Put the fix back into the em driver, and for good
measure add the same code to the igb driver where it should
have been anyway.
2010-04-08 00:50:43 +00:00
Jack F Vogel
657c04c3f8 The POLL code was missed in the queue conversion,
change the argument type to igb_rxeof() to the
correct type. Note, any users of POLLING must
be sure and set the number of queues to 1 for
things to work correctly.
2010-03-31 23:24:42 +00:00