Commit Graph

75 Commits

Author SHA1 Message Date
luigi
3ac0fcfb97 A bunch of netmap fixes:
USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
   to the netmap ring to indicate how many buffers we have already processed
   but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
   at it add a version field and some spares to the ioctl() argument
   to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
   to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
   in the ixgbe driver.
   Use 'legacy' descriptors on the tx ring and prefetch slots gives
   about 20% speedup at 900 MHz. Another 7-10% would come from removing
   the explict calls to bus_dmamap* in the core (they are effectively
   NOPs in this case, but it takes expensive load of the per-buffer
   dma maps to figure out that they are all NULL.

   Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.
2012-02-27 19:05:01 +00:00
luigi
626117dcad (This commit only touches code within the DEV_NETMAP blocks)
Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).

On passing, make the code and comments more uniform across the
various drivers.
2012-02-15 23:13:29 +00:00
jfv
ae08051fd8 Wrap the bool typedef 2012-01-30 23:03:21 +00:00
jfv
f19302c07e New hardware support: Intel X540 adapter support added.
Some shared code reorganization along with the new adapter.
Sync changes to OACTIVE in igb into this driver.
Misc small fixes.
2012-01-30 16:42:02 +00:00
luigi
ef0f580b11 ixgbe changes:
- remove experimental code for disabling CRC
- use the correct constant for conversion between interrupt rate
  and EITR values (the previous values were off by a factor of 2)
- make dev.ix.N.queueM.interrupt_rate a RW sysctl variable.
  Changing individual values affects the queue immediately,
  and propagates to all interfaces at the next reinit.
- add dev.ix.N.queueM.irqs rdonly sysctl, to export the actual
  interrupt counts

Netmap-related changes for ixgbe:
- use the "new" format for TX descriptors in netmap mode.
- pass interrupt mitigation delays to the user process doing poll()
  on a netmap file descriptor.
  On the RX side this means we will not check the ring more than once
  per interrupt. This gives the process a chance to sleep and process
  packets in larger batches, thus reducing CPU usage.
  On the TX side we take this even further: completed transmissions are
  reclaimed every half ring even if the NIC interrupts more often.
  This saves even more CPU without any additional tx delays.

Generic Netmap-related changes:
- align the netmap_kring to cache lines so that there is no false sharing
  (possibly useful for multiqueue NICs and MSIX interrupts, which are
  handled by different cores). It's a minor improvement but it does not
  cost anything.

Reviewed by:	Jack Vogel
Approved by:	Jack Vogel
2012-01-26 09:55:16 +00:00
luigi
c20448129b netmap-related changes:
1. correct the initialization of RDT when there is an ixgbe_init()
   while a netmap client is active. This code was previously
   in ixgbe_initialize_receive_units() but RDT is overwritten
   shortly afterwards in ixgbe_init_locked()

2. add code (not active yet) to disable CRCSTRIP while in netmap mode.
   From all evidence i could gather, it seems that when the 82599 has to
   write a data block that is not a full cache line, it first reads
   the line (64 bytes) and then writes back the updated version.
   This hurts reception of min-sized frames, which are only 60 bytes
   if the CRC is stripped: i could never get above 11Mpps
   (received from one queue) with CRCSTRIP enabled, whyle 64+4-byte
   packets reach 14.2 Mpps (the theoretical maximum).
   Leaving the CRC in gets us 14.88Mpps for 60+4 byte frames,
   (and penalizes 64+4). The min-size case is important not just because
   it looks good in benchmarks, but also because this is the size
   of pure acks.
   Note we cannot leave CRCSTRIP on by default because it is
   incompatible with some other features (LRO etc.)
2012-01-19 09:36:19 +00:00
luigi
9b3b9221ef small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block
2012-01-10 19:57:23 +00:00
kevlo
41b065b8d4 ether_ifattach() sets if_mtu to ETHERMTU, don't bother set it again
Reviewed by:	yongari
2012-01-07 09:41:57 +00:00
mdf
d81aa8478d Consistently use types in ixgbe driver code:
- {ixgbe,ixv}_header_split is passed to TUNABLE_INT, so delcare it
   int, not bool.
 - {ixgbe,ixv}_tx_ctx_setup() returns a boolean value, so declare it
   bool, not int.
 - {ixgbe,ixv}_tso_setup() returns a bool, so declare it bool, not boolean_t.
 - {ixgbe,ixv}_txeof() returns a bool, so declare it bool, not boolean_t.
 - Do not re-define bool if the symbol already exists.

MFC after:	2 weeks
Sponsored by:	Isilon Systems, LLC
2011-12-12 18:27:28 +00:00
luigi
298ffde665 1. Fix the handling of link reset while in netmap more.
A link reset now is completely transparent for the netmap client:
   even if the NIC resets its own ring (e.g. restarting from 0),
   the client will not see any change in the current rx/tx positions,
   because the driver will keep track of the offset between the two.

2. make the device-specific code more uniform across different drivers
   There were some inconsistencies in the implementation of the netmap
   support routines, now drivers have been aligned to a common
   code structure.

3. import netmap support for ixgbe . This is implemented as a very
   small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
   in total the patch has only 54 lines of new code) , as most of
   the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
   following some initial comments from Jack Vogel about making
   changes less intrusive.
   (Note, i have emailed Jack multiple times asking if he had
   comments on this structure of the code; i got no reply so
   i assume he is fine with it).

Support for other drivers (em, lem, re, igb) will come later.

"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.

Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
2011-12-05 12:06:53 +00:00
qingli
a3cb964a30 The maximum read size of incoming packets is done in 1024-byte increments.
The current code was rounding down the maximum frame size instead of
routing up, resulting in a read size of 1024 bytes, in the non-jumbo
frame case, and splitting the packets across multiple mbufs.

Consequently the above problem exposed another issue, which is when
packets were splitted across multiple mbufs, and all of the mbufs in the
chain have the M_PKTHDR flag set.

Submitted by:	original patch by Ray Ruvinskiy at BlueCoat dot com
Reviewed by:	jfv, kmacy, rwatson
Approved by:	re (rwatson)
MFC after:	5 days
2011-09-05 17:54:19 +00:00
jfv
a420ce0fad Cut and paste mistake corrected. 2011-06-02 05:31:54 +00:00
jfv
64fa4da5ee First off: update the driver README, the old one was horribly
crusty, and this still isn't perfect, but its at least a bit
more recent.

Secondly, a few improvements to the driver from Andrew Boyer,
support hint to allow devices to not attach, add VLAN_HWTSO
capability so vlans can use TSO, fix in the interrupt handler
to make sure the stack TX queue is processed. Oh, and also
make sure IPv6 does not cause a re-init in the ioctl routine.
Thanks for your efforts Andrew!

Thanks to Claudio Jeker for noticing the ixgbe_xmit() routine
was not correctly swapping the dma map from the first to the
last descriptor in a multi-descriptor transmission, corrected
this.
2011-06-02 00:34:57 +00:00
jfv
6ec3bafb99 Add a #define for driver portability to older OS 2011-04-28 23:21:40 +00:00
jfv
ebe803c491 - Add the RX refresh changes from igb to ixgbe
- Also a couple minor tweaks to the TX code from the same source.
- Add the INET ioctl code which has been missing from this driver,
  and which caused IP aliases to reset the interface.
- Last, some minor logic changes that just reflect upcoming
  hardware support, but have no other functional effect now.

MFC after a week
2011-04-25 23:34:21 +00:00
jhb
00c3c01f4f Do a sweep of the tree replacing calls to pci_find_extcap() with calls to
pci_find_cap() instead.
2011-03-23 13:10:15 +00:00
jfv
809122d5c7 Don't bother to run the flowcontrol code if there
is no change. Thanks to Andrew for the tweak.
2011-01-22 00:19:15 +00:00
jfv
2d67362e24 Missing case for 82598DA type adapter, thanks Andrew. 2011-01-22 00:08:06 +00:00
jfv
50008afcfe Leftover bogus TX UNLOCK removed. Thanks to
Andrew Boyer.
2011-01-21 23:55:28 +00:00
jfv
fbad453baa Update driver to version 2.3.8:
CRITICAL FIX - with stats changes the older 82598 will panic
	and trash the stack on driver load, FCOE registers ONLY exist
	in 82599 and must not be read otherwise.

	kern/153951 - to correct incorrect media type on adapters
	with pluggable modules I have eliminated the old static
	table in favor of a new dynamic shared code routine. This
	also has the benefit of detecting changes when a different
	module is inserted.

	Performance/enhancement to the Flow Director code from my
	linux coworker (the developer of the code).

	Fixes from Michael Tuexen - a data corruption problem on the
	82599 (CRITICAL), fix so the buf size correctly adjusts as
	the cluster changes, and max descriptors are set properly.
	Also added 16K clusters for those REALLY big jumbos :)

	In the RX path, the RX LOCK was not being released, and this
	causes LOR problems. Add the code that igb already has.

	Sync with in house shared code, this was necessary for the
	Flow Director fix.

MFC in 2 days
2011-01-19 19:36:27 +00:00
mdf
a21f29df61 Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.
2011-01-18 21:14:23 +00:00
mdf
0898b4847c sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the Intel drivers.
2011-01-12 19:53:23 +00:00
jfv
16de4d95f5 CSUM flags need to be OS version sensitive in ixv code
MFC in 3 days
2011-01-07 23:39:41 +00:00
jfv
cc26d4fc8d kern/150247 - virtualization code also needs fix for 7.X to be buildable...
MFC in 3 days
2011-01-07 23:19:13 +00:00
jfv
9e1a6b2f78 Fix to kern/150247 - make ixgbe buildable for 7.x 2011-01-07 22:58:12 +00:00
jfv
b7f688f9ee kern/153772 fix variable names.
Thank you Andrew Boyer for catching these

MFC in 3 days
2011-01-07 22:34:56 +00:00
jfv
0aa625fd3e This small little change is a bug that drove me nuts
finding. The test to compare the mbuf m_len against
a fixed value and then returning needs to be removed.

When using VLANS and doing HW_TAGGING, and IPV6, the
ICMP6 packets actually fail this condition, the constant
assumes that the tag is IN the frame, and its not, so
the length is actually tiny. Furthermore, I'm not sure
what the point was to just return??

MFC after: 3 days
2010-12-04 01:43:38 +00:00
jfv
27ac42d29b Interrupt handler, and stats changes from Michael Tuexen,
thanks Michael!
2010-11-27 01:34:09 +00:00
jfv
a79f9a9e7d and the header... 2010-11-27 00:00:33 +00:00
jfv
649f983bb9 A couple fixes got clobbered, putting them back. 2010-11-26 23:57:13 +00:00
jfv
48bbd73f9a Update ixgbe driver to verion 2.3.6
- This adds a VM SRIOV interface, ixv, it is however
	  transparent to the user, it links with the ixgbe.ko,
	  but when ixgbe is loaded in a virtualized guest with
	  SRIOV configured this will be detected.
	- Sync shared code to latest
	- Many bug fixes and improvements, thanks to everyone
	  who has been using the driver and reporting issues.
2010-11-26 22:46:32 +00:00
brucec
696c4e1f9b Fix typos.
PR:	bin/148894
Submitted by:	olgeni
2010-11-09 10:59:09 +00:00
yongari
8203781f0e Do not allocate multicast array memory in multicast filter
configuration function. For failed memory allocations, em(4)/lem(4)
called panic(9) which is not acceptable on production box.
igb(4)/ixgb(4)/ix(4) allocated the required memory in stack which
consumed 768 bytes of stack memory which looks too big.

To address these issues, allocate multicast array memory in device
attach time and make multicast configuration success under any
conditions. This change also removes the excessive use of memory in
stack.

Reviewed by:	jfv
2010-08-28 00:34:22 +00:00
yongari
56b7a9338b Do not call voluntary panic(9) in case of if_alloc() failure.
Reviewed by:	jfv
2010-08-28 00:09:19 +00:00
kevlo
08695e1982 Fix build 2010-07-01 05:03:24 +00:00
jfv
af5d2bea56 Left out header change in last delta - new member
in adapter so that advertise changes can be done
to one port without the other changing.
2010-06-30 16:28:28 +00:00
glebius
9d2ae47a68 Fix build. 2010-06-30 11:17:55 +00:00
jfv
b5829f3d30 BAH, I apologize, the wrong version of the code got
fat fingered in place, this is the correct version
that actually works... <sheepish grin>

MFC: in a week
2010-06-30 01:10:08 +00:00
jfv
45306e3bca Add a new sysctl option, this will allow one to
limit the advertised speed of an SFP+ to 1G, effectively
"forcing" link at that lower speed. It is off by default
and is enabled by sysctl dev.ix.0.force_gig=1, 0 will
set it back to the norm.
2010-06-30 01:01:06 +00:00
jfv
3808a1936a Change the mbuf memory calls back to NOWAIT as a
problem has been seen in one case with doing the
M_WAITOK
2010-06-11 20:59:29 +00:00
jfv
a674331779 Remove a disable_queue from the beginning of the
interrupt handler, automask handles it.
Also, add in msix vector descriptions.

MFC for 8.1 asap
2010-06-11 19:03:59 +00:00
jfv
3112aa2505 Fixes for panic experienced in test at Intel, when
doing bidirectional stress traffic on 82598.

Also a couple bug fixes from Michael Tuexen, thank you!!

Add a workaround into the header so that 8 REL can use
the driver (adds local copy of ALTQ fix).

MFC: in a few days
2010-06-03 00:00:45 +00:00
jfv
3d5feb1c8b A few changes:
When not defining header split do not allocate mbufs,
  this can be a BIG savings in the mbuf memory pool.

  Also keep seperate dma maps for the header and
  payload pieces when doing header split. The basis
  of this code was a patch done a while ago by
  yongari, thank you :)

  A number of white space changes.

MFC: in a few days
2010-05-19 00:03:48 +00:00
jfv
dd5138420b A few minor fixes:
- add a moderation value to the Link vector
   - allow disabling HW RSC on the 82599 if LRO
     is not enabled.
   - correct error in the stats code
   - change optic type on the 82598 DA device

Thanks to Andrew Boyer for the changes.
2010-05-14 22:00:37 +00:00
jfv
18191dba8a Remove the tx queue selection based on the cpu whe
no flowid is present, this was causing some bad
reordering, now just use 0.

Also, add a few watchdog bits, and tx handler bits
that were corrected in igb.
2010-04-16 16:33:05 +00:00
jfv
b0211eb589 fix my clobber of the copyright date :) 2010-03-30 19:54:29 +00:00
jfv
6244a53a85 Thanks to Michael Tuexen for adding SCTP support for 82599,
also for finding a one character bug that kept TSO from working.

Sometimes with direct attach cables a failure can occur in init,
the old method of calling detach was broken, there is no way to
return an error to the system from init, so I have changed it to
return failure thru the ioctl.

And, have fixed the ALTQ code changes of Max Laier, sorry Max :)
2010-03-30 19:09:18 +00:00
jfv
8918ac92eb Update the driver to Intel version 2.1.6
- add some new hardware support for 82599
	- Big change to interrupt architecture, it now
	  uses a queue which contains an RX/TX pair as
	  the recipient of the interrupt. This will reduce
	  overall system interrupts/msix usage.
	- Improved RX mbuf handling: the old get_buf routine
	  is no longer synchronized with rxeof, this allows
	  the elimination of packet discards due to mbuf
	  allocation failure.
	- Much simplified and improved AIM code, it now
	  happens in the queue interrupt context and takes
	  into account both the traffic on the RX AND TX
	  side.
	- variety of small tweaks, like ring size, that have
	  been seen as performance improvements.
	- Thanks to those that provided feedback or suggested
	  changes, I hope I've caught all of them.
2010-03-27 00:21:40 +00:00
mlaier
b056acb864 Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
   an enqueue operation: returns true if altq is enabled or there are
   already packets in the ring (as we need to maintain packet order)
 - update all drbr consumers
 - fix drbr_flush
 - avoid using the driver queue (IFQ_DRV_*) in the altq case as the
   multiqueue consumer does not provide enough protection, serialize altq
   interaction with the main queue lock
 - make drbr_dequeue_cond work with altq

Discussed with:		kmacy, yongari, jfv
MFC after:		4 weeks
2010-02-13 16:04:58 +00:00
mbr
7450f52a57 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00