Commit Graph

353 Commits

Author SHA1 Message Date
Adrian Chadd
9756bd5982 Change uses of taskqueue_start_threads_pinned() -> taskqueue_start_threads_cpuset()
Differential Revision:	https://reviews.freebsd.org/D1897
Reviewed by:	jfv
2015-02-24 22:17:12 +00:00
Adrian Chadd
b2bdc62a95 Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific
bits.

The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.

* net/rss_config.[ch] takes care of the generic bits for doing
  configuration, hash function selection, etc;
* topelitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
  that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.

This should have no functional impact on anyone currently using
the RSS support.

Differential Revision:	D1383
Reviewed by:	gnn, jfv (intel driver bits)
2015-01-18 18:06:40 +00:00
Jack F Vogel
4da1bbcda5 Revert r275136, it was not approved, it was sloppy, if a feature
like this is needed please resubmit for Intel's approval.
2014-12-02 23:02:57 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Alfred Perlstein
56c14bca7e Make igb and ixgbe check tunables at probe time.
This allows one to make a kernel module to tune the
number of queues before the driver loads.

This is needed so that a module at SI_SUB_CPU can set
tunables for these drivers to take.  Otherwise getenv
is called too early by the TUNABLE macros.

Reviewed by: smh
Phabric: https://reviews.freebsd.org/D1149
2014-11-26 20:19:36 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
John Baldwin
8423f42aa8 Various fixes to stats:
- Read the counts of received, dropped, and transmitted management
  packets and add sysctl nodes for them.
- Fix the total octets received/transmitted to read all 64 bits of
  the counters.
- Add missing sysctl nodes for rlec, tncrs, fcruc, tor, and tot.
- Remove spurious spaces.

Reviewed by:	Eric Joyner @ Intel
MFC after:	1 week
2014-10-10 16:36:25 +00:00
Gleb Smirnoff
bd071d4d19 - Remove empty wrappers ether_poll_[de]register_drv(). [1]
- Move polling(9) declarations out of ifq.h back to if_var.h
  they are absolutely unrelated to queues.

Submitted by:	Mikhail <mp lenta.ru> [1]
2014-09-28 14:05:18 +00:00
Gleb Smirnoff
5d53210ced - Provide igb_get_counter() to return counters that are not collected,
but taken from hardware.
- Mechanically convert to if_inc_counter() the rest of counters.
2014-09-19 11:49:41 +00:00
Adrian Chadd
0936a8208b Fix the handling of EOP in status descriptors for if_igb(4) and don't
double-free mbufs.

Like ixgbe(4) chipsets, EOP is only set on the final descriptor
in a chain of descriptors.  So, to free the whole list of descriptors,
we should free the current slot _and_ the assembled list of descriptors
that make up the fragment list.

The existing code was setting discard once it saw EOP + an error status;
it then freed all the subsequent descriptors until the next EOP. That's
totally the wrong order.
2014-09-18 16:20:17 +00:00
Gleb Smirnoff
df3601781d - Use if_inc_counter() to increment various counters.
- Do not ever set a counter to a value. For those counters
  that we don't increment, but return directly from hardware
  create cases in if_get_counter() method.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 15:56:14 +00:00
Adrian Chadd
1c2427605c Set DROP_EN on each RX queue if transmit flow-control is disabled.
This allows the NIC to drop frames on the receive queue and not
cause the MAC to block on receiving to _any_ queue.

Tested:

igb0@pci0:5:0:0:        class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet

Discussed with: Eric Joyner <eric.joyner@intel.com>

MFC after:	1 week
Sponsored by:	Norse Corp, Inc.
2014-09-15 19:53:49 +00:00
Gleb Smirnoff
09a8241fc9 It is actually possible to have if_t a typedef to non-void type,
and keep both converted to drvapi and non-converted drivers
compilable.

o Make if_t typedef to struct ifnet *.
o Remove shim functions.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-31 12:48:13 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured gues-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Adrian Chadd
fa4be7cc42 Fix the igb(4) redirection table to correctly populate.
This is similar to the ixgbe(4) fix.

Tested:

* Intel I350 gigabit adapter
2014-07-23 05:40:28 +00:00
Hiren Panchasara
eee92ad073 The description is a bit misleading. Trying to make it more obvious.
Phabric:    https://phabric.freebsd.org/D435
Reviewed by:	gnn
2014-07-18 16:25:35 +00:00
Rick Macklem
e2ade3b6f7 Move the "retry:" label so that the calls to m_pullup() are
not done after the call to m_defrag(). This fixes a problem
where m_pullup() would prepend an mbuf to the list created
by m_defrag() making the chain greater than 32 again.

Tested by:	rcarter@pinyon.org
Reviewed by:	yongari, jfv
MFC after:	2 weeks
2014-07-15 23:32:13 +00:00
Mark Johnston
58e6549541 Correct the setting of the VID in transmit descriptors when hardware VLAN
tagging is enabled. This was broken in r266978.

Reported by:	gjb
Tested by:	gjb
2014-07-10 16:46:46 +00:00
Adrian Chadd
8c0d2adf3f Initialise these variables so gcc doesn't complain.
Submitted by:	luigi
2014-06-30 23:34:36 +00:00
Adrian Chadd
1d72a9bea9 Add initial RSS awareness to the igb(4) driver.
The igb(4) hardware is capable of RSS hashing RX packets and doing RSS
queue selection for up to 8 queues.  (I believe some hardware is limited
to 4 queues, but I haven't tested on that.)

However, even if multi-queue is enabled for igb(4), the RX path doesn't use
the RSS flowid from the received descriptor.  It just uses the MSIX queue id.

This patch does a handful of things if RSS is enabled:

* Instead of using a random key at boot, fetch the RSS key from the RSS code
  and program that in to the RSS redirection table.

  That whole chunk of code should be double checked for endian correctness.

* Use the RSS queue mapping to CPU ID to figure out where to thread pin
  the RX swi thread and the taskqueue threads for each queue.

* The software queue is now really an "RSS bucket".

* When programming the RSS indirection table, use the RSS code to
  figure out which RSS bucket each slot in the indirection table maps
  to.

* When transmitting, use the flowid RSS mapping if the mbuf has
  an RSS aware hash.  The existing method wasn't guaranteed to align
  correctly with the destination RSS bucket (and thus CPU ID.)

This code warns if the number of RSS buckets isn't the same as the
automatically configured number of hardware queues.  The administrator
will have to tweak one of them for better performance.

There's currently no way to re-balance the RSS indirection table after
startup.  I'll worry about that later.

Additionally, it may be worthwhile to always use the full 32 bit flowid if
multi-queue is enabled.  It'll make things like lagg(4) behave better with
respect to traffic distribution.
2014-06-30 04:34:59 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Jack F Vogel
8cc64f1e21 Sync the E1000 shared code with Intel internal, this adds fixes,
and more importantly, new I218 adapter support to the em driver.

MFC after: 1 week
2014-06-26 21:33:32 +00:00
John Baldwin
46e89834dc - Don't compare bus_dma map pointers for static DMA allocations against
NULL to determine if bus_dmamap_unload() or bus_dmamem_free() should be
  called.  Instead, check the associated bus and virtual addresses.
- Don't clear static DMA maps to NULL.

Reviewed by:	jfv
2014-06-12 11:15:19 +00:00
Luigi Rizzo
c7156fe92f make sure if_transmit returns 0 if the mbuf is enqueued.
ixgbe/ixv.c still needs a similar fix but it takes a little
more restructuring of the code.

MFC after:	3 days
2014-06-06 20:49:56 +00:00
Marcel Moolenaar
9e11529015 Convert em(4) to use the driver API.
Submitted by:   Anuranjan Shukla <anshukla@juniper.net>
Obtained from:  Juniper Networks, Inc.
2014-06-02 18:52:03 +00:00
Luigi Rizzo
0d88706547 reference the correct variable in a comment
MFC after:	3 days
2014-05-28 06:50:16 +00:00
Eitan Adler
eb0a187849 e1000: add missing braces
Obtained from:	DragonFlyBSD
2014-05-26 02:19:50 +00:00
George V. Neville-Neil
e1cda2b313 The timestamp bit is number 17, and not number 9, in the stat error
field of the receive descriptor.

MFC after:	1 week
2014-01-30 18:32:33 +00:00
Gleb Smirnoff
3dbdfe820b Fix compilation with IGB_LEGACY_TX defined.
PR:		185909
Submitted by:	Aurelien Rougemont <beorn binaries.fr>
2014-01-25 20:39:23 +00:00
Luigi Rizzo
17885a7bfd It is 2014 and we have a new version of netmap.
Most relevant features:

- netmap emulation on any NIC, even those without native netmap support.

  On the ixgbe we have measured about 4Mpps/core/queue in this mode,
  which is still a lot more than with sockets/bpf.

- seamless interconnection of VALE switch, NICs and host stack.

  If you disable accelerations on your NIC (say em0)

        ifconfig em0 -txcsum -txcsum

  you can use the VALE switch to connect the NIC and the host stack:

        vale-ctl -h valeXX:em0

  allowing sharing the NIC with other netmap clients.

- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
  instead of pointers/count as before). This was unavoidable to support,
  in the future, multiple threads operating on the same rings.
  Netmap clients require very small source code changes to compile again.
      On the plus side, the new API should be easier to understand
  and the internals are a lot simpler.

The manual page has been updated extensively to reflect the current
features and give some examples.

This is the result of work of several people including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by EU projects CHANGE and OPENLAB, from NetApp University
Research Fund, NEC, and of course the Universita` di Pisa.
2014-01-06 12:53:15 +00:00
Luigi Rizzo
7091cd69d0 use the correct netmap <-> nic slot mapping on the transmit ring for 'lem'.
This bug would manifest only in netmap mode and on packets transmitted after
a NIC reset while netmap mode is active.

MFC after:	3 days
2013-12-26 05:22:38 +00:00
Eitan Adler
7a22215c53 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
Konstantin Belousov
d480f5b820 Fix several issues with the busdma(9) KPI use in the e1000 drivers.
The problems do not affect bouncing busdma in a visible way, but are
critical for the dmar backend.

- The bus_dmamap_create(9) is not documented to take BUS_DMA_NOWAIT flag.
- Unload descriptor map after receive.
- Do not reset descriptor map to NULL, bus_dmamap_load(9) requires
  valid map, and also this leaks the map.

Reported and tested by:	pho
Approved by:	jfv
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-11-02 09:16:11 +00:00
Luigi Rizzo
ce3ee1e7c4 update to the latest netmap snapshot.
This includes the following:
- use separate memory regions for VALE ports
- locking fixes
- some simplifications in the NIC-specific routines
- performance improvements for the VALE switch
- some new features in the pkt-gen test program
- documentation updates

There are small API changes that require programs to be recompiled
(NETMAP_API has been bumped so you will detect old binaries at runtime).

In particular:
- struct netmap_slot now is 16 bytes to support an extra pointer,
  which may save one data copy when using VALE ports or VMs;
- the struct netmap_if has two extra fields;

MFC after:	3 days
2013-11-01 21:21:14 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Jack F Vogel
7609433eb6 Update the Intel igb driver to version 2.4.0
- This version has support for the new Intel Avoton systems,
including 2.5Gb support, further it now has IPv6/TSO6 support as
well. Shared code has been updated where necessary as well. Thanks
to my new assistant Eric Joyner for doing the transmit path changes
to bring in the IPv6/TSO6 support. Thanks to Gleb for catching the
one bug and change needed in NETMAP.

Approved by: re
2013-10-09 17:32:52 +00:00
Hiren Panchasara
5b9d734b08 Expose system level ixgbe sysctls.
Device level sysctls are already exposed as dev.ix.<device>

Fixing the case where number of queues for igb is auto-tuned and
hw.igb.num_queues does not return current/updated value.

Reviewed by:	jfv
Approved by:	re (delphij)
MFC after:	2 weeks
2013-10-05 19:17:56 +00:00
Andre Oppermann
1b4381afbb Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features.  The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
  layer specific union PH_loc for local use.  Protocols can flexibly overlay
  their own 8 to 64 bit fields to store information while the packet is
  worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
  instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
  information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
  rsstype field.  Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
  with the packet.  It is not yet supported in any drivers but allows us to
  get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
  a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
  from the start of the packet.  This is important for various offload
  capabilities and to relieve the drivers from having to parse the packet
  and protocol headers to find out location of checksums and other
  information.  Header parsing in drivers is a lot of copy-paste and
  unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
  packet information, like ether_vtag, tso_segsz and csum fields.
  Depending on the csum_flags settings some fields may have different usage
  making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
  stack) and inbound (up the stack) use.  The CSUM flags used to be a bit
  chaotic and rather poorly documented leading to incorrect use in many
  places.  Bring clarity into their use through better naming.
  Compatibility mappings are provided to preserve the API.  The drivers
  can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by:	The FreeBSD Foundation
2013-08-24 19:51:18 +00:00
Jack F Vogel
83cef45266 Alter the mq_start routine to do a TRYLOCK and call to the locked routine
rather than just queueing. The former code was an attempt at getting
UDP performance up, but there have been customer reports of problems with it,
so the ixgbe approach seems the best solution for now.
2013-08-13 00:25:39 +00:00
Scott Long
c68534f1d5 Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI
command register.  The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR.  Thus, the bit is no longer
a reliable indication of capability, and should not be checked.  This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.

This changes fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.

Submitted by:	jhb
Reviewed by:	jfv, marius, achadd, achim
MFC after:	1 day
2013-08-12 23:30:01 +00:00
Jack F Vogel
4dc63104ae Improve the MSIX setup code in the drivers, thanks to Marius for
the changes. Make sure that pci_alloc_msix() does give us the vectors
we need and fall back to MSI when it doesn't, also release any that
were allocated when insufficient.

MFC after: 3 days
2013-08-12 22:54:38 +00:00
Jack F Vogel
d0913b7f25 Make the various driver MSIX setup routines fallback to MSI more
gracefully. This change was suggested by Marius Strobl, thank you.

PR: kern/181016
MFC after: ASAP
2013-08-06 21:01:38 +00:00
Jack F Vogel
54a6317360 When the igb driver is static there are cases when early interrupts occur,
resulting in a panic in refresh_mbufs, to prevent this add a check in the
interrupt handler for DRV_RUNNING.

MFC after: 1 day (critical for 9.2)
2013-08-06 18:00:53 +00:00
Jack F Vogel
a1db87ec73 Change the E1000 driver option header handling to match the
ixgbe driver. As it was, when building them as a module INET
and INET6 are not defined. In these drivers it does not cause
a panic, however it does result in different behavior in the
ioctl routine when you are using a module vs static, and I
think the behavior should be the same.

MFC after: 3 days
2013-07-12 22:36:26 +00:00
Luigi Rizzo
4dc07530d7 if_lem.c: make sure that lem_rxeof() can drain the entire rx queue
irrespective of the setting of lem_rx_process_limit, while
	giving a chance to the taskqueue scheduler to act after
	each chunk.
	This makes lem_rxeof similar to the one in if_em.c and if_igb.c .

if_lem.c and if_em.c: add a sysctl to manually configure the
	'itr' moderation register.

Approved by:	Jack Vogel
2013-05-09 17:07:30 +00:00
Luigi Rizzo
1405478115 simplify the code to initialize the RDT while in netmap mode. 2013-05-09 16:57:02 +00:00
Eitan Adler
f7efb9e28e Update Intel email address.
PR:		docs/175349
Submitted by:	Lars Eggert <lars@netapp.com>
Discussed with:	jfv
2013-05-02 01:36:52 +00:00
Luigi Rizzo
9b2e4517d5 use netmap_rx_irq() and netmap_tx_irq() instead of replicating the
logic in the individual driver.
2013-04-30 16:51:58 +00:00
Luigi Rizzo
d61ba75247 use netmap_rx_irq() / netmap_tx_irq() to handle interrupts in
netmap mode, removing the logic from individual drivers.

(note: if_lem.c not updated yet due to some other pending modifications)
2013-04-30 16:18:29 +00:00
Jack F Vogel
386c110e3c Corrections to the RX checksum code, make sure its disabled as
well as enabled when necessary. And simplify the checksum routine
itself, adding UDP bit to the test. Thanks to Kevin Lo for pointing
out the problems and code suggestions.
2013-04-15 17:01:42 +00:00
Jack F Vogel
f0105d2d23 Simplify allocate_legacy code, txr pointer was breaking LEGACY compile,
thanks to Nick Rogers for pointing this out.
2013-04-10 17:51:39 +00:00
Jack F Vogel
3b0b7ffbb9 Correct the multicast handling in the E1000 drivers as was
done in ixgbe, thanks to Mike Karels for this fix. When exiting
promiscuous mode MPE bit was being unconditionally cleared, this
should not be done if we are in MAX multicast groups.
2013-04-03 23:39:54 +00:00
Sean Bruno
8e3ff376cf Update man page for igb(4) with a little bit of information about
hw.igb.num_queues for those so inclined.

PR:		kern/177384
Submitted by:	hiren.panchasara@gmail.com
Reviewed by:	sbruno@
Approved by:	jfv@
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-04-03 21:55:19 +00:00
Jack F Vogel
be2095895a Change the define in the header to eliminate unnecessary data
when using LEGACY TX.
2013-03-29 18:46:13 +00:00
Jack F Vogel
c05891a6da Change defines in the igb driver to allow an easier selection of
the older if_start/non-multiqueue interface from the stack. This
is not the default, but can be turned on in the Makefile now regardless
of the OS level to allow either testing or use of ALTQ.

MFC after: one week
2013-03-29 18:25:45 +00:00
Jack F Vogel
6ab6bfe32f Refresh on the shared code for the E1000 drivers.
- bear with me, there are lots of white space changes, I would not
    do them, but I am a mere consumer of this stuff and if these drivers
    are to stay in shape they need to be taken.

em driver changes: support for the new i217/i218 interfaces

igb driver changes:
  - TX mq start has a quick turnaround to the stack
  - Link/media handling improvement
  - When link status changes happen the current flow control state
    will now be displayed.
  - A few white space/style changes.

lem driver changes:
  - the shared code uncovered a bogus write to the RLPML register
    (which does not exist in this hardware) in the vlan code,this
    is removed.
2013-02-21 00:25:45 +00:00
Randall Stewart
ded5ea6a25 This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will  need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by:	jhb@freebsd.org, jlv@freebsd.org
2013-02-07 15:20:54 +00:00
Sofian Brabez
61bfd86762 Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays
Reviewed by:	cognet
Approved by:	cognet
2013-01-30 18:01:20 +00:00
Steven Hartland
31e85bd9cd Fixed mbuf free when receive structures fail to allocate.
This prevents quad igb card on high core machines, without any nmbcluster or
igb queue tuning wedging the boot process if all nics are configured.

Reviewed by:	jfv
Approved by:	pjd (mentor)
MFC after:	1 week
2013-01-12 16:05:55 +00:00
Gleb Smirnoff
c6499eccad Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
Gleb Smirnoff
9c402aeb41 drbr_enqueue() awlays consumes mbuf, no matter did it
fail or not. The mbuf pointer is no longer valid, so
can't be reused after.

  Fix igb_mq_start() where mbuf pointer was used after
drbr_enqueue().

  This eventually leads us to all invocations of
igb_mq_start_locked() called with third argument as NULL.
This allows us to simplify this function.

Submitted by:	Karim Fodil-Lemelin <fodillemlinkarim gmail.com>
Reviewed by:	jfv
2012-11-26 20:03:57 +00:00
Eitan Adler
2da1951583 Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

This is a subset of 241856 (which was reverted)

Reviewed by:	des
Approved by:	cperciva (implicit)
MFC after:	1 week
2012-10-22 22:29:48 +00:00
Eitan Adler
a8de37b024 This isn't functionally identical. In some cases a hint to disable
unit 0 would in fact disable all units.

This reverts r241856

Approved by: cperciva (implicit)
2012-10-22 13:06:09 +00:00
Eitan Adler
76b7512247 Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

Reviewed by:	des
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:41:14 +00:00
Eitan Adler
db702c59cf remove duplicate semicolons where possible.
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:00:37 +00:00
Gleb Smirnoff
063efed28c The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
  ALTQ queueing or buf_ring(9) queueing.
  - It doesn't touch the buf_ring stats any more.
  - It doesn't touch ifnet stats anymore.
  - drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
  - It handles br_drops itself.
  - br_prod_bytes stats are dropped. Rationale: no one ever
    reads them but update of a common counter on every packet
    negatively affects performance due to excessive cache
    invalidation.
  - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
    we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and  ixv(4) no longer
  use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
  drbr_stats_update(), thus it was the only driver accurate under
  moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
  - drop software stats updating.
  - take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
  from hardware.

Reviewed by:	jfv, gnn
2012-09-28 18:28:27 +00:00
John Baldwin
aceb040376 Merge similar fixes from 223198 from igb to ixgbe:
- Use a dedicated task to handle deferred transmits from the if_transmit
  method instead of reusing the existing per-queue interrupt task.
  Reusing the per-queue interrupt task could result in both an interrupt
  thread and the taskqueue thread trying to handle received packets on a
  single queue resulting in out-of-order packet processing and lock
  contention.
- Don't define ixgbe_start() at all where if_transmit is used.

Tested by:	Vijay Singh
Reviewed by:	jfv
MFC after:	2 weeks
2012-09-26 18:11:43 +00:00
Sean Bruno
126a39ce60 This patch fixes a nit in the em, lem, and igb driver statistics. Increment
adapter->dropped_pkts instead of if_ierrors because if_ierrors is
overwritten by hw stats collection.

Submitted by:	Andrew Boyer <aboyer@averesystems.com>
Reviewed by:	Jack F Vogel <jfv@freebsd.org>
MFC after:	2 weeks
2012-09-23 22:53:39 +00:00
Gavin Atkinson
e935190a33 Switch some PCI register reads from using magic numbers to using the names
defined in pcireg.h

MFC after:	1 week
2012-09-19 12:27:23 +00:00
Gavin Atkinson
389c8bd51e Align the PCI Express #defines with the style used for the PCI-X
#defines.  This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.

This is a mostly mechanical rename:
  s/PCIR_EXPRESS_/PCIER_/g
  s/PCIM_EXP_/PCIEM_/g
  s/PCIM_LINK_/PCIEM_LINK_/g

When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.

Discussed with:	jhb
MFC after:	1 week
2012-09-18 22:04:59 +00:00
Eitan Adler
96240c89f0 Correct double "the the"
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:28:56 +00:00
Jack F Vogel
252781f47d Customer report of a panic on boot due to the old
"m_getjcl:invalid cluster type" that occurred some
time back with the igb driver. This happens often when
booting over the net. I believe the NIC hardware is left
in a warm state when handed over to the driver, and a stray
RX interrupt happens earlier than the code is prepared for
it to happen. This change was verified to fix the problem,
its kind of a bandaid... but it is similar to what was done
in the igb code.
2012-08-15 17:12:40 +00:00
Jack F Vogel
724f79462b Make the polling interface in igb able to handle
multiqueue, and correct the rxdone handling. Update
the polling man page to include igb as well.

Thanks to Mark Johnston for these changes.
2012-08-06 22:43:49 +00:00
Jack F Vogel
6aa4d618ca Correct the mq_start routine to avoid out-of-order
packet delivery, always enqueue when possible. Also
correct the DEPLETED test as multiple bits might be
set.  Thanks to Randall Stewart for the changes!
2012-08-06 20:44:05 +00:00
Sean Bruno
8844c80848 CPU_NEXT() already handles wrapping around to the beginning. Also, in a
system with sparse CPU IDs, you can have a valid CPU ID > mp_ncpus (e.g. if
you have two CPUs 0 and 4, with mp_maxid == 4 and mp_ncpus == 2).

Introduced at svn r235210

Submitted by:	jhb@
Reviewed by:	jfv@
2012-08-02 00:00:34 +00:00
Jack F Vogel
b4750260cd Clean up some unused leftover code from em
Make IRQ style a tuneable
Fix lock handling in the interrupt handler

MFC after:3 days
2012-07-31 18:44:10 +00:00
Luigi Rizzo
fc1fa1f2fe remove some extra testing code that slipped into the previous commit
Reported-by: Alexander Motin
2012-07-25 12:51:33 +00:00
Luigi Rizzo
8d1717963b Use legacy interrupts as a default. This gives up to 10% speedup
when used in qemu (and this driver is for non-PCIe cards,
so probably its largest use is in virtualized environments).

Approved by:	Jack Vogel
MFC after:	3 days
2012-07-25 11:28:15 +00:00
Jack F Vogel
fcc144ad4e Change the interface to the Energy Efficient Ethernet (EEE)
setting in the igb and em driver. This was necessitated by
a shared code change that I was given late in the game, a data
type changed from bool to int, in the last update I dealt with
it by a cast, but it was pointed out (thanks jhb) that there
was a potential problem with this. John suggested this safer
approach, and it is fine with me...

MFC after:2 days (to catch the 9.1 update)
2012-07-07 20:21:05 +00:00
Jack F Vogel
996922aeee Correct small regressions pointed out by jhb, thanks John.
MFC after:5 days
2012-07-05 23:36:17 +00:00
Jack F Vogel
ab5d036272 Sync with Intel internal source:
shared code update and small changes in core required
Add support for new i210/i211 devices
Improve queue calculation based on mac type

MFC after:5 days
2012-07-05 20:26:57 +00:00
John Baldwin
03b0ca8b28 Commit a portion of 233708 I missed earlier and don't include the
definition of igb_start() and igb_start_locked() (nor set if_start in
the ifnet) when igb(4) uses if_transmit.
2012-06-01 15:52:41 +00:00
Kevin Lo
4d8b94d278 Initialize "error" to zero when it's declared in em_setup_receive_ring() 2012-05-11 03:15:22 +00:00
Sean Bruno
daf8162d1f Modify the binding of queues to attach to as many CPUs
as possible when using more than one igb(4) adapter.  This
means that queues will not be bound to the same CPUs if
there are more CPUs availble.

This is only applicable to a system that has multiple interfaces.

Obtained from:	Yahoo! Inc.
MFC after:	3 days
2012-05-10 00:00:28 +00:00
Ed Maste
a3d2552747 Fix cut-and-paste comment error
Submitted by:	sbruno
2012-04-25 02:05:14 +00:00
John Baldwin
8546e82467 Reapply r223198 which was reverted in the previous vendor import. Some
portions were already reapplied in r233708:
- Use a dedicated task to handle deferred transmits from the if_transmit
  method instead of reusing the existing per-queue interrupt task.
  Reusing the per-queue interrupt task could result in both an interrupt
  thread and the taskqueue thread trying to handle received packets on a
  single queue resulting in out-of-order packet processing.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().

MFC after:	1 week
2012-04-11 21:33:45 +00:00
John Baldwin
d8a8648379 Fix a few issues with transmit handling in em(4) and igb(4):
- Do not define the foo_start() methods or set if_start in the ifnet if
  multiq transmit is enabled.  Also, set if_transmit and if_qflush before
  ether_ifattach rather than after when multiq transmit is enabled.  This
  helps to ensure that the drivers never try to mix different transmit
  methods.
- Properly restart transmit during resume.  igb(4) was not restarting it
  at all, and em(4) was restarting even if the link was down and was
  calling the wrong method if multiq transmit was enabled.
- Remove all the 'more' handling for transmit completions.  Transmit
  completion processing does not have a processing limit, so it always
  runs to completion and never has more work to do when it returns.
  Instead, the previous code was returning 'true' anytime there were
  packets in the queue that weren't still in the process of being
  transmitted.  The effect was that the driver would continuously
  reschedule a task to process TX completions in effect running at 100%
  CPU polling the hardware until it finished transmitting all of the
  packets in the ring.  Now it will just wait for the next TX completion
  interrupt.
- Restart packet transmission when the link becomes active.
- Fix the MSI-X queue interrupt handlers to restart packet transmission if
  there are pending packets in the relevant software queue (IFQ or buf_ring)
  after processing TX completions.  This is the root cause for the OACTIVE
  hangs as if the MSI-X queue handler drained all the pending packets from
  the TX ring, nothing would ever restart it.  As such, remove some
  previously-added workarounds to reschedule a task to poll the TX ring
  anytime OACTIVE was set.

Tested by:	sbruno
Reviewed by:	jfv
MFC after:	1 week
2012-03-30 19:54:48 +00:00
Marius Strobl
b3f46df007 Initialize the mutexes used for the NVM and the swflag as MTX_DUPOK in
order to avoid otherwise harmless witness warnings when these are acquired
at the same time and due to both using MTX_NETWORK_LOCK as their type.
The right fix actually would be to use different, descriptive types for
these. However, the latter would require undesirable changes to the shared
code base. Another approach would be to just supply NULL as the type, which
was deemed as less desirable though as it would cause the unique but cryptic
name also to be used for the type and to diverge from the type used by other
network device drivers.

MFC after:	1 week
2012-03-24 15:15:34 +00:00
John Baldwin
c3173381be Properly handle failures in igb_setup_msix() by returning 0 if MSI or MSI-X
allocation fails.

Reviewed by:	jfv
MFC after:	2 weeks
2012-03-01 22:13:10 +00:00
Luigi Rizzo
64ae02c365 A bunch of netmap fixes:
USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
   to the netmap ring to indicate how many buffers we have already processed
   but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
   at it add a version field and some spares to the ioctl() argument
   to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
   to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
   in the ixgbe driver.
   Use 'legacy' descriptors on the tx ring and prefetch slots gives
   about 20% speedup at 900 MHz. Another 7-10% would come from removing
   the explict calls to bus_dmamap* in the core (they are effectively
   NOPs in this case, but it takes expensive load of the per-buffer
   dma maps to figure out that they are all NULL.

   Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.
2012-02-27 19:05:01 +00:00
Luigi Rizzo
5644ccec61 (This commit only touches code within the DEV_NETMAP blocks)
Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).

On passing, make the code and comments more uniform across the
various drivers.
2012-02-15 23:13:29 +00:00
Eitan Adler
c6618b7f79 GS105v3 exhibit the same behavior
PR:			docs/135999
Submitted by:	Boris Kochergin <spawky@acm.poly.edu>
No objection from:	jfv
Approved by:	cperciva
MFC after:		3 days
2012-01-29 14:52:42 +00:00
Luigi Rizzo
ce9f43b467 clear the pointer after freeing the mbuf. Without that, we
risk a double free if the subsequent mbuf allocation fails.
This bug is not netmap-related and was introduced in  rev. 228387
2012-01-12 17:30:44 +00:00
Luigi Rizzo
467bd5c2cb fix the initialization of the rings when netmap is used,
to adapt it to the changes in  228387 .
Now the code is similar to the one used in other drivers.
Not applicable to stable/9 and stable/8
2012-01-12 17:28:00 +00:00
Luigi Rizzo
6e10c8b8c5 small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block
2012-01-10 19:57:23 +00:00
Kevin Lo
5bbe0c5357 ether_ifattach() sets if_mtu to ETHERMTU, don't bother set it again
Reviewed by:	yongari
2012-01-07 09:41:57 +00:00
Robert Watson
19d52de5f4 When extracting the VLAN tag from if_em and if_lem receive descriptor
rings, copy the whole VLAN tag, not just the VLAN ID.  This fixes a
problem in which VLAN priority information was dropped when using
offloaded VLAN processing with these drivers.

Discussed with:	jfv, rrs
Sponsored by:	ADARA Networks, Inc.
MFC after:	3 days
2012-01-05 17:30:15 +00:00
Luigi Rizzo
2d0d326d91 put back netmap support, deleted by mistake in a previous commit 2011-12-22 15:33:41 +00:00
John Baldwin
ef93f57495 Restore the sysctl changes from 223676 and 227309 lost in the previous
import:
- Add read-only sysctls for all of the tunables supported by the igb and
  em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
  change a per-instance variable (which is used to control AIM) rather
  than having all of the per-instance sysctls operate on a single global
  variable.

While here, restore the previously existing hw.igb.rx_processing_limit
tunable as it is very useful to be able to set a default tunable that
applies to all adapters in the system.
2011-12-21 20:10:11 +00:00
Matthew D Fleming
30a497c860 Consistently use types in e1000 driver code:
- Two struct members eee_disable are used in a function that expects
   an int *, so declare them int, not bool.
 - igb_tx_ctx_setup() returns a boolean value, so declare it bool, not int.
 - igb_header_split is passed to TUNABLE_INT, so delcare it int, not bool.
 - igb_tso_setup() returns a bool, so declare it bool, not boolean_t.
 - Do not re-define bool/true/false if the symbols already exist.

MFC after:	2 weeks
Sponsored by:	Isilon Systems, LLC
2011-12-12 18:27:34 +00:00
Jack F Vogel
62aca36544 Last change still had an issue, one more time... 2011-12-11 18:46:14 +00:00
Jack F Vogel
133f283b45 Correct LINT build issues in the ioctl code. 2011-12-11 09:37:25 +00:00
Jack F Vogel
96b38ade36 Fix NETMAP code problem in the build. 2011-12-10 18:00:53 +00:00
Jack F Vogel
fd33ce416e Part 2 of 2 New deltas for the 1G drivers.
There have still been intermittent problems with apparent TX
hangs for some customers. These have been problematic to reproduce
but I believe these changes will address them. Testing on a number
of fronts have been positive.

EM: there is an important 'chicken bit' fix for 82574 in the shared
code this is supported in the core here.
    - The TX path has been tightened up to improve performance. In
      particular UDP with jumbo frames was having problems, and the
      changes here have improved that.
    - OACTIVE has been used more carefully on the theory that some
      hangs may be due to a problem in this interaction
    - Problems with the RX init code, the "lazy" allocation and
      ring initialization has been found to cause problems in some
      newer client systems, and as it really is not that big a win
      (its not in a hot path) it seems best to remove it.
    - HWTSO was broken when VLAN HWTAGGING or HWFILTER is used, I
      found this was due to an error in setting up the descriptors
      in em_xmit.

IGB:
    - TX is also improved here. With multiqueue I realized its very
      important to handle OACTIVE only under the CORE lock so there
      are no races between the queues.
    - Flow Control handling was broken in a couple ways, I have changed
      and I hope improved that in this delta.
    - UDP also had a problem in the TX path here, it was change to
      improve that.
    - On some hardware, with the driver static, a weird stray interrupt
      seems to sometimes fire and cause a panic in the RX mbuf refresh
      code. This is addressed by setting interrupts late in the init
      path, and also to set all interrupts bits off at the start of that.
2011-12-10 07:08:52 +00:00
Jack F Vogel
4dab5c3769 Part 1 of two parts, this is the shared code changes in
support of new deltas for both em and igb drivers.

Note that I am not able to track all the bugs fixed in
this code, I am a consumer of it as a component of my
core drivers. It is important to keep the FreeBSD drivers
up to date with it however.

One important note is there is a key fix for 82574 in this
update. Also, there are lots of white space changes, I am
not happy about them but have no control over it :)
2011-12-10 06:55:02 +00:00
Luigi Rizzo
579a6e3c4e add netmap support for "em", "lem", "igb" and "re".
On my hardware, "em" in netmap mode does about 1.388 Mpps
on one card (on an Asus motherboard), and 1.1 Mpps on another
card (PCIe bus). Both seem to be NIC-limited, because
i have the same rate even with the CPU running at 150 MHz.

On the "re" driver the tx throughput is around 420-450 Kpps
on various (8111C and the like) chipsets. On the Rx side
performance seems much better, and i can receive the full
load generated by the "em" cards.

"igb" is untested as i don't have the hardware.
2011-12-05 15:33:13 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Eitan Adler
36daf0495a - change "is is" to "is" or "it is"
- change "the the" to "the"

Approved by:	lstewart
Approved by:	sahil (mentor)
MFC after:	3 days
2011-10-16 14:30:28 +00:00
Ryan Stone
11dc48dfde Clear transmit checksum offload context state upon lem(4) interface
initialization.  Prior to this change packets may be transmitted with an
incorrect checksum.

Em(4) already has an equivalent change in r213234.

Obtained From:  Sandvine
MFC After:      1 week
Approved by:    re (bz)
2011-09-17 13:48:09 +00:00
Jack F Vogel
980729b244 A fix to make the LINT-NOINET build happy, if this
works out the ixgbe driver should be changed as well.
2011-07-07 00:46:50 +00:00
John Baldwin
b37e0f6e9a - Add read-only sysctls for all of the tunables supported by the igb and
em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
  change a per-instance variable (which is used to control AIM) rather
  than having all of the per-instance sysctls operate on a single global
  variable.

Reviewed by:	jfv (earlier version)
MFC after:	1 week
2011-06-29 16:20:52 +00:00
Jack F Vogel
61cf9896b6 Put back the global for rx processing due to popular demand. 2011-06-23 17:42:27 +00:00
Jack F Vogel
1c1010187a Eliminate some global tuneables in favor of adapter-specific,
particular flow control and dma coalesce. Also improve the
sysctl operation on those too.

Add IPv6 detection in the ioctl code, this was done for
ixgbe first, carrying that over.

Add resource ability to disable particular adapter.

Add HW TSO capability so vlans can make use of TSO
2011-06-20 22:59:29 +00:00
John Baldwin
1c2ed38455 - Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
  Reusing the per-queue interrupt task could result in both an interrupt
  thread and the taskqueue thread trying to handle received packets on a
  single queue resulting in out-of-order packet processing.
- Don't define igb_start() at all on 8.0 and where if_transmit is used.
  Replace last remaining call to igb_start() with a loop to kick off
  transmit on each queue instead.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().

Reviewed by:	jfv
MFC after:	1 week
2011-06-17 20:06:52 +00:00
Jack F Vogel
3cec53b8a7 Add an initialization to the error variable, without
this there is a rare return path that bogusly appears
to fail when it should not.  Also white space correction.

Thanks to Arnaud Lacombe for noticing the problem.
2011-05-05 17:28:45 +00:00
Jack F Vogel
efa439f740 Small change to make backporting to stable/7,
thanks to Arnaud Lacombe for suggesting it.
2011-04-28 22:21:53 +00:00
Jack F Vogel
cf696f26c9 Important update for the igb driver:
- Add the change made in em to the actual unrefreshed number
    of descriptors is used as a basis in rxeof on the way out
    to determine if more refresh is needed. NOTE: there is a
    difference in the ring setup in igb, this is not accidental,
    it is necessitated by hardware behavior, when you reset the
    newer adapters it will not let you write RDH, it ALWAYS sets
    it to 0. Thus the way em does it is not possible.
  - Change the sysctl handling of flow control, it will now make
    the change dynamically when the variable setting changes rather
    than requiring a reset.
  - Change the eee sysctl naming, validation found the old unintuitive :)
  - Last but not least, some important performance tweaks in the TX
    path, I found that UDP behavior could be drastically hindered or
    improved with just small changes in the start loop. What I have
    here is what testing has shown to be the best overall. Its interesting
    to note that changing the clean threshold to start at a full half of
    the ring, made a BIG difference in performance.  I hope that this
    will prove to be advantageous for most workloads.

MFC in a week.
2011-04-05 21:55:43 +00:00
Jack F Vogel
62d8da8c3a Fix to an error condition case, when an mbuf chain
get's defragged due to a mapping failure the header
pointers will be invalidated and can result in a
TSO or other failure down the line. So, when the
remapping occurs force a retry thru the offload
calculation code. Thanks to Andrew Boyer for discovering
this and cooking up the fix!!
2011-04-01 20:24:51 +00:00
Jack F Vogel
e61e0b91af Change the refresh_mbuf logic slightly, add an inline
to calculate the outstanding descriptors that need to be
refreshed at any time, and use THAT in rxeof to determine
if refreshing needs to be done. Also change the local_timer
to simply fire off the appropriate interrupt rather than
schedule a tasklet, its simpler.

MFC in two weeks
2011-04-01 18:48:31 +00:00
John Baldwin
3b0a4aef96 Do a sweep of the tree replacing calls to pci_find_extcap() with calls to
pci_find_cap() instead.
2011-03-23 13:10:15 +00:00
Jack F Vogel
d69437b146 A cut and paste here was wrong also. 2011-03-19 00:31:35 +00:00
Jack F Vogel
1c88efb221 Correct broken define 2011-03-19 00:19:18 +00:00
Jack F Vogel
1fd3c44f77 This delta updates the em driver to version 7.2.2 which has
been undergoing test for some weeks. This improves the RX
mbuf handling to avoid system hang due to depletion. Thanks
to all those who have been testing the code, and to Beezar
Liu for the design changes.

Next the igb driver is updated for similar RX changes, but
also to add new features support for our upcoming i350 family
of adapters.

MFC after a week
2011-03-18 18:54:00 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Jack F Vogel
730d313078 Fix the shared code to be consistent with Intel-internal,
and now build.
2011-02-12 00:07:40 +00:00
Doug Barton
8c1bde8dd0 Restore 2 prototypes that seem to have been mistakenly removed in r218582.
I've manually twiddled the whitespace for e1000_commit_fc_settings_generic
to match the others in the file.

Submitted by:	dim
Tested by:	me
2011-02-11 23:08:34 +00:00
Jack F Vogel
15cb0f401a Somehow the RX ring depletion fix got partially removed,
replace the missing pieces.
2011-02-11 19:49:07 +00:00
Jack F Vogel
4b52dc6ad6 Revert changes made here, they will cause a conflict
later on with our shared code.
2011-02-11 19:03:00 +00:00
Jack F Vogel
fe954359d6 Inconsistencies in the updated igb shared code and the older
em/lem, breaking the build, correcting that.
2011-02-11 17:18:42 +00:00
Bjoern A. Zeeb
b60688be0f After r218530 export several functions which are no longer private to
e1000_mac.c but part of the e1000_api.

X-MFC with: 218530 by jfv
2011-02-11 09:58:38 +00:00
Jack F Vogel
f0ecc46d04 Add support for the new I350 family of 1G interfaces.
- this also includes virtualization support on these devices

Correct some vlan issues we were seeing in test, jumbo frames on vlans
did not work correctly, this was all due to confused logic around HW
filters, the new code should now work for all uses.

Important fix: when mbuf resources are depeleted, it was possible to
completely empty the RX ring, and then the RX engine would stall
forever. This is fixed by a flag being set whenever the refresh code
fails due to an mbuf shortage, also the local timer now makes sure
that all queues get an interrupt when it runs, the interrupt code
will then always call rxeof, and in that routine the first thing done
is now to check the refresh flag and call refresh_mbufs. This has been
verified to fix this type 'hang'. Similar code will follow in the other
drivers.

Finally, sync up shared code for the I350 support.

Thanks to everyone that has been reporting issues, and helping in the
debug/test process!!
2011-02-11 01:00:26 +00:00
Jack F Vogel
fbfbce8ae9 Fix for kern/152853, pullup at the wrong point
is breaking UDP. Thanks to Petr Lampa for the
patch.
2011-01-19 18:20:11 +00:00
Matthew D Fleming
5bc0787f29 Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.
2011-01-18 21:14:23 +00:00
Matthew D Fleming
8c49f18771 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the Intel drivers.
2011-01-12 19:53:23 +00:00
Jack F Vogel
599564e633 A couple problems discovered by Andrew Boyer:
- failure code in em_xmit got mangled along the way
     and was not properly handling errors.
   - local timer code had a leftover UNLOCK call that
     should be removed.

MFC after 3 days
2011-01-12 00:23:47 +00:00
Jack F Vogel
1ce42f7249 Correct build error. 2010-12-04 06:38:21 +00:00
Jack F Vogel
aa8dea58c6 Remove the bogus test in the TX context setup for IPV6,
the size can be smaller than the constant when you are
doing HW TAGGING, and you still need to process this
packet in a normal way. I'm not sure where the notion
to just return came from, but its wrong.

MFC after: 3 days
2010-12-04 02:04:02 +00:00
Jack F Vogel
9d43b64dbf Small cut and paste bug in flow control string fixed.
Second, correct the discard/refresh_mbufs code to behave
more like igb, there have been panics due to discards and
this should fix them.

MFC after: 3 days
2010-12-04 01:59:58 +00:00
Jack F Vogel
f4b4ba3ed2 Unwanted extra call to set_vlan_hw added back by mistake. 2010-11-26 22:36:47 +00:00
Jack F Vogel
12203744da The purpose of this change is to add a routine to
disable ASPM L0S and L1 LINK states on 82573, 82574,
and 82583. The theory is that this is behind certain
hangs being experienced by some customers.

Also included a small optimization in the rxeof routine
that was in my internal code.

Change the PBA size for pchlan, it was incorrect.

MFC after: 3 days
2010-11-24 22:24:07 +00:00
Jack F Vogel
ffdb071a24 Add shared code glue for new 82580 devices. 2010-11-24 01:13:55 +00:00
Jack F Vogel
46b9f1ff9f - New 82580 devices supported
- Fixes from John Baldwin: vlan shadow tables made per/interface,
  make vlan hw setup only happen when capability enabled, and
  finally, make a tuneable interrupt rate. Thanks John!
- Tweaked watchdog handling to avoid any false positives, now
  detection is in the TX clean path, with only the final check
  and init happening in the local timer.
- limit queues to 8 for all devices, with 82576 or 82580 on
  larger machines it can get greater than this, and it seems
  mostly a resource waste to do so. Even 8 might be high but
  it can be manually reduced.
- use 2k, 4k and now 9k clusters based on the MTU size.
- rework the igb_refresh_mbuf() code, its important to
  make sure the descriptor is rewritten even when reusing
  mbufs since writeback clobbers things.

MFC: in a few days, this delta needs to get to 8.2
2010-11-23 22:12:02 +00:00
Jack F Vogel
e4c690b4f0 Sync the lem code up with the vlan and other fixes in em.
Delete a unneeded test from the beginning of em_xmit.
CRITICAL: shared code fix for 82574, a mutex might not be
          released, this can cause hangs.
2010-11-01 20:19:25 +00:00
Jack F Vogel
35928b338e In the data setup code for doing offloads the
ip and tcp pointers were not reset after some
pullups. In practice this led to an NFS mount
failure when using UDP reported by Kevin Lo,
thanks Kevin. Fix from yongari, thank you!
2010-10-28 00:16:54 +00:00
Jack F Vogel
7deff7f9b4 Bug fix delta to the em driver:
- Chasin down bogus watchdogs has led to an improved
	  design to this handling, the hang decision takes
	  place in the tx cleanup, with only a simple report
	  check in local_timer. Our tests have shown no false
	  watchdogs with this code.
	- VLAN fixes from jhb, the shadow vfta should be per
	  interface, but as global it was not. Thanks John.
	- Bug fixes in the support for new PCH2 hardware.
	- Thanks for all the help and feedback on the driver,
	  changes to lem with be coming shortly as well.
2010-10-26 00:07:58 +00:00
Jack F Vogel
7d9119bdc4 Update code from Intel:
- Sync shared code with Intel internal
	- New client chipset support added
	- em driver - fixes to 82574, limit queues to 1 but use MSIX
	- em driver - large changes in TX checksum offload and tso
	  code, thanks to yongari.
	- some small changes for watchdog issues.
	- igb driver - local timer watchdog code was missing locking
	  this and a couple other watchdog related fixes.
	- bug in rx discard found by Andrew Boyer, check for null pointer

MFC: a week
2010-09-28 00:13:15 +00:00
John Baldwin
8385f4cf94 Tweak the stats exported by the e1000 drivers:
- Add a single sysctl procedure to all three drivers to read an arbitrary
  register (the register is passed as arg2).  Use it to replace existing
  routines in igb(4) that used a separate routine for each register, and
  to add support for missing stats in em(4) and lem(4).
- Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats
  section as they are driver stats, not MAC counters.
- Simplify the code that creates per-queue stats in igb(4) to use a single
  loop and remove duplicated code.
- Properly read all 64 bits of the 'good octets received/transmitted' in
  em(4) and lem(4).
- Actually read the interrupt count registers in em(4), and drop the
  'host to card' sysctl stats from em(4) as they are not implemented in
  any of the hardware this driver supports.
- Restore several stats to em(4) that were lost in the earlier stats
  conversion including per-queue stats.
- Export several MAC stats in em(4) that were exported in igb(4) but not
  in em(4).
- Export stats in lem(4) using individual sysctls as in em(4) and igb(4).

Reviewed by:	jfv
MFC after:	1 week
2010-09-20 16:04:44 +00:00