Commit Graph

162 Commits

Author SHA1 Message Date
Hans Petter Selasky
e936121d31 Add optimizing LRO wrapper:
- Add optimizing LRO wrapper which pre-sorts all incoming packets
  according to the hash type and flowid. This prevents exhaustion of
  the LRO entries due to too many connections at the same time.
  Testing using a larger number of higher bandwidth TCP connections
  showed that the incoming ACK packet aggregation rate increased from
  ~1.3:1 to almost 3:1. Another test showed that for a number of TCP
  connections greater than 16 per hardware receive ring, where 8 TCP
  connections was the LRO active entry limit, there was a significant
  improvement in throughput due to being able to fully aggregate more
  than 8 TCP stream. For very few very high bandwidth TCP streams, the
  optimizing LRO wrapper will add CPU usage instead of reducing CPU
  usage. This is expected. Network drivers which want to use the
  optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
  of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
  "tcp_lro_flush()". Further the LRO control structure must be
  initialized using "tcp_lro_init_args()" passing a non-zero number
  into the "lro_mbufs" argument.

- Make LRO statistics 64-bit. Previously 32-bit integers were used for
  statistics which can be prone to wrap-around. Fix this while at it
  and update all SYSCTL's which expose LRO statistics.

- Ensure all data is freed when destroying a LRO control structures,
  especially leftover LRO entries.

- Reduce number of memory allocations needed when setting up a LRO
  control structure by precomputing the total amount of memory needed.

- Add own memory allocation counter for LRO.

- Bump the FreeBSD version to force recompilation of all KLDs due to
  change of the LRO control structure size.

Sponsored by:	Mellanox Technologies
Reviewed by:	gallatin, sbruno, rrs, gnn, transport
Tested by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D4914
2016-01-19 15:33:28 +00:00
Marius Strobl
e12a9f251e Given that em(4), lem(4) and igb(4) hardware doesn't require the
alignment guarantees provided by m_defrag(9), use m_collapse(9)
instead for performance reasons.
While at it, sanitize the statistics softc members, i. e. retire
unused ones and add SYSCTL nodes missing for actually used ones.

Differential Revision:	https://reviews.freebsd.org/D4717
2016-01-13 21:47:27 +00:00
Sean Bruno
16f8a79b72 Add support for sysctl knobs to live tune the tx packet processing limits
in igb and fix a wrap-around bug.

Reviewed by:	hiren
Obtained from:  Jason (j@nitrology.com)
MFC after:	2 weeks
Sponsored by:	LimeLight Networks
Differential Revision:	https://reviews.freebsd.org/D4039
2015-12-23 21:54:05 +00:00
Michael Tuexen
947d5c8d49 Add support for SCTP checksum offloading for the 82580 controller
similar to the 82576 controller.
Tested with Intel i340 cards.

Reviewed by:	erj@
MFC after:	1 week
2015-11-10 10:55:57 +00:00
Sean Bruno
e373323fe2 Revert 287914,287762.
Reports of breakage on igb(4) have been narrowed down to 287762 and 287914
is an dependant change.

Submitted by:	erj
2015-09-19 18:22:59 +00:00
Sean Bruno
a44aa8e030 Update em(4) with D3162 after testing further on hardware that failed
to attach with the last version of this commit. This commit fixes
attach failures on "ICH8" class devices via modifications to
e1000_init_nvm_params_ich8lan()

-   Fix compiler warning in 80003es2lan.c
-   Add return value handler for e1000_*_kmrn_reg_80003es2lan
-   Fix usage of DEBUGOUT
-   Remove unnecessary variable initializations.
-   Removed unused variables (complaints from gcc).
-   Edit defines in 82571.h.
-   Add workaround for igb hw errata.
-   Shared code changes for Skylake/I219 support.
-   Remove unused OBFF and LTR functions.

Tested by some of the folks that reported breakage in previous incarnation.
Thanks to AllanJude, gjb, gnn, tijl for tempting fate with their machines.

Submitted by:	erj@freebsd.org
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3162
2015-09-13 18:26:05 +00:00
Sean Bruno
c597a0199e Revert last two commits to em(4)/igb(4). Reports are coming in that
this breaks initialization and reads from EEPROM on boot/driver load.

r287469 is being reverted as a dependancy on r287467
2015-09-05 21:12:19 +00:00
Sean Bruno
67ebffd348 e1000: Shared code updates
-    Fix compiler warning in 80003es2lan.c
-    Add return value handler for e1000_*_kmrn_reg_80003es2lan
-    Fix usage of DEBUGOUT
-    Remove unnecessary variable initializations.
-    Removed unused variables (complaints from gcc).
-    Edit defines in 82571.h.
-    Add workaround for igb hw errata.
-    Shared code changes for Skylake/I219 support.
-    Remove unused OBFF and LTR functions.

Differential Revision:	https://reviews.freebsd.org/D3162
Submitted by:	erj
MFC after:	1 month
Sponsored by:	Intel Corporation
2015-09-04 16:30:48 +00:00
Sean Bruno
02415af2ee igb(4): Update and fix HW errata
- HW errata workaround for IPv6 offload w/ extension headers
- Edited start of if_igb.c (Device IDs / #includes) to match ixgbe/ixl

Differential Revision:	https://reviews.freebsd.org/D3165
Submitted by:	erj
MFC after:	1 month
Sponsored by:	Intel Corporation
2015-09-04 16:07:27 +00:00
Sean Bruno
7c669ab6cc Bump all copywrite dates to 2015
Differential Revision:	https://reviews.freebsd.org/D3160
Submitted by:	erj
MFC after:	2 weeks
Sponsored by:	Intel Corportation
2015-08-16 20:13:58 +00:00
Hans Petter Selasky
577c341353 Free mbufs when busdma loading fails.
Reviewed by:	erj, sbruno
MFC after:	1 month
2015-08-01 20:40:37 +00:00
Luigi Rizzo
847bf38369 Sync netmap sources with the version in our private tree.
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:

- fix zerocopy monitor ports and introduce copying monitor ports
  (the latter are lower performance but give access to all traffic
  in parallel with the application)

- exclusive open mode, useful to implement solutions that recover
  from crashes of the main netmap client (suggested by Patrick Kelsey)

- revised memory allocator in preparation for the 'passthrough mode'
  (ptnetmap) recently presented at bsdcan. ptnetmap is described in
        S. Garzarella, G. Lettieri, L. Rizzo;
        Virtual device passthrough for high speed VM networking,
        ACM/IEEE ANCS 2015, Oakland (CA) May 2015
        http://info.iet.unipi.it/~luigi/research.html

- fix rx CRC handing on ixl

- add module dependencies for netmap when building drivers as modules

- minor simplifications to device-specific routines (*txsync, *rxsync)

- general code cleanup (remove unused variables, introduce macros
  to access rings and remove duplicate code,

Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).

Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.

MFC after:	1 month
Sponsored by:	(partly) Verisign, Cisco
2015-07-10 05:51:36 +00:00
John Baldwin
625d12c609 Various fixes to the stats in igb(4), ixgbe(4), and ixl(4).
- Use hardware counters for ifnet stats in igb(4) when possible.  This
  ensures these stats include packets that bypass the regular stack via
  netmap.
- Don't derefence values off the end of the igb(4) VF stats structure.
  Instead, add a dedicated if_get_counter method for igb(4) VF interfaces.
- Report missed packets on igb(4) as input queue drops rather than an
  input error.
- Report bug_ring drop counts as output queue drops for igb(4) and ixgbe(4).
- Export the buf_ring drop stats for individual rings via sysctl on
  ixgbe(4).
- Fix a typo that in ixl(4) that caused output queue drops to be reported
  as input queue drops and input queue drops to be unreported.

Differential Revision:	https://reviews.freebsd.org/D2402
Reviewed by:	jfv, rstone (6)
Sponsored by:	Norse Corp, Inc.
2015-04-30 18:23:38 +00:00
Hiren Panchasara
270538b2b6 For igb(4), when we are doing multiqueue, we are all setup to have full 32bit
RSS hash from the card. We do not need to hide that under "ifdef RSS" and should
expose that by default so others like lagg(4) can use that and avoid hashing the
traffic by themselves.
While here, improve comments and get rid of hidden/unimplemented RSS support
code for UDP.

Differential Revision:	https://reviews.freebsd.org/D2296
Reviewed by:	jfv, erj
Discussed with:	adrian
Sponsored by:	Limelight Networks
2015-04-21 20:24:15 +00:00
Adrian Chadd
977dc4e243 Migrate using CPU_ZERO() + CPU_SET() -> CPU_SETOF().
Tested:

* ixgbe, igb, RSS enabled

Submitted by:	jhb
Sponsored by:	Norse Corp, Inc.
2015-02-25 21:44:53 +00:00
Adrian Chadd
9756bd5982 Change uses of taskqueue_start_threads_pinned() -> taskqueue_start_threads_cpuset()
Differential Revision:	https://reviews.freebsd.org/D1897
Reviewed by:	jfv
2015-02-24 22:17:12 +00:00
Adrian Chadd
b2bdc62a95 Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific
bits.

The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.

* net/rss_config.[ch] takes care of the generic bits for doing
  configuration, hash function selection, etc;
* topelitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
  that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.

This should have no functional impact on anyone currently using
the RSS support.

Differential Revision:	D1383
Reviewed by:	gnn, jfv (intel driver bits)
2015-01-18 18:06:40 +00:00
Jack F Vogel
4da1bbcda5 Revert r275136, it was not approved, it was sloppy, if a feature
like this is needed please resubmit for Intel's approval.
2014-12-02 23:02:57 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Alfred Perlstein
56c14bca7e Make igb and ixgbe check tunables at probe time.
This allows one to make a kernel module to tune the
number of queues before the driver loads.

This is needed so that a module at SI_SUB_CPU can set
tunables for these drivers to take.  Otherwise getenv
is called too early by the TUNABLE macros.

Reviewed by: smh
Phabric: https://reviews.freebsd.org/D1149
2014-11-26 20:19:36 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
John Baldwin
8423f42aa8 Various fixes to stats:
- Read the counts of received, dropped, and transmitted management
  packets and add sysctl nodes for them.
- Fix the total octets received/transmitted to read all 64 bits of
  the counters.
- Add missing sysctl nodes for rlec, tncrs, fcruc, tor, and tot.
- Remove spurious spaces.

Reviewed by:	Eric Joyner @ Intel
MFC after:	1 week
2014-10-10 16:36:25 +00:00
Gleb Smirnoff
5d53210ced - Provide igb_get_counter() to return counters that are not collected,
but taken from hardware.
- Mechanically convert to if_inc_counter() the rest of counters.
2014-09-19 11:49:41 +00:00
Adrian Chadd
0936a8208b Fix the handling of EOP in status descriptors for if_igb(4) and don't
double-free mbufs.

Like ixgbe(4) chipsets, EOP is only set on the final descriptor
in a chain of descriptors.  So, to free the whole list of descriptors,
we should free the current slot _and_ the assembled list of descriptors
that make up the fragment list.

The existing code was setting discard once it saw EOP + an error status;
it then freed all the subsequent descriptors until the next EOP. That's
totally the wrong order.
2014-09-18 16:20:17 +00:00
Adrian Chadd
1c2427605c Set DROP_EN on each RX queue if transmit flow-control is disabled.
This allows the NIC to drop frames on the receive queue and not
cause the MAC to block on receiving to _any_ queue.

Tested:

igb0@pci0:5:0:0:        class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet

Discussed with: Eric Joyner <eric.joyner@intel.com>

MFC after:	1 week
Sponsored by:	Norse Corp, Inc.
2014-09-15 19:53:49 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured gues-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Adrian Chadd
fa4be7cc42 Fix the igb(4) redirection table to correctly populate.
This is similar to the ixgbe(4) fix.

Tested:

* Intel I350 gigabit adapter
2014-07-23 05:40:28 +00:00
Hiren Panchasara
eee92ad073 The description is a bit misleading. Trying to make it more obvious.
Phabric:    https://phabric.freebsd.org/D435
Reviewed by:	gnn
2014-07-18 16:25:35 +00:00
Adrian Chadd
8c0d2adf3f Initialise these variables so gcc doesn't complain.
Submitted by:	luigi
2014-06-30 23:34:36 +00:00
Adrian Chadd
1d72a9bea9 Add initial RSS awareness to the igb(4) driver.
The igb(4) hardware is capable of RSS hashing RX packets and doing RSS
queue selection for up to 8 queues.  (I believe some hardware is limited
to 4 queues, but I haven't tested on that.)

However, even if multi-queue is enabled for igb(4), the RX path doesn't use
the RSS flowid from the received descriptor.  It just uses the MSIX queue id.

This patch does a handful of things if RSS is enabled:

* Instead of using a random key at boot, fetch the RSS key from the RSS code
  and program that in to the RSS redirection table.

  That whole chunk of code should be double checked for endian correctness.

* Use the RSS queue mapping to CPU ID to figure out where to thread pin
  the RX swi thread and the taskqueue threads for each queue.

* The software queue is now really an "RSS bucket".

* When programming the RSS indirection table, use the RSS code to
  figure out which RSS bucket each slot in the indirection table maps
  to.

* When transmitting, use the flowid RSS mapping if the mbuf has
  an RSS aware hash.  The existing method wasn't guaranteed to align
  correctly with the destination RSS bucket (and thus CPU ID.)

This code warns if the number of RSS buckets isn't the same as the
automatically configured number of hardware queues.  The administrator
will have to tweak one of them for better performance.

There's currently no way to re-balance the RSS indirection table after
startup.  I'll worry about that later.

Additionally, it may be worthwhile to always use the full 32 bit flowid if
multi-queue is enabled.  It'll make things like lagg(4) behave better with
respect to traffic distribution.
2014-06-30 04:34:59 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
John Baldwin
46e89834dc - Don't compare bus_dma map pointers for static DMA allocations against
NULL to determine if bus_dmamap_unload() or bus_dmamem_free() should be
  called.  Instead, check the associated bus and virtual addresses.
- Don't clear static DMA maps to NULL.

Reviewed by:	jfv
2014-06-12 11:15:19 +00:00
Luigi Rizzo
c7156fe92f make sure if_transmit returns 0 if the mbuf is enqueued.
ixgbe/ixv.c still needs a similar fix but it takes a little
more restructuring of the code.

MFC after:	3 days
2014-06-06 20:49:56 +00:00
Luigi Rizzo
0d88706547 reference the correct variable in a comment
MFC after:	3 days
2014-05-28 06:50:16 +00:00
Gleb Smirnoff
3dbdfe820b Fix compilation with IGB_LEGACY_TX defined.
PR:		185909
Submitted by:	Aurelien Rougemont <beorn binaries.fr>
2014-01-25 20:39:23 +00:00
Luigi Rizzo
17885a7bfd It is 2014 and we have a new version of netmap.
Most relevant features:

- netmap emulation on any NIC, even those without native netmap support.

  On the ixgbe we have measured about 4Mpps/core/queue in this mode,
  which is still a lot more than with sockets/bpf.

- seamless interconnection of VALE switch, NICs and host stack.

  If you disable accelerations on your NIC (say em0)

        ifconfig em0 -txcsum -txcsum

  you can use the VALE switch to connect the NIC and the host stack:

        vale-ctl -h valeXX:em0

  allowing sharing the NIC with other netmap clients.

- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
  instead of pointers/count as before). This was unavoidable to support,
  in the future, multiple threads operating on the same rings.
  Netmap clients require very small source code changes to compile again.
      On the plus side, the new API should be easier to understand
  and the internals are a lot simpler.

The manual page has been updated extensively to reflect the current
features and give some examples.

This is the result of work of several people including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by EU projects CHANGE and OPENLAB, from NetApp University
Research Fund, NEC, and of course the Universita` di Pisa.
2014-01-06 12:53:15 +00:00
Konstantin Belousov
d480f5b820 Fix several issues with the busdma(9) KPI use in the e1000 drivers.
The problems do not affect bouncing busdma in a visible way, but are
critical for the dmar backend.

- The bus_dmamap_create(9) is not documented to take BUS_DMA_NOWAIT flag.
- Unload descriptor map after receive.
- Do not reset descriptor map to NULL, bus_dmamap_load(9) requires
  valid map, and also this leaks the map.

Reported and tested by:	pho
Approved by:	jfv
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-11-02 09:16:11 +00:00
Luigi Rizzo
ce3ee1e7c4 update to the latest netmap snapshot.
This includes the following:
- use separate memory regions for VALE ports
- locking fixes
- some simplifications in the NIC-specific routines
- performance improvements for the VALE switch
- some new features in the pkt-gen test program
- documentation updates

There are small API changes that require programs to be recompiled
(NETMAP_API has been bumped so you will detect old binaries at runtime).

In particular:
- struct netmap_slot now is 16 bytes to support an extra pointer,
  which may save one data copy when using VALE ports or VMs;
- the struct netmap_if has two extra fields;

MFC after:	3 days
2013-11-01 21:21:14 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Jack F Vogel
7609433eb6 Update the Intel igb driver to version 2.4.0
- This version has support for the new Intel Avoton systems,
including 2.5Gb support, further it now has IPv6/TSO6 support as
well. Shared code has been updated where necessary as well. Thanks
to my new assistant Eric Joyner for doing the transmit path changes
to bring in the IPv6/TSO6 support. Thanks to Gleb for catching the
one bug and change needed in NETMAP.

Approved by: re
2013-10-09 17:32:52 +00:00
Hiren Panchasara
5b9d734b08 Expose system level ixgbe sysctls.
Device level sysctls are already exposed as dev.ix.<device>

Fixing the case where number of queues for igb is auto-tuned and
hw.igb.num_queues does not return current/updated value.

Reviewed by:	jfv
Approved by:	re (delphij)
MFC after:	2 weeks
2013-10-05 19:17:56 +00:00
Andre Oppermann
1b4381afbb Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features.  The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
  layer specific union PH_loc for local use.  Protocols can flexibly overlay
  their own 8 to 64 bit fields to store information while the packet is
  worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
  instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
  information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
  rsstype field.  Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
  with the packet.  It is not yet supported in any drivers but allows us to
  get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
  a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
  from the start of the packet.  This is important for various offload
  capabilities and to relieve the drivers from having to parse the packet
  and protocol headers to find out location of checksums and other
  information.  Header parsing in drivers is a lot of copy-paste and
  unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
  packet information, like ether_vtag, tso_segsz and csum fields.
  Depending on the csum_flags settings some fields may have different usage
  making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
  stack) and inbound (up the stack) use.  The CSUM flags used to be a bit
  chaotic and rather poorly documented leading to incorrect use in many
  places.  Bring clarity into their use through better naming.
  Compatibility mappings are provided to preserve the API.  The drivers
  can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by:	The FreeBSD Foundation
2013-08-24 19:51:18 +00:00
Jack F Vogel
83cef45266 Alter the mq_start routine to do a TRYLOCK and call to the locked routine
rather than just queueing. The former code was an attempt at getting
UDP performance up, but there have been customer reports of problems with it,
so the ixgbe approach seems the best solution for now.
2013-08-13 00:25:39 +00:00
Scott Long
c68534f1d5 Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI
command register.  The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR.  Thus, the bit is no longer
a reliable indication of capability, and should not be checked.  This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.

This changes fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.

Submitted by:	jhb
Reviewed by:	jfv, marius, achadd, achim
MFC after:	1 day
2013-08-12 23:30:01 +00:00
Jack F Vogel
4dc63104ae Improve the MSIX setup code in the drivers, thanks to Marius for
the changes. Make sure that pci_alloc_msix() does give us the vectors
we need and fall back to MSI when it doesn't, also release any that
were allocated when insufficient.

MFC after: 3 days
2013-08-12 22:54:38 +00:00
Jack F Vogel
d0913b7f25 Make the various driver MSIX setup routines fallback to MSI more
gracefully. This change was suggested by Marius Strobl, thank you.

PR: kern/181016
MFC after: ASAP
2013-08-06 21:01:38 +00:00
Jack F Vogel
54a6317360 When the igb driver is static there are cases when early interrupts occur,
resulting in a panic in refresh_mbufs, to prevent this add a check in the
interrupt handler for DRV_RUNNING.

MFC after: 1 day (critical for 9.2)
2013-08-06 18:00:53 +00:00