Commit Graph

12 Commits

Author SHA1 Message Date
Luigi Rizzo
1a26580ee8 - use struct ifnet as explicit type of the argument to the
txsync() and rxsync() callbacks, removing some variables made
  useless by this change;

- add generic lock and irq handling routines. These can be useful
  in case there are no driver locks that we can reuse;

- add a few macros to reduce differences with the Linux version.
2012-02-13 18:56:34 +00:00
Luigi Rizzo
5819da83ce - change the buffer size from a constant to a
TUNABLE variable (hw.netmap.buf_size) so we can experiment
  with values different from 2048 which may give better cache performance.

- rearrange the memory allocation code so it will be easier
  to replace it with a different implementation. The current code
  relies on a single large contiguous chunk of memory obtained through
  contigmalloc.
  The new implementation (not committed yet) uses multiple
  smaller chunks which are easier to fit in a fragmented address
  space.
2012-02-08 11:43:29 +00:00
Luigi Rizzo
2157a17ce2 ixgbe changes:
- remove experimental code for disabling CRC
- use the correct constant for conversion between interrupt rate
  and EITR values (the previous values were off by a factor of 2)
- make dev.ix.N.queueM.interrupt_rate a RW sysctl variable.
  Changing individual values affects the queue immediately,
  and propagates to all interfaces at the next reinit.
- add dev.ix.N.queueM.irqs rdonly sysctl, to export the actual
  interrupt counts

Netmap-related changes for ixgbe:
- use the "new" format for TX descriptors in netmap mode.
- pass interrupt mitigation delays to the user process doing poll()
  on a netmap file descriptor.
  On the RX side this means we will not check the ring more than once
  per interrupt. This gives the process a chance to sleep and process
  packets in larger batches, thus reducing CPU usage.
  On the TX side we take this even further: completed transmissions are
  reclaimed every half ring even if the NIC interrupts more often.
  This saves even more CPU without any additional tx delays.

Generic Netmap-related changes:
- align the netmap_kring to cache lines so that there is no false sharing
  (possibly useful for multiqueue NICs and MSIX interrupts, which are
  handled by different cores). It's a minor improvement but it does not
  cost anything.

Reviewed by:	Jack Vogel
Approved by:	Jack Vogel
2012-01-26 09:55:16 +00:00
Luigi Rizzo
bcda432e01 indentation and whitespace fixes 2012-01-13 11:58:06 +00:00
Luigi Rizzo
6dba29a285 Two performance-related fixes:
1. as reported by Alexander Fiveg, the allocator was reporting
   half of the allocated memory. Fix this by exiting from the
   loop earlier (not too critical because this code is going
   away soon).

2. following a discussion on freebsd-current
    http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031144.html
   turns out that (re)loading the dmamap was expensive and not optimized.
   This operation is in the critical path when doing zero-copy forwarding
   between interfaces.
   At least on netmap and i386/amd64, the bus_dmamap_load can be
   completely bypassed if the map is NULL, so we do it.

The latter change gives an almost 3x improvement in forwarding
performance, from the previous 9.5Mpps at 2.9GHz to the current
line rate (14.2Mpps) at 1.733GHz. (this is for 64+4 byte packets,
in other configurations the PCIe bus is a bottleneck).
2012-01-13 10:21:15 +00:00
Luigi Rizzo
446ee30192 other simplifications in the internal interfaces to the
memory allocator.
2012-01-10 23:02:01 +00:00
Luigi Rizzo
6e10c8b8c5 small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block
2012-01-10 19:57:23 +00:00
Luigi Rizzo
d0c7b0751a 1. don't use if_pspare directly, but through a macro WMA()
2. move a variable declaration at the beginning of a block
2011-12-23 16:03:57 +00:00
Luigi Rizzo
02ad408380 revise the implementation of the rings connected to the host stack 2011-12-05 15:21:21 +00:00
Luigi Rizzo
506cc70cce 1. Fix the handling of link reset while in netmap more.
A link reset now is completely transparent for the netmap client:
   even if the NIC resets its own ring (e.g. restarting from 0),
   the client will not see any change in the current rx/tx positions,
   because the driver will keep track of the offset between the two.

2. make the device-specific code more uniform across different drivers
   There were some inconsistencies in the implementation of the netmap
   support routines, now drivers have been aligned to a common
   code structure.

3. import netmap support for ixgbe . This is implemented as a very
   small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
   in total the patch has only 54 lines of new code) , as most of
   the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
   following some initial comments from Jack Vogel about making
   changes less intrusive.
   (Note, i have emailed Jack multiple times asking if he had
   comments on this structure of the code; i got no reply so
   i assume he is fine with it).

Support for other drivers (em, lem, re, igb) will come later.

"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.

Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
2011-12-05 12:06:53 +00:00
Luigi Rizzo
85df379184 fix formatting warning using casts. The numbers involved
are small and these are debug statements, so there is no reason to
obfuscate the format string with PRIsomeKINDofINTEGER
2011-11-23 09:45:48 +00:00
Luigi Rizzo
68b8534bdf Bring in support for netmap, a framework for very efficient packet
I/O from userspace, capable of line rate at 10G, see

	http://info.iet.unipi.it/~luigi/netmap/

At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]

In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in

	sys/dev/netmap/head.diff

The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.

Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough so interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.

CREDITS:
    Netmap has been developed by Luigi Rizzo and other collaborators
    at the Universita` di Pisa, and supported by EU project CHANGE
    (http://www.change-project.eu/)
    The code is distributed under a BSD Copyright.

[1] In my opinion is a bad idea to have all manpage in one directory.
  We should place kernel documentation in the same dir that contains
  the code, which would make it much simpler to keep doc and code
  in sync, reduce the clutter in share/man/ and incidentally is
  the policy used for all of userspace code.
  Makefiles and doc tools can be trivially adjusted to find the
  manpages in the relevant subdirs.
2011-11-17 12:17:39 +00:00