42 Commits

Author SHA1 Message Date
Luigi Rizzo
0e73f29ae2 add support for private knote lock (reduces lock contention),
adapting OS_selrecord accordingly.
Problem and fix suggested by adrian and jmg
2014-11-13 00:40:34 +00:00
Luigi Rizzo
039dd540f5 in the Linux section, properly define the NMG_LOCK type.
Also import WITH_GENERIC in preparation for adding fine-grained
options to disable specific netmap components.
2014-11-11 00:13:28 +00:00
Luigi Rizzo
6435a0dc1b fix a typo 2014-11-10 21:00:23 +00:00
Luigi Rizzo
7f154b713a adapt the code to different freebsd versions.
Not necessary to MFC
2014-09-25 15:57:57 +00:00
Gleb Smirnoff
997d2d833f Provide pointer from struct ifnet to struct netmap_adapter,
instead of abusing spare field.
2014-08-31 11:33:19 +00:00
Navdeep Parhar
9721a22d4a Change netmap's global lock to sx instead of a mutex.
Reviewed by:	luigi@
MFC after:	1 day
2014-08-20 23:37:44 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial.

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
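Item 2 of the 4bf50f18eb commit above adds an allocator argument to the functions that compute buffer addresses. The C sketch below is a userspace model of that idea, not the kernel code: struct mem_alloc and buf_addr() are hypothetical names, and it assumes each allocator owns a flat region of fixed-size buffers indexed from 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-allocator descriptor (not the kernel's struct):
 * each allocator owns a separate buffer region, so the same buffer
 * index resolves to different memory depending on the allocator. */
struct mem_alloc {
    char    *base;      /* start of this allocator's buffer region */
    uint32_t buf_size;  /* size of every buffer, in bytes */
    uint32_t nbufs;     /* number of buffers in the region */
};

/* Translate a buffer index into an address. The allocator is passed
 * explicitly, which models the "one additional argument" the commit
 * mentions. Out-of-range indexes return NULL instead of an address. */
static char *buf_addr(const struct mem_alloc *na, uint32_t buf_idx)
{
    assert(na != NULL && na->base != NULL);
    if (buf_idx >= na->nbufs)
        return NULL;
    return na->base + (size_t)buf_idx * na->buf_size;
}
```

For example, with two separate regions of four 2048-byte buffers, buf_addr() on index 1 returns an address 2048 bytes into whichever region the allocator argument names, so ports given different allocators never see each other's buffers.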
Luigi Rizzo
46aa1303f3 sync the code with the one in stable/10
(wrap the if_t compatibility function into a __FreeBSD_version
conditional block)
2014-06-09 15:44:31 +00:00
Luigi Rizzo
89cc25561c align comments with the ones in our development trunk 2014-06-06 14:58:25 +00:00
Luigi Rizzo
43ed1d3c76 whitespace change: remove trailing whitespace 2014-06-05 21:12:41 +00:00
Marcel Moolenaar
62d76917b8 Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers.  This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fixed to not need
an explicit cast.

The immediate beneficiaries of this change are:
1.  Juniper Networks - The network stack implementation in Junos
    is entirely different from FreeBSD's one and this change
    allows Juniper to build "stock" NIC drivers that can be used
    in combination with both the FreeBSD and Junos stacks.
2.  FreeBSD - This change opens the door towards changing ifnet
    and implementing new features and optimizations in the network
    stack without it requiring a change in the many NIC drivers
    FreeBSD has.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Reviewed by:	glebius@
Obtained from:	Juniper Networks, Inc.
2014-06-02 17:54:39 +00:00
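The opaque-handle pattern described in the 62d76917b8 commit above can be illustrated with a small userspace sketch. It assumes a toy struct ifnet with a single field, and the demo_if_* accessor names are hypothetical rather than the actual FreeBSD functions; the point is that only the accessors know the layout and need an explicit cast.

```c
#include <assert.h>
#include <stdlib.h>

/* Opaque handle, defined as in the commit: a void pointer, so driver
 * code can pass it around without seeing the structure's layout. */
typedef void *if_t;

/* Private to the "stack": drivers would never see this definition.
 * The single field is a deliberately tiny, hypothetical layout. */
struct ifnet {
    int if_mtu;
};

/* Accessors (hypothetical names): the only functions that cast. */
static if_t demo_if_alloc(void)
{
    return calloc(1, sizeof(struct ifnet));
}

static void demo_if_free(if_t ifp)
{
    free(ifp);
}

static int demo_if_getmtu(if_t ifp)
{
    return ((struct ifnet *)ifp)->if_mtu;
}

static void demo_if_setmtu(if_t ifp, int mtu)
{
    assert(mtu > 0);
    ((struct ifnet *)ifp)->if_mtu = mtu;
}
```

A driver written against these accessors keeps compiling even if struct ifnet gains, loses or reorders fields, which is the decoupling the commit is after.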
Luigi Rizzo
f0ea3689a9 This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving
  100+ Mpps between processes using shared memory channels
  (no mistake: over one hundred million. But mind you, I said
  *moving* not *processing*);

- kqueue support (BHyVe needs it);

- improved user library. Just the interface name lets you select a NIC,
  host port, VALE switch port, netmap pipe, and individual queues.
  The upcoming netmap-enabled libpcap will use this feature.

- optional extra buffers associated with netmap ports, for applications
  that need to buffer data yet don't want to make copies.

- segmentation offloading for the VALE switch, useful between VMs.

and a number of bug fixes and performance improvements.

My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.

There are some external repositories that can be of interest:

    https://code.google.com/p/netmap
        our public repository for netmap/VALE code, including
        linux versions and other stuff that does not belong here,
        such as python bindings.

    https://code.google.com/p/netmap-libpcap
        a clone of the libpcap repository with netmap support.
        With this any libpcap client has access to most netmap
        features with no recompilation. E.g. tcpdump can filter
        packets at 10-15 Mpps.

    https://code.google.com/p/netmap-ipfw
        a userspace version of ipfw+dummynet which uses netmap
        to send/receive packets. Speed is up in the 7-10 Mpps
        range per core for simple rulesets.

Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but until this happens it is useful to have access to them.

And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.

MFC after:	3 days
2014-02-15 04:53:04 +00:00
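The f0ea3689a9 commit above says the interface name alone selects a NIC, host port, VALE port, pipe or single queue. The sketch below classifies such names; the syntax it accepts ("valeX:Y", or "netmap:IFNAME" with an optional "^", "-N", "{N" or "}N" suffix) is an assumption based on the later netmap(4) documentation, and enum port_kind and classify_port() are hypothetical names, not the library's API.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Kinds of port a name can select (hypothetical enum for this sketch). */
enum port_kind {
    PORT_NIC,          /* all hardware rings of a NIC */
    PORT_HOST,         /* host-stack rings of a NIC */
    PORT_RING,         /* one hardware ring pair */
    PORT_PIPE_MASTER,  /* master end of a netmap pipe */
    PORT_PIPE_SLAVE,   /* slave end of a netmap pipe */
    PORT_VALE,         /* port on a VALE switch */
    PORT_INVALID
};

/* True if s is a non-empty string of decimal digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Classify a port name. Assumed syntax: "valeX:Y" for a VALE port,
 * otherwise "netmap:IFNAME" with an optional suffix: "^" (host stack),
 * "-N" (ring N), "{N" / "}N" (master / slave end of pipe N). */
static enum port_kind classify_port(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "vale", 4) == 0)
        return strchr(name, ':') != NULL ? PORT_VALE : PORT_INVALID;
    if (strncmp(name, "netmap:", 7) != 0)
        return PORT_INVALID;

    const char *ifname = name + 7;
    size_t n = strcspn(ifname, "^-{}");  /* length of the bare ifname */
    if (n == 0)
        return PORT_INVALID;

    const char *suffix = ifname + n;
    switch (*suffix) {
    case '\0':
        return PORT_NIC;
    case '^':
        return suffix[1] == '\0' ? PORT_HOST : PORT_INVALID;
    case '-':
        return all_digits(suffix + 1) ? PORT_RING : PORT_INVALID;
    case '{':
        return all_digits(suffix + 1) ? PORT_PIPE_MASTER : PORT_INVALID;
    case '}':
        return all_digits(suffix + 1) ? PORT_PIPE_SLAVE : PORT_INVALID;
    default:
        return PORT_INVALID;
    }
}
```

With these assumed rules, "netmap:em0" selects the NIC, "netmap:em0^" its host-stack port, "netmap:ix0-3" ring 3, and "vale0:p1" a VALE switch port.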
Luigi Rizzo
fb25194fb0 fix use after free when releasing a netmap adapter.
Submitted by:	Giuseppe Lettieri
2014-01-07 21:14:28 +00:00
Luigi Rizzo
17885a7bfd It is 2014 and we have a new version of netmap.
Most relevant features:

- netmap emulation on any NIC, even those without native netmap support.

  On the ixgbe we have measured about 4Mpps/core/queue in this mode,
  which is still a lot more than with sockets/bpf.

- seamless interconnection of VALE switch, NICs and host stack.

  If you disable accelerations on your NIC (say em0)

        ifconfig em0 -rxcsum -txcsum

  you can use the VALE switch to connect the NIC and the host stack:

        vale-ctl -h valeXX:em0

  allowing the NIC to be shared with other netmap clients.

- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
  instead of pointers/count as before). This was unavoidable to support,
  in the future, multiple threads operating on the same rings.
  Netmap clients require very small source code changes to compile again.
  On the plus side, the new API should be easier to understand
  and the internals are a lot simpler.

The manual page has been updated extensively to reflect the current
features and give some examples.

This is the result of work of several people including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by EU projects CHANGE and OPENLAB, from NetApp University
Research Fund, NEC, and of course the Universita` di Pisa.
2014-01-06 12:53:15 +00:00
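As a rough illustration of the head/cur/tail scheme introduced by the 17885a7bfd commit above, the C sketch below models only the ring indexes. The struct and function names are hypothetical, and the wrap-around arithmetic is the usual circular-buffer convention, assumed here rather than copied from netmap_user.h: slots from cur up to (but excluding) tail are available to the application, and moving head returns consumed slots to the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a ring using head/cur/tail. Only the index
 * fields are modeled; a real ring also carries slots and flags. */
struct ring_model {
    uint32_t num_slots;
    uint32_t head;  /* first slot owned by the application */
    uint32_t cur;   /* wakeup point requested by the application */
    uint32_t tail;  /* first slot owned by the kernel */
};

/* Next index, wrapping at the end of the ring. */
static uint32_t ring_next(const struct ring_model *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Number of slots the application can still use (cur up to tail). */
static uint32_t ring_space(const struct ring_model *r)
{
    int32_t ret = (int32_t)r->tail - (int32_t)r->cur;
    if (ret < 0)
        ret += (int32_t)r->num_slots;
    return (uint32_t)ret;
}

/* Consume up to n slots by advancing cur, then release them all at
 * once by moving head; returns how many slots were consumed. */
static uint32_t ring_consume(struct ring_model *r, uint32_t n)
{
    uint32_t done = 0;
    while (done < n && ring_space(r) > 0) {
        r->cur = ring_next(r, r->cur);
        done++;
    }
    r->head = r->cur;  /* hand the processed slots back to the kernel */
    return done;
}
```

For an 8-slot ring with cur = 6 and tail = 2, ring_space() reports 4 slots (6, 7, 0, 1); consuming 3 of them leaves head and cur at 1 with one slot still available.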
Luigi Rizzo
2e159ef0b5 fix the build using __builtin_prefetch() instead of redefining prefetch() 2013-12-16 23:57:43 +00:00
Luigi Rizzo
f9790aeb88 split netmap code according to functions:
- netmap.c		base code
- netmap_freebsd.c	FreeBSD-specific code
- netmap_generic.c	emulate netmap over standard drivers
- netmap_mbq.c		simple mbuf tailq
- netmap_mem2.c		memory management
- netmap_vale.c		VALE switch

simplify device-specific code
2013-12-15 08:37:24 +00:00
Luigi Rizzo
ce3ee1e7c4 update to the latest netmap snapshot.
This includes the following:
- use separate memory regions for VALE ports
- locking fixes
- some simplifications in the NIC-specific routines
- performance improvements for the VALE switch
- some new features in the pkt-gen test program
- documentation updates

There are small API changes that require programs to be recompiled
(NETMAP_API has been bumped so you will detect old binaries at runtime).

In particular:
- struct netmap_slot now is 16 bytes to support an extra pointer,
  which may save one data copy when using VALE ports or VMs;
- the struct netmap_if has two extra fields;

MFC after:	3 days
2013-11-01 21:21:14 +00:00
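The 16-byte slot mentioned in the ce3ee1e7c4 commit above can be pinned down with compile-time checks. The field names below (buf_idx, len, flags, ptr) are an assumption modeled on sys/net/netmap.h; the commit itself only states the 16-byte total and the extra pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a 16-byte slot: buffer index, length and flags, plus a
 * 64-bit field that can carry an extra pointer (e.g. to skip a copy
 * between VALE ports or VMs). Field names are assumptions. */
struct slot_model {
    uint32_t buf_idx;  /* index of the buffer in the shared region */
    uint16_t len;      /* packet length in bytes */
    uint16_t flags;    /* per-slot flags */
    uint64_t ptr;      /* extra pointer-sized field */
};

/* Kernel and userspace share this layout, so a size change must be
 * caught at build time (and, in netmap, by bumping NETMAP_API). */
static_assert(sizeof(struct slot_model) == 16, "slot must be 16 bytes");
static_assert(offsetof(struct slot_model, ptr) == 8, "ptr at offset 8");
```

Placing the 64-bit field after the 8 bytes of index, length and flags keeps it naturally aligned, so no padding is needed to reach exactly 16 bytes.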
Luigi Rizzo
f18be5766f Bring in a number of new features, mostly implemented by Michio Honda:
- the VALE switch now supports up to 254 destinations per switch,
  unicast or broadcast (multicast goes to all ports).

- we can attach hw interfaces and the host stack to a VALE switch,
  which means we will be able to use it more or less as a native bridge
  (minor tweaks still necessary).
  A 'vale-ctl' program is supplied in tools/tools/netmap
  to attach/detach ports to/from the switch and list the current configuration.

- the lookup function in the VALE switch can be reassigned to
  something else, similar to the pf hooks. This will enable
  attaching the firewall, or other processing functions (e.g. in-kernel
  openvswitch) directly on the netmap port.

The internal API used by device drivers does not change.

Userspace applications should be recompiled because we
bump NETMAP_API as we now use some fields in the struct nmreq
that were previously ignored -- otherwise, data structures
are the same.

Manpages will be committed separately.
2013-05-30 14:07:14 +00:00
Luigi Rizzo
849bec0e76 Partial cleanup in preparation for upcoming changes:
- netmap_rx_irq()/netmap_tx_irq() can now be called by FreeBSD drivers
  hiding the logic for handling NIC interrupts in netmap mode.
  This also simplifies the case of NICs attached to VALE switches.
  Individual drivers will be updated with separate commits.

- use the same refcount() API for FreeBSD and Linux

- plus some comments, typos and formatting fixes

Portions contributed by Michio Honda
2013-04-30 16:08:34 +00:00
Luigi Rizzo
d4b42e0869 whitespace changes:
remove $Id$ lines, and add blank lines around some #if / #elif /#endif
2013-04-29 18:00:53 +00:00
Luigi Rizzo
2579e2d715 mostly whitespace changes:
- remove vestiges of the old memory allocator
- clean up some comments
2013-04-19 21:08:21 +00:00
Luigi Rizzo
ae10d1afee control some debugging messages with dev.netmap.verbose
add infrastructure to adapt to changes in the number of queues
and buffers at runtime
2013-01-23 03:51:47 +00:00
Luigi Rizzo
1dce924d25 add some definition and driver changes in preparation for
two upcoming features:

semi-transparent mode:
    when a device is opened in this mode, the
    user program will be able to mark slots that must be forwarded
    to the "other" side (i.e. from NIC to host stack, or vice versa),
    and the forwarding will occur automatically at the next netmap syscall.
    This saves the need to open another file descriptor and do
    the forwarding manually.

direct-forwarding mode:
    when operating with a VALE port, the user can specify in the slot
    the actual destination port, overriding the forwarding decision
    made by a lookup of the destination MAC. This can be useful to
    implement packet dispatchers.

No API changes will be introduced.
No new functionality in this patch yet.
2013-01-17 22:14:58 +00:00
Luigi Rizzo
8241616dc5 This is an import of code, mostly from Giuseppe Lettieri,
that revises the netmap memory allocator so that the
various parameters (number and size of buffers, rings, descriptors)
can be modified at runtime through sysctl variables.
The changes become effective when no netmap clients are active.

The API is mostly unchanged, although the NIOCUNREGIF ioctl no longer
brings the interface back to normal mode; you
need to close the file descriptor for that.
This change was necessary to track who is using the mapped region,
and since it is a simplification of the API there was no
incentive in trying to preserve NIOCUNREGIF.
We will remove the ioctl from the kernel next time we need
a real API change (and version bump).

Among other things, buffer allocation when opening devices is
now much faster: it used to take O(N^2) time, now it is linear.

Submitted by:	Giuseppe Lettieri
2012-10-19 04:13:12 +00:00
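The 8241616dc5 commit above reports that buffer allocation went from O(N^2) to linear time but does not say how. One common way to get that behavior, shown here purely as an assumed technique, is a free-bitmap allocator that resumes its search where the previous allocation stopped instead of rescanning from the start; NBUFS, struct pool and the pool_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NBUFS 1024u  /* hypothetical pool size, a multiple of 32 */

/* Free-buffer bitmap: a set bit means the buffer is free. */
struct pool {
    uint32_t bitmap[NBUFS / 32];
    uint32_t cursor;  /* bitmap word where the next search starts */
};

static void pool_init(struct pool *p)
{
    for (uint32_t w = 0; w < NBUFS / 32; w++)
        p->bitmap[w] = 0xFFFFFFFFu;
    p->cursor = 0;
}

/* Allocate one buffer index, or return UINT32_MAX if none is free.
 * Resuming at `cursor` means that filling N slots in a row touches
 * each bitmap word about once, instead of rescanning the already-full
 * prefix for every buffer (which is what makes the naive loop N^2). */
static uint32_t pool_alloc(struct pool *p)
{
    for (uint32_t k = 0; k < NBUFS / 32; k++) {
        uint32_t w = (p->cursor + k) % (NBUFS / 32);
        uint32_t bits = p->bitmap[w];
        if (bits == 0)
            continue;  /* word full, try the next one */
        uint32_t b = 0;
        while (!(bits & (1u << b)))
            b++;
        p->bitmap[w] &= ~(1u << b);
        p->cursor = w;
        return w * 32 + b;
    }
    return UINT32_MAX;
}

static void pool_free(struct pool *p, uint32_t idx)
{
    assert(idx < NBUFS);
    p->bitmap[idx / 32] |= 1u << (idx % 32);
}
```

Allocating all 1024 buffers in sequence returns indexes 0 through 1023 in order, after which pool_alloc() reports exhaustion; a freed index is found again on the next wrap-around search.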
Ed Maste
24e57ec96d Clarify comments about number of tx / rx rings 2012-08-08 15:27:01 +00:00
Luigi Rizzo
b3d5301688 fix some signed/unsigned warnings in the netmap code.
Unfortunately the original drivers still have a lot of
sign conversion/comparison warnings.
2012-08-02 11:59:43 +00:00
Luigi Rizzo
d198a63d44 remove a redundant MALLOC_DECLARE 2012-07-31 05:51:48 +00:00
Luigi Rizzo
826e7ddbfc remove unused definition, whitespace cleanup 2012-07-27 10:31:26 +00:00
Luigi Rizzo
f196ce3869 Add support for VALE bridges to the netmap core, see
http://info.iet.unipi.it/~luigi/vale/

VALE lets you dynamically instantiate multiple software bridges
that talk the netmap API (and are *extremely* fast), so you can test
netmap applications without the need for high end hardware.

This is particularly useful as I am completing a netmap-aware
version of ipfw, and VALE provides an excellent testing platform.

Also, I have netmap backends for qemu mostly ready to commit
to the port, and this too will let you interconnect virtual machines
at high speed without fiddling with bridges, tap or other slow solutions.

The API for applications is unchanged, so you can use the code
in tools/tools/netmap (which I will update soon) on the VALE ports.

This commit also syncs the code with the one in my internal repository,
so you will see some conditional code for other platforms.
The code should run mostly unmodified on stable/9 so people interested
in trying it can just copy sys/dev/netmap/ and sys/net/netmap*.h
from HEAD.

VALE is joint work with my colleague Giuseppe Lettieri, and
is partly supported by the EU Projects CHANGE and OPENLAB
2012-07-26 16:45:28 +00:00
Luigi Rizzo
d76bf4ff7b A bit of cleanup in the names of fields of netmap-related structures.
Use the name 'ring' instead of 'queue' in all fields.
Bump NETMAP_API.
2012-04-13 16:03:07 +00:00
Luigi Rizzo
3c0caf6ce6 Some code restructuring to bring the memory allocator out of netmap.c
and make it easier to replace it with a different implementation.
On passing, also fix indentation.

NOTE: I know that #include "foo.c" is ugly, but the alternative
(add another entry to sys/conf/files, add a separate header with
structs and prototypes, and expose functions that are meant to
be private) looks even worse to me.
We need a more modular way to specify dependencies and build options.
2012-04-12 11:27:09 +00:00
Luigi Rizzo
64ae02c365 A bunch of netmap fixes:
USERSPACE:
1. add support for devices with different numbers of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
   to the netmap ring to indicate how many buffers we have already processed
   but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
   at it add a version field and some spares to the ioctl() argument
   to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
   to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code to help maintenance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
   in the ixgbe driver.
   Using 'legacy' descriptors on the tx ring and prefetching slots gives
   about a 20% speedup at 900 MHz. Another 7-10% would come from removing
   the explicit calls to bus_dmamap* in the core (they are effectively
   NOPs in this case, but it takes an expensive load of the per-buffer
   dma maps to figure out that they are all NULL).

   Rx performance not investigated.

I am postponing the MFC so I can import a few more improvements
before merging.
2012-02-27 19:05:01 +00:00
Luigi Rizzo
5644ccec61 (This commit only touches code within the DEV_NETMAP blocks)
Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).

On passing, make the code and comments more uniform across the
various drivers.
2012-02-15 23:13:29 +00:00
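Centralizing the bound checks, as the 5644ccec61 commit above does, reduces each mapping to a small wrap-around function. The sketch assumes the two index spaces differ by a fixed offset (the 506cc70cce commit describes the driver tracking such an offset after a NIC ring reset) whose magnitude is smaller than the ring size; struct kring_model, hwofs and the idx_* functions are hypothetical names.

```c
#include <assert.h>

/* Hypothetical ring model: the NIC ring index differs from the netmap
 * ring index by hwofs, with |hwofs| < num_slots. */
struct kring_model {
    int num_slots;
    int hwofs;
};

/* Wrap an index that is at most one ring length out of [0, n). */
static int wrap_idx(int idx, int n)
{
    if (idx < 0)
        return idx + n;
    if (idx >= n)
        return idx - n;
    return idx;
}

/* NIC ring index -> netmap ring index. */
static int idx_nic2nm(const struct kring_model *kr, int idx)
{
    assert(idx >= 0 && idx < kr->num_slots);
    return wrap_idx(idx + kr->hwofs, kr->num_slots);
}

/* Netmap ring index -> NIC ring index (inverse of idx_nic2nm). */
static int idx_nm2nic(const struct kring_model *kr, int idx)
{
    assert(idx >= 0 && idx < kr->num_slots);
    return wrap_idx(idx - kr->hwofs, kr->num_slots);
}
```

With 8 slots and an offset of 3, NIC slot 6 maps to netmap slot 1 and back; because the two functions are exact inverses, every driver that uses them gets the same, checked behavior.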
Luigi Rizzo
1a26580ee8 - use struct ifnet as explicit type of the argument to the
txsync() and rxsync() callbacks, removing some variables made
  useless by this change;

- add generic lock and irq handling routines. These can be useful
  in case there are no driver locks that we can reuse;

- add a few macros to reduce differences with the Linux version.
2012-02-13 18:56:34 +00:00
Luigi Rizzo
5819da83ce - change the buffer size from a constant to a
TUNABLE variable (hw.netmap.buf_size) so we can experiment
  with values different from 2048 which may give better cache performance.

- rearrange the memory allocation code so it will be easier
  to replace it with a different implementation. The current code
  relies on a single large contiguous chunk of memory obtained through
  contigmalloc.
  The new implementation (not committed yet) uses multiple
  smaller chunks which are easier to fit in a fragmented address
  space.
2012-02-08 11:43:29 +00:00
Luigi Rizzo
2157a17ce2 ixgbe changes:
- remove experimental code for disabling CRC
- use the correct constant for conversion between interrupt rate
  and EITR values (the previous values were off by a factor of 2)
- make dev.ix.N.queueM.interrupt_rate a RW sysctl variable.
  Changing individual values affects the queue immediately,
  and propagates to all interfaces at the next reinit.
- add dev.ix.N.queueM.irqs rdonly sysctl, to export the actual
  interrupt counts

Netmap-related changes for ixgbe:
- use the "new" format for TX descriptors in netmap mode.
- pass interrupt mitigation delays to the user process doing poll()
  on a netmap file descriptor.
  On the RX side this means we will not check the ring more than once
  per interrupt. This gives the process a chance to sleep and process
  packets in larger batches, thus reducing CPU usage.
  On the TX side we take this even further: completed transmissions are
  reclaimed every half ring even if the NIC interrupts more often.
  This saves even more CPU without any additional tx delays.

Generic Netmap-related changes:
- align the netmap_kring to cache lines so that there is no false sharing
  (possibly useful for multiqueue NICs and MSIX interrupts, which are
  handled by different cores). It's a minor improvement but it does not
  cost anything.

Reviewed by:	Jack Vogel
Approved by:	Jack Vogel
2012-01-26 09:55:16 +00:00
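The false-sharing fix in the 2157a17ce2 commit above amounts to giving each per-queue structure its own cache line. A minimal C11 sketch, assuming a 64-byte cache line (kernel code would use the platform's own cache-line constant); struct aligned_kring and its fields are hypothetical and only illustrate the alignment technique.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache-line size: 64 bytes is typical on x86, but it is a
 * parameter of this sketch, not a property of netmap. */
#define CACHE_LINE 64

/* Hypothetical per-queue state. Aligning the first member forces the
 * whole struct to cache-line alignment and pads its size to a multiple
 * of the line, so adjacent array elements never share a line and two
 * cores servicing different queues do not bounce it between them. */
struct aligned_kring {
    alignas(CACHE_LINE) uint32_t hwcur;
    uint32_t hwavail;
    uint64_t irq_count;
};

static_assert(alignof(struct aligned_kring) == CACHE_LINE,
              "kring must be cache-line aligned");
static_assert(sizeof(struct aligned_kring) % CACHE_LINE == 0,
              "kring size must be a whole number of lines");
```

The cost is padding (16 bytes of fields occupy a full 64-byte line here), which is negligible for one structure per queue.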
Luigi Rizzo
bcda432e01 indentation and whitespace fixes 2012-01-13 11:58:06 +00:00
Luigi Rizzo
6dba29a285 Two performance-related fixes:
1. as reported by Alexander Fiveg, the allocator was reporting
   half of the allocated memory. Fix this by exiting from the
   loop earlier (not too critical because this code is going
   away soon).

2. following a discussion on freebsd-current
    http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031144.html
   it turns out that (re)loading the dmamap was expensive and not optimized.
   This operation is in the critical path when doing zero-copy forwarding
   between interfaces.
   At least on netmap and i386/amd64, the bus_dmamap_load can be
   completely bypassed if the map is NULL, so we do it.

The latter change gives an almost 3x improvement in forwarding
performance, from the previous 9.5Mpps at 2.9GHz to the current
line rate (14.2Mpps) at 1.733GHz (this is for 64+4 byte packets;
in other configurations the PCIe bus is a bottleneck).
2012-01-13 10:21:15 +00:00
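The rates quoted in the 6dba29a285 commit above (14.2 Mpps for 64+4 byte packets) and in the 506cc70cce commit (14.88 Mpps) are consistent with 10 Gbit/s line rate once the 20 bytes of per-frame wire overhead are counted: an 8-byte preamble/SFD plus a 12-byte minimum inter-frame gap. Reading "64+4" as a 68-byte frame, and the 14.88 figure as minimum-size 64-byte frames, is an interpretation. For a frame of $L$ bytes on a link of capacity $C$:

```latex
\[
R(L) = \frac{C}{(L + 20) \times 8}, \qquad C = 10^{10}\ \text{bit/s}
\]
\[
R(64) = \frac{10^{10}}{672} \approx 14.88\ \text{Mpps},
\qquad
R(68) = \frac{10^{10}}{704} \approx 14.20\ \text{Mpps}
\]
```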
Luigi Rizzo
6e10c8b8c5 small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block
2012-01-10 19:57:23 +00:00
Luigi Rizzo
d0c7b0751a 1. don't use if_pspare directly, but through a macro WMA()
2. move a variable declaration at the beginning of a block
2011-12-23 16:03:57 +00:00
Luigi Rizzo
506cc70cce 1. Fix the handling of link reset while in netmap more.
A link reset now is completely transparent for the netmap client:
   even if the NIC resets its own ring (e.g. restarting from 0),
   the client will not see any change in the current rx/tx positions,
   because the driver will keep track of the offset between the two.

2. make the device-specific code more uniform across different drivers
   There were some inconsistencies in the implementation of the netmap
   support routines, now drivers have been aligned to a common
   code structure.

3. import netmap support for ixgbe. This is implemented as a very
   small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
   in total the patch has only 54 lines of new code), as most of
   the code is in an external file sys/dev/netmap/ixgbe_netmap.h,
   following some initial comments from Jack Vogel about making
   changes less intrusive.
   (Note, I have emailed Jack multiple times asking if he had
   comments on this structure of the code; I got no reply so
   I assume he is fine with it).

Support for other drivers (em, lem, re, igb) will come later.

"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.

Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 MHz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and an 82599 card. I haven't yet tried more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
2011-12-05 12:06:53 +00:00
Luigi Rizzo
68b8534bdf Bring in support for netmap, a framework for very efficient packet
I/O from userspace, capable of line rate at 10G, see

	http://info.iet.unipi.it/~luigi/netmap/

At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]

In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in

	sys/dev/netmap/head.diff

The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.

Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough that interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.

CREDITS:
    Netmap has been developed by Luigi Rizzo and other collaborators
    at the Universita` di Pisa, and supported by EU project CHANGE
    (http://www.change-project.eu/)
    The code is distributed under a BSD Copyright.

[1] In my opinion it is a bad idea to have all manpages in one directory.
  We should place kernel documentation in the same dir that contains
  the code, which would make it much simpler to keep doc and code
  in sync, reduce the clutter in share/man/ and incidentally is
  the policy used for all userspace code.
  Makefiles and doc tools can be trivially adjusted to find the
  manpages in the relevant subdirs.
2011-11-17 12:17:39 +00:00