- netmap_rx_irq()/netmap_tx_irq() can now be called by FreeBSD drivers
hiding the logic for handling NIC interrupts in netmap mode.
This also simplifies the case of NICs attached to VALE switches.
Individual drivers will be updated with separate commits.
- use the same refcount() API for FreeBSD and linux
- plus some comments, typos and formatting fixes
Portions contributed by Michio Honda
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
files require including directly rwlock.h
By setting dev.netmap.fwd=1 (or enabling the feature with a per-ring flag),
packets are forwarded between the NIC and the host stack unless the
netmap client clears the NS_FORWARD flag on the individual descriptors.
This feature greatly simplifies applications where some traffic
(think of ARP, control traffic, ssh sessions...) must be processed
by the host stack, whereas the bulk is handled by the netmap process
which simply (un)marks packets that should not be forwarded.
The default is chosen so that now a netmap receiver operates
in a mode very similar to bpf.
Of course there is no free lunch: traffic to/from the host stack
still operates at OS speed (or less, as there is one extra copy in
one direction).
HOWEVER, since traffic goes to the user process before being
reinjected, and reinjection occurs in a user context, you get some
form of livelock protection for free.
two upcoming features:
semi-transparent mode:
when a device is opened in this mode, the
user program will be able to mark slots that must be forwarded
to the "other" side (i.e. from NIC to host stack, or viceversa),
and the forwarding will occur automatically at the next netmap syscall.
This saves the need to open another file descriptor and do
the forwarding manually.
direct-forwarding mode:
when operating with a VALE port, the user can specify in the slot
the actual destination port, overriding the forwarding decision
made by a lookup of the destination MAC. This can be useful to
implement packet dispatchers.
No API changes will be introduced.
No new functionality in this patch yet.
previous names, 'ptag' and 'pmap' -- p stands for packet.
This change reduces the difference between the code in stable/9
and head, and also helps using the same ixgbe_netmap.h on both branches.
Approved by: Jack Vogel
that revises the netmap memory allocator so that the
various parameters (number and size of buffers, rings, descriptors)
can be modified at runtime through sysctl variables.
The changes become effective when no netmap clients are active.
The API is mostly unchanged, although the NIOCUNREGIF ioctl now
does not bring the interface back to normal mode: and you
need to close the file descriptor for that.
This change was necessary to track who is using the mapped region,
and since it is a simplification of the API there was no
incentive in trying to preserve NIOCUNREGIF.
We will remove the ioctl from the kernel next time we need
a real API change (and version bump).
Among other things, buffer allocation when opening devices is
now much faster: it used to take O(N^2) time, now it is linear.
Submitted by: Giuseppe Lettieri
- Move destruction of per-ring locks to netmap_dtor_locked to mirror the
initialization that happens in NIOCREGIF. Otherwise unloading a netmap-
capable interface that was never put into netmap mode would try to
mtx_destroy an uninitialized mutex, and panic.
- Destroy core_lock in netmap_detach, mirroring init in netmap_attach.
- Also comment out the knlist_destroy for now as there is currently no
knlist_init.
Sponsored by: ADARA Networks
Reviewed by: luigi@
http://info.iet.unipi.it/~luigi/vale/
VALE lets you dynamically instantiate multiple software bridges
that talk the netmap API (and are *extremely* fast), so you can test
netmap applications without the need for high end hardware.
This is particularly useful as I am completing a netmap-aware
version of ipfw, and VALE provides an excellent testing platform.
Also, I also have netmap backends for qemu mostly ready for commit
to the port, and this too will let you interconnect virtual machines
at high speed without fiddling with bridges, tap or other slow solutions.
The API for applications is unchanged, so you can use the code
in tools/tools/netmap (which i will update soon) on the VALE ports.
This commit also syncs the code with the one in my internal repository,
so you will see some conditional code for other platforms.
The code should run mostly unmodified on stable/9 so people interested
in trying it can just copy sys/dev/netmap/ and sys/net/netmap*.h
from HEAD
VALE is joint work with my colleague Giuseppe Lettieri, and
is partly supported by the EU Projects CHANGE and OPENLAB
Contrarily to what i wrote in my previous commit, the 82599
does include the CRC in the length. The operating mode is
reset in ixgbe_init_locked() and so we need to hook into
the places where the two registers (HLREG0 and RDRXCTL) are
modified.
does not include the CRC irrespective of the setting
of CRCSTRIP. The 82599 data sheets (sec. 7.1.6) say differently.
Very strange. Need to check what happens on legacy descriptors,
but for the time being this restores functionality.
and make it easier to replace it with a different implementation.
On passing, also fix indentation.
NOTE: I know that #include "foo.c" is ugly, but the alternative
(add another entry to sys/conf/files, add a separate header with
structs and prototypes, and expose functions that are meant to
be private) looks even worse to me.
We need a more modular way to specify dependencies and build options.
- add a sysctl, dev.netmap.ix_crcstrip, to control whether ixgbe should
strip the CRC on received frames. Defaults to 0, which keeps the CRC.
and improves performance when receiving min-sized (64-byte) frames.
This matters because min-sized frames is one of the standard
benchmarks for switches and routers, some chipsets seem to issue
read-modify-write cycles for PCIe transactions that are not a
full cache line, and a min-sized frame triggers the bug, resulting
in reduced throughput -- 9.7 instead of 14.88 Mpps -- and heavy
bus load.
- for the time being, always look for incoming packets on a select/poll
even if there has not been an interrupt in the meantime. This is
only a temporary workaround for a probable race condition in keeping
track of rx interrupts.
Add a couple of diagnostic vars to help studying the problem.
USERSPACE:
1. add support for devices with different number of rx and tx queues;
2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);
3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.
4. update the manual page for the two changes above;
5. update sample applications in tools/tools/netmap
KERNEL:
1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';
2. simplify the functions that map kring<->nic ring indexes
3. normalize device-specific code, helps mainteinance;
4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explict calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL.
Rx performance not investigated.
I am postponing the MFC so i can import a few more improvements
before merging.
- remove the KEVENT code, which was incomplete and not compiled anyways;
- change some while() loops into for()
- adjust indentation
- remove extra whitespace
MFC after: 1 week
Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).
On passing, make the code and comments more uniform across the
various drivers.
txsync() and rxsync() callbacks, removing some variables made
useless by this change;
- add generic lock and irq handling routines. These can be useful
in case there are no driver locks that we can reuse;
- add a few macros to reduce differences with the Linux version.