Bring in support for netmap, a framework for very efficient packet
I/O from userspace, capable of line rate at 10G; see

	http://info.iet.unipi.it/~luigi/netmap/

At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1].

In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in

	sys/dev/netmap/head.diff

The file will go away as the relevant pieces are committed to
the various device drivers, which should happen within a few days
after talking to the driver maintainers.
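
For reference, enabling the device in a kernel configuration file
looks like this (a sketch; MYKERNEL is a placeholder name):

	include		GENERIC
	ident		MYKERNEL
	device		netmap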

Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully the changes are trivial enough that interested third parties
can submit patches for other drivers. Interested people can contact me
for advice on how to add netmap support to specific devices.
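
The per-driver glue follows a common pattern, visible in head.diff
below: include a per-driver netmap header under DEV_NETMAP, hook the
driver attach and detach routines, and intercept the rx/tx paths.
As a rough sketch (for a hypothetical driver "foo"):

	#ifdef DEV_NETMAP
	#include <dev/netmap/if_foo_netmap.h>
	#endif /* DEV_NETMAP */

	/* at the end of foo_attach() */
	#ifdef DEV_NETMAP
		foo_netmap_attach(adapter);
	#endif /* DEV_NETMAP */

	/* in foo_detach() */
	#ifdef DEV_NETMAP
		netmap_detach(adapter->ifp);
	#endif /* DEV_NETMAP */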

CREDITS:
    Netmap has been developed by Luigi Rizzo and other collaborators
    at the Universita` di Pisa, and supported by EU project CHANGE
    (http://www.change-project.eu/)
    The code is distributed under a BSD Copyright.

[1] In my opinion it is a bad idea to have all manpages in one directory.
  We should place kernel documentation in the same directory that
  contains the code, which would make it much simpler to keep docs and
  code in sync, reduce the clutter in share/man/, and incidentally is
  the policy already used for all userspace code.
  Makefiles and doc tools can be trivially adjusted to find the
  manpages in the relevant subdirectories.
This commit is contained in:
Luigi Rizzo, 2011-11-17 12:17:39 +00:00
commit 68b8534bdf (parent a93c40bb62)
Notes (svn2git, 2020-12-20 02:59:44 +00:00): svn path=/head/; revision=227614
19 changed files with 7507 additions and 0 deletions

share/man/man4/Makefile (changed)

@@ -253,6 +253,7 @@ MAN= aac.4 \
 net80211.4 \
 netgraph.4 \
 netintro.4 \
+netmap.4 \
 ${_nfe.4} \
 ${_nfsmb.4} \
 ng_async.4 \

share/man/man4/netmap.4 (new file, 300 lines)

@@ -0,0 +1,300 @@
.\" Copyright (c) 2011 Matteo Landi, Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\" $Id: netmap.4 9662 2011-11-16 13:18:06Z luigi $: stable/8/share/man/man4/bpf.4 181694 2008-08-13 17:45:06Z ed $
.\"
.Dd November 16, 2011
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for fast and safe access to network devices
(reaching 14.88 Mpps, the line rate of a 10 Gbit/s link with
minimum-size frames, at clock rates below 1 GHz).
.Nm
uses memory mapped buffers and metadata
(buffer indexes and lengths) to communicate with the kernel,
which is in charge of validating information through
.Pa ioctl()
and
.Pa select()/poll().
.Nm
can exploit the parallelism in multiqueue devices and
multicore systems.
.Pp
.Nm
requires explicit support in device drivers.
For a list of supported devices, see the end of this manual page.
.Sh OPERATION
.Nm
clients must first open the special file
.Pa /dev/netmap ,
and then issue an
.Pa ioctl(...,NIOCREGIF,...)
to bind the file descriptor to a network device.
.Pp
When a device is put in
.Nm
mode, its data path is disconnected from the host stack.
The processes owning the file descriptor
can exchange packets with the device, or with the host stack,
through an mmapped memory region that contains pre-allocated
buffers and metadata.
.Pp
Non-blocking I/O is done with special
.Pa ioctl() calls,
whereas the file descriptor can be passed to
.Pa select()/poll()
to be notified about incoming packets or available transmit buffers.
.Ss Data structures
All data structures for all devices in
.Nm
mode are in a memory
region shared by the kernel and all processes
that open
.Pa /dev/netmap
(NOTE: visibility may be restricted in future implementations).
All references between the shared data structures
are relative (offsets or indexes). Some macros help convert
them into actual pointers.
.Pp
The data structures in shared memory are the following:
.Pp
.Bl -tag -width XXX
.It Dv struct netmap_if (one per interface)
indicates the number of rings supported by an interface, their
sizes, and the offsets of the
.Pa netmap_rings
associated with the interface.
The offset of a
.Pa struct netmap_if
in the shared memory region is indicated by the
.Pa nr_offset
field in the structure returned by the
.Pa NIOCREGIF
ioctl (see below).
.Bd -literal
struct netmap_if {
    char          ni_name[IFNAMSIZ]; /* name of the interface. */
    const u_int   ni_num_queues;     /* number of hw ring pairs */
    const ssize_t ring_ofs[];        /* offset of tx and rx rings */
};
.Ed
.It Dv struct netmap_ring (one per ring)
contains the index of the current read or write slot (cur),
the number of slots available for reception or transmission (avail),
and an array of
.Pa slots
describing the buffers.
There is one ring pair for each of the N hardware ring pairs
supported by the card (numbered 0..N-1), plus
one ring pair (numbered N) for packets from/to the host stack.
.Bd -literal
struct netmap_ring {
    const ssize_t  buf_ofs;
    const uint32_t num_slots;   /* number of slots in the ring. */
    uint32_t       avail;       /* number of usable slots */
    uint32_t       cur;         /* 'current' index for the user side */
    const uint16_t nr_buf_size;
    uint16_t       flags;
    struct netmap_slot slot[0]; /* array of slots. */
};
.Ed
.It Dv struct netmap_slot (one per packet)
contains the metadata for a packet: a buffer index (buf_idx),
a buffer length (len), and some flags.
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx; /* buffer index */
    uint16_t len;     /* packet length */
    uint16_t flags;   /* buf changed, etc. */
#define NS_BUF_CHANGED 0x0001 /* must resync, buffer changed */
#define NS_REPORT      0x0002 /* tell hw to report results,
                               * e.g. by generating an interrupt
                               */
};
.Ed
.It Dv packet buffers
are fixed-size (approximately 2 KB) buffers allocated by the kernel
that contain packet data. Buffer addresses are computed through
macros.
.El
.Pp
Some macros support access to objects in the shared memory
region. In particular:
.Bd -literal
struct netmap_if *nifp;
...
struct netmap_ring *txring = NETMAP_TXRING(nifp, i);
struct netmap_ring *rxring = NETMAP_RXRING(nifp, i);
int idx = txring->slot[txring->cur].buf_idx;
char *buf = NETMAP_BUF(txring, idx);
.Ed
.Ss IOCTLS
.Nm
supports a few ioctl() commands to synchronize the state of the rings
between the kernel and the user processes, plus others
to query and configure the interface.
The former do not require any argument, whereas the latter
use a
.Pa struct nmreq
defined as follows:
.Bd -literal
struct nmreq {
    char     nr_name[IFNAMSIZ];
    uint32_t nr_offset;   /* nifp offset in the shared region */
    uint32_t nr_memsize;  /* size of the shared region */
    uint32_t nr_numdescs; /* descriptors per queue */
    uint16_t nr_numqueues;
    uint16_t nr_ringid;   /* ring(s) we care about */
#define NETMAP_HW_RING    0x4000 /* low bits indicate one hw ring */
#define NETMAP_SW_RING    0x2000 /* we process the sw ring */
#define NETMAP_NO_TX_POLL 0x1000 /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK  0xfff  /* the actual ring number */
};
.Ed
.Pp
A device descriptor obtained through
.Pa /dev/netmap
also supports the ioctl commands supported by network devices.
.Pp
The netmap-specific
.Xr ioctl 2
command codes below are defined in
.In net/netmap.h
and are:
.Bl -tag -width XXXX
.It Dv NIOCGINFO
returns information about the interface named in nr_name.
On return, nr_memsize indicates the size of the shared netmap
memory region (this is device-independent),
nr_numdescs indicates how many slots each ring has, and
nr_numqueues indicates the number of ring pairs supported by
the hardware (see the example after this list).
.Pp
If the device does not support netmap, the ioctl returns EINVAL.
.It Dv NIOCREGIF
puts the interface named in nr_name into netmap mode, disconnecting
it from the host stack, and/or defines which rings are controlled
through this file descriptor.
On return, it gives the same info as NIOCGINFO, and nr_ringid
indicates the identity of the rings controlled through the file
descriptor.
.Pp
Possible values for nr_ringid are
.Bl -tag -width XXXXX
.It 0
default, all hardware rings
.It NETMAP_SW_RING
the ``host rings'' connecting to the host stack
.It NETMAP_HW_RING + i
the i-th hardware ring
.El
.Pp
By default, a
.Pa poll()
or
.Pa select()
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by or-ing
.Pa NETMAP_NO_TX_POLL
into nr_ringid.
Normally you should leave this feature enabled, unless you are using
separate file descriptors for the send and receive rings; otherwise
packets are pushed out only when NIOCTXSYNC is called
or the send queue is full.
.Pp
.Pa NIOCREGIF
can be used multiple times to change the association of a
file descriptor to a ring pair, always within the same device.
.It Dv NIOCUNREGIF
brings an interface back to normal mode.
.It Dv NIOCTXSYNC
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware of consumed packets, and asks for newly available
packets.
.El
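.Pp
As an illustration, the following sketch (the usual includes and
most error handling omitted; the device name "ix0" is arbitrary)
queries an interface for netmap support with NIOCGINFO:
.Bd -literal
struct nmreq nmr;
int fd = open("/dev/netmap", O_RDWR);

bzero(&nmr, sizeof(nmr));
strcpy(nmr.nr_name, "ix0");
if (ioctl(fd, NIOCGINFO, &nmr) == -1)
    err(1, "ix0 does not support netmap");
printf("%u byte shared region, %u ring pairs, %u slots each\en",
    nmr.nr_memsize, nmr.nr_numqueues, nmr.nr_numdescs);
.Ed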
.Ss SYSTEM CALLS
.Nm
uses
.Pa select()
and
.Pa poll()
to wake up processes when significant events occur.
.Sh EXAMPLES
The following code implements a traffic generator.
.Pp
.Bd -literal -compact
#include <net/netmap.h>
#include <net/netmap_user.h>

struct netmap_if *nifp;
struct netmap_ring *ring;
struct nmreq nmr;
struct pollfd fds;
void *p;
char *buf;
int fd, i;

fd = open("/dev/netmap", O_RDWR);
bzero(&nmr, sizeof(nmr));
strcpy(nmr.nr_name, "ix0");
ioctl(fd, NIOCREGIF, &nmr);
p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE,
    MAP_SHARED, fd, 0);
nifp = NETMAP_IF(p, nmr.nr_offset);
ring = NETMAP_TXRING(nifp, 0);
fds.fd = fd;
fds.events = POLLOUT;
for (;;) {
    poll(&fds, 1, -1);
    while (ring->avail > 0) {
        i = ring->cur;
        buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
        ... prepare packet in buf ...
        ring->slot[i].len = ... packet length ...
        ring->cur = NETMAP_RING_NEXT(ring, i);
        ring->avail--;
    }
}
.Ed
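.Pp
A receiver follows the same pattern; a sketch along the same lines
(declarations as above) polls for input and consumes slots from an
rx ring:
.Bd -literal -compact
ring = NETMAP_RXRING(nifp, 0);
fds.fd = fd;
fds.events = POLLIN;
for (;;) {
    poll(&fds, 1, -1);
    while (ring->avail > 0) {
        i = ring->cur;
        buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
        ... process ring->slot[i].len bytes at buf ...
        ring->cur = NETMAP_RING_NEXT(ring, i);
        ring->avail--;
    }
}
.Ed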
.Sh SUPPORTED INTERFACES
.Nm
supports the following interfaces:
.Xr em 4 ,
.Xr ixgbe 4 ,
.Xr re 4
.Sh AUTHORS
The
.Nm
framework has been designed and implemented by
.An Luigi Rizzo
and
.An Matteo Landi
in 2011 at the Universita` di Pisa.

sys/dev/netmap/head.diff (new file, 654 lines)

@@ -0,0 +1,654 @@
Index: conf/NOTES
===================================================================
--- conf/NOTES (revision 227552)
+++ conf/NOTES (working copy)
@@ -799,6 +799,12 @@
# option. DHCP requires bpf.
device bpf
+# The `netmap' device implements memory-mapped access to network
+# devices from userspace, enabling wire-speed packet capture and
+# generation even at 10Gbit/s. Requires support in the device
+# driver. Supported drivers are ixgbe, e1000, re.
+device netmap
+
# The `disc' device implements a minimal network interface,
# which throws away all packets sent and never receives any. It is
# included for testing and benchmarking purposes.
Index: conf/files
===================================================================
--- conf/files (revision 227552)
+++ conf/files (working copy)
@@ -1507,6 +1507,7 @@
dev/my/if_my.c optional my
dev/ncv/ncr53c500.c optional ncv
dev/ncv/ncr53c500_pccard.c optional ncv pccard
+dev/netmap/netmap.c optional netmap
dev/nge/if_nge.c optional nge
dev/nxge/if_nxge.c optional nxge
dev/nxge/xgehal/xgehal-device.c optional nxge
Index: conf/options
===================================================================
--- conf/options (revision 227552)
+++ conf/options (working copy)
@@ -689,6 +689,7 @@
# various 'device presence' options.
DEV_BPF opt_bpf.h
+DEV_NETMAP opt_global.h
DEV_MCA opt_mca.h
DEV_CARP opt_carp.h
DEV_SPLASH opt_splash.h
Index: dev/e1000/if_igb.c
===================================================================
--- dev/e1000/if_igb.c (revision 227552)
+++ dev/e1000/if_igb.c (working copy)
@@ -369,6 +369,9 @@
&igb_rx_process_limit, 0,
"Maximum number of received packets to process at a time, -1 means unlimited");
+#ifdef DEV_NETMAP
+#include <dev/netmap/if_igb_netmap.h>
+#endif /* DEV_NETMAP */
/*********************************************************************
* Device identification routine
*
@@ -664,6 +667,9 @@
adapter->led_dev = led_create(igb_led_func, adapter,
device_get_nameunit(dev));
+#ifdef DEV_NETMAP
+ igb_netmap_attach(adapter);
+#endif /* DEV_NETMAP */
INIT_DEBUGOUT("igb_attach: end");
return (0);
@@ -742,6 +748,9 @@
callout_drain(&adapter->timer);
+#ifdef DEV_NETMAP
+ netmap_detach(adapter->ifp);
+#endif /* DEV_NETMAP */
igb_free_pci_resources(adapter);
bus_generic_detach(dev);
if_free(ifp);
@@ -3212,6 +3221,10 @@
struct adapter *adapter = txr->adapter;
struct igb_tx_buffer *txbuf;
int i;
+#ifdef DEV_NETMAP
+ struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
+ NR_TX, txr->me, 0);
+#endif
/* Clear the old descriptor contents */
IGB_TX_LOCK(txr);
@@ -3231,6 +3244,13 @@
m_freem(txbuf->m_head);
txbuf->m_head = NULL;
}
+#ifdef DEV_NETMAP
+ if (slot) {
+ netmap_load_map(txr->txtag, txbuf->map,
+ NMB(slot), adapter->rx_mbuf_sz);
+ slot++;
+ }
+#endif /* DEV_NETMAP */
/* clear the watch index */
txbuf->next_eop = -1;
}
@@ -3626,6 +3646,19 @@
IGB_TX_LOCK_ASSERT(txr);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ struct netmap_adapter *na = NA(ifp);
+
+ selwakeuppri(&na->tx_rings[txr->me].si, PI_NET);
+ IGB_TX_UNLOCK(txr);
+ IGB_CORE_LOCK(adapter);
+ selwakeuppri(&na->tx_rings[na->num_queues + 1].si, PI_NET);
+ IGB_CORE_UNLOCK(adapter);
+ IGB_TX_LOCK(txr); // the caller is supposed to own the lock
+ return FALSE;
+ }
+#endif /* DEV_NETMAP */
if (txr->tx_avail == adapter->num_tx_desc) {
txr->queue_status = IGB_QUEUE_IDLE;
return FALSE;
@@ -3949,6 +3982,10 @@
bus_dma_segment_t pseg[1], hseg[1];
struct lro_ctrl *lro = &rxr->lro;
int rsize, nsegs, error = 0;
+#ifdef DEV_NETMAP
+ struct netmap_slot *slot = netmap_reset(NA(rxr->adapter->ifp),
+ NR_RX, rxr->me, 0);
+#endif
adapter = rxr->adapter;
dev = adapter->dev;
@@ -3974,6 +4011,18 @@
struct mbuf *mh, *mp;
rxbuf = &rxr->rx_buffers[j];
+#ifdef DEV_NETMAP
+ if (slot) {
+ netmap_load_map(rxr->ptag,
+ rxbuf->pmap, NMB(slot),
+ adapter->rx_mbuf_sz);
+ /* Update descriptor */
+ rxr->rx_base[j].read.pkt_addr =
+ htole64(vtophys(NMB(slot)));
+ slot++;
+ continue;
+ }
+#endif /* DEV_NETMAP */
if (rxr->hdr_split == FALSE)
goto skip_head;
@@ -4436,6 +4485,19 @@
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ struct netmap_adapter *na = NA(ifp);
+
+ selwakeuppri(&na->rx_rings[rxr->me].si, PI_NET);
+ IGB_RX_UNLOCK(rxr);
+ IGB_CORE_LOCK(adapter);
+ selwakeuppri(&na->rx_rings[na->num_queues + 1].si, PI_NET);
+ IGB_CORE_UNLOCK(adapter);
+ return (0);
+ }
+#endif /* DEV_NETMAP */
+
/* Main clean loop */
for (i = rxr->next_to_check; count != 0;) {
struct mbuf *sendmp, *mh, *mp;
Index: dev/e1000/if_lem.c
===================================================================
--- dev/e1000/if_lem.c (revision 227552)
+++ dev/e1000/if_lem.c (working copy)
@@ -316,6 +316,10 @@
/* Global used in WOL setup with multiport cards */
static int global_quad_port_a = 0;
+#ifdef DEV_NETMAP
+#include <dev/netmap/if_lem_netmap.h>
+#endif /* DEV_NETMAP */
+
/*********************************************************************
* Device identification routine
*
@@ -646,6 +650,9 @@
adapter->led_dev = led_create(lem_led_func, adapter,
device_get_nameunit(dev));
+#ifdef DEV_NETMAP
+ lem_netmap_attach(adapter);
+#endif /* DEV_NETMAP */
INIT_DEBUGOUT("lem_attach: end");
return (0);
@@ -724,6 +731,9 @@
callout_drain(&adapter->timer);
callout_drain(&adapter->tx_fifo_timer);
+#ifdef DEV_NETMAP
+ netmap_detach(ifp);
+#endif /* DEV_NETMAP */
lem_free_pci_resources(adapter);
bus_generic_detach(dev);
if_free(ifp);
@@ -2637,6 +2647,9 @@
lem_setup_transmit_structures(struct adapter *adapter)
{
struct em_buffer *tx_buffer;
+#ifdef DEV_NETMAP
+ struct netmap_slot *slot = netmap_reset(NA(adapter->ifp), NR_TX, 0, 0);
+#endif
/* Clear the old ring contents */
bzero(adapter->tx_desc_base,
@@ -2650,6 +2663,15 @@
bus_dmamap_unload(adapter->txtag, tx_buffer->map);
m_freem(tx_buffer->m_head);
tx_buffer->m_head = NULL;
+#ifdef DEV_NETMAP
+ if (slot) {
+ /* reload the map for netmap mode */
+ netmap_load_map(adapter->txtag,
+ tx_buffer->map, NMB(slot),
+ NA(adapter->ifp)->buff_size);
+ slot++;
+ }
+#endif /* DEV_NETMAP */
tx_buffer->next_eop = -1;
}
@@ -2951,6 +2973,12 @@
EM_TX_LOCK_ASSERT(adapter);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ selwakeuppri(&NA(ifp)->tx_rings[0].si, PI_NET);
+ return;
+ }
+#endif /* DEV_NETMAP */
if (adapter->num_tx_desc_avail == adapter->num_tx_desc)
return;
@@ -3181,6 +3209,9 @@
{
struct em_buffer *rx_buffer;
int i, error;
+#ifdef DEV_NETMAP
+ struct netmap_slot *slot = netmap_reset(NA(adapter->ifp), NR_RX, 0, 0);
+#endif
/* Reset descriptor ring */
bzero(adapter->rx_desc_base,
@@ -3200,6 +3231,18 @@
/* Allocate new ones. */
for (i = 0; i < adapter->num_rx_desc; i++) {
+#ifdef DEV_NETMAP
+ if (slot) {
+ netmap_load_map(adapter->rxtag,
+ rx_buffer->map, NMB(slot),
+ NA(adapter->ifp)->buff_size);
+ /* Update descriptor */
+ adapter->rx_desc_base[i].buffer_addr =
+ htole64(vtophys(NMB(slot)));
+ slot++;
+ continue;
+ }
+#endif /* DEV_NETMAP */
error = lem_get_buf(adapter, i);
if (error)
return (error);
@@ -3407,6 +3450,14 @@
bus_dmamap_sync(adapter->rxdma.dma_tag, adapter->rxdma.dma_map,
BUS_DMASYNC_POSTREAD);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ selwakeuppri(&NA(ifp)->rx_rings[0].si, PI_NET);
+ EM_RX_UNLOCK(adapter);
+ return (0);
+ }
+#endif /* DEV_NETMAP */
+
if (!((current_desc->status) & E1000_RXD_STAT_DD)) {
if (done != NULL)
*done = rx_sent;
Index: dev/e1000/if_em.c
===================================================================
--- dev/e1000/if_em.c (revision 227552)
+++ dev/e1000/if_em.c (working copy)
@@ -399,6 +399,10 @@
/* Global used in WOL setup with multiport cards */
static int global_quad_port_a = 0;
+#ifdef DEV_NETMAP
+#include <dev/netmap/if_em_netmap.h>
+#endif /* DEV_NETMAP */
+
/*********************************************************************
* Device identification routine
*
@@ -714,6 +718,9 @@
adapter->led_dev = led_create(em_led_func, adapter,
device_get_nameunit(dev));
+#ifdef DEV_NETMAP
+ em_netmap_attach(adapter);
+#endif /* DEV_NETMAP */
INIT_DEBUGOUT("em_attach: end");
@@ -785,6 +792,10 @@
ether_ifdetach(adapter->ifp);
callout_drain(&adapter->timer);
+#ifdef DEV_NETMAP
+ netmap_detach(ifp);
+#endif /* DEV_NETMAP */
+
em_free_pci_resources(adapter);
bus_generic_detach(dev);
if_free(ifp);
@@ -3213,6 +3224,10 @@
struct adapter *adapter = txr->adapter;
struct em_buffer *txbuf;
int i;
+#ifdef DEV_NETMAP
+ struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
+ NR_TX, txr->me, 0);
+#endif
/* Clear the old descriptor contents */
EM_TX_LOCK(txr);
@@ -3232,6 +3247,16 @@
m_freem(txbuf->m_head);
txbuf->m_head = NULL;
}
+#ifdef DEV_NETMAP
+ if (slot) {
+ /* reload the map for netmap mode */
+ netmap_load_map(txr->txtag,
+ txbuf->map, NMB(slot),
+ adapter->rx_mbuf_sz);
+ slot++;
+ }
+#endif /* DEV_NETMAP */
+
/* clear the watch index */
txbuf->next_eop = -1;
}
@@ -3682,6 +3707,12 @@
struct ifnet *ifp = adapter->ifp;
EM_TX_LOCK_ASSERT(txr);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ selwakeuppri(&NA(ifp)->tx_rings[txr->me].si, PI_NET);
+ return (FALSE);
+ }
+#endif /* DEV_NETMAP */
/* No work, make sure watchdog is off */
if (txr->tx_avail == adapter->num_tx_desc) {
@@ -3978,6 +4009,33 @@
if (++j == adapter->num_rx_desc)
j = 0;
}
+#ifdef DEV_NETMAP
+ {
+ /* slot is NULL if we are not in netmap mode */
+ struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
+ NR_RX, rxr->me, rxr->next_to_check);
+ /*
+ * we need to restore all buffer addresses in the ring as they might
+ * be in the wrong state if we are exiting from netmap mode.
+ */
+ for (j = 0; j != adapter->num_rx_desc; ++j) {
+ void *addr;
+ rxbuf = &rxr->rx_buffers[j];
+ if (rxbuf->m_head == NULL && !slot)
+ continue;
+ addr = slot ? NMB(slot) : rxbuf->m_head->m_data;
+ // XXX load or reload ?
+ netmap_load_map(rxr->rxtag, rxbuf->map, addr, adapter->rx_mbuf_sz);
+ /* Update descriptor */
+ rxr->rx_base[j].buffer_addr = htole64(vtophys(addr));
+ bus_dmamap_sync(rxr->rxtag, rxbuf->map, BUS_DMASYNC_PREREAD);
+ if (slot)
+ slot++;
+ }
+ /* Setup our descriptor indices */
+ NA(adapter->ifp)->rx_rings[rxr->me].nr_hwcur = rxr->next_to_check;
+ }
+#endif /* DEV_NETMAP */
fail:
rxr->next_to_refresh = i;
@@ -4247,6 +4305,14 @@
EM_RX_LOCK(rxr);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ selwakeuppri(&NA(ifp)->rx_rings[rxr->me].si, PI_NET);
+ EM_RX_UNLOCK(rxr);
+ return (0);
+ }
+#endif /* DEV_NETMAP */
+
for (i = rxr->next_to_check, processed = 0; count != 0;) {
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
Index: dev/re/if_re.c
===================================================================
--- dev/re/if_re.c (revision 227552)
+++ dev/re/if_re.c (working copy)
@@ -291,6 +291,10 @@
static void re_setwol (struct rl_softc *);
static void re_clrwol (struct rl_softc *);
+#ifdef DEV_NETMAP
+#include <dev/netmap/if_re_netmap.h>
+#endif /* DEV_NETMAP */
+
#ifdef RE_DIAG
static int re_diag (struct rl_softc *);
#endif
@@ -1583,6 +1587,9 @@
*/
ifp->if_data.ifi_hdrlen = sizeof(struct ether_vlan_header);
+#ifdef DEV_NETMAP
+ re_netmap_attach(sc);
+#endif /* DEV_NETMAP */
#ifdef RE_DIAG
/*
* Perform hardware diagnostic on the original RTL8169.
@@ -1778,6 +1785,9 @@
bus_dma_tag_destroy(sc->rl_ldata.rl_stag);
}
+#ifdef DEV_NETMAP
+ netmap_detach(ifp);
+#endif /* DEV_NETMAP */
if (sc->rl_parent_tag)
bus_dma_tag_destroy(sc->rl_parent_tag);
@@ -1952,6 +1962,9 @@
sc->rl_ldata.rl_tx_desc_cnt * sizeof(struct rl_desc));
for (i = 0; i < sc->rl_ldata.rl_tx_desc_cnt; i++)
sc->rl_ldata.rl_tx_desc[i].tx_m = NULL;
+#ifdef DEV_NETMAP
+ re_netmap_tx_init(sc);
+#endif /* DEV_NETMAP */
/* Set EOR. */
desc = &sc->rl_ldata.rl_tx_list[sc->rl_ldata.rl_tx_desc_cnt - 1];
desc->rl_cmdstat |= htole32(RL_TDESC_CMD_EOR);
@@ -1979,6 +1992,9 @@
if ((error = re_newbuf(sc, i)) != 0)
return (error);
}
+#ifdef DEV_NETMAP
+ re_netmap_rx_init(sc);
+#endif /* DEV_NETMAP */
/* Flush the RX descriptors */
@@ -2035,6 +2051,12 @@
RL_LOCK_ASSERT(sc);
ifp = sc->rl_ifp;
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ selwakeuppri(&NA(ifp)->rx_rings->si, PI_NET);
+ return 0;
+ }
+#endif /* DEV_NETMAP */
if (ifp->if_mtu > RL_MTU && (sc->rl_flags & RL_FLAG_JUMBOV2) != 0)
jumbo = 1;
else
@@ -2276,6 +2298,12 @@
return;
ifp = sc->rl_ifp;
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ selwakeuppri(&NA(ifp)->tx_rings[0].si, PI_NET);
+ return;
+ }
+#endif /* DEV_NETMAP */
/* Invalidate the TX descriptor list */
bus_dmamap_sync(sc->rl_ldata.rl_tx_list_tag,
sc->rl_ldata.rl_tx_list_map,
@@ -2794,6 +2822,20 @@
sc = ifp->if_softc;
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ struct netmap_kring *kring = &NA(ifp)->tx_rings[0];
+ if (sc->rl_ldata.rl_tx_prodidx != kring->nr_hwcur) {
+ /* kick the tx unit */
+ CSR_WRITE_1(sc, sc->rl_txstart, RL_TXSTART_START);
+#ifdef RE_TX_MODERATION
+ CSR_WRITE_4(sc, RL_TIMERCNT, 1);
+#endif
+ sc->rl_watchdog_timer = 5;
+ }
+ return;
+ }
+#endif /* DEV_NETMAP */
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
IFF_DRV_RUNNING || (sc->rl_flags & RL_FLAG_LINK) == 0)
return;
Index: dev/ixgbe/ixgbe.c
===================================================================
--- dev/ixgbe/ixgbe.c (revision 227552)
+++ dev/ixgbe/ixgbe.c (working copy)
@@ -313,6 +313,10 @@
static int fdir_pballoc = 1;
#endif
+#ifdef DEV_NETMAP
+#include <dev/netmap/ixgbe_netmap.h>
+#endif /* DEV_NETMAP */
+
/*********************************************************************
* Device identification routine
*
@@ -578,6 +582,9 @@
ixgbe_add_hw_stats(adapter);
+#ifdef DEV_NETMAP
+ ixgbe_netmap_attach(adapter);
+#endif /* DEV_NETMAP */
INIT_DEBUGOUT("ixgbe_attach: end");
return (0);
err_late:
@@ -652,6 +659,9 @@
ether_ifdetach(adapter->ifp);
callout_drain(&adapter->timer);
+#ifdef DEV_NETMAP
+ netmap_detach(adapter->ifp);
+#endif /* DEV_NETMAP */
ixgbe_free_pci_resources(adapter);
bus_generic_detach(dev);
if_free(adapter->ifp);
@@ -1719,6 +1729,7 @@
if (++i == adapter->num_tx_desc)
i = 0;
+ // XXX should we sync each buffer ?
txbuf->m_head = NULL;
txbuf->eop_index = -1;
}
@@ -2813,6 +2824,10 @@
struct adapter *adapter = txr->adapter;
struct ixgbe_tx_buf *txbuf;
int i;
+#ifdef DEV_NETMAP
+ struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
+ NR_TX, txr->me, 0);
+#endif
/* Clear the old ring contents */
IXGBE_TX_LOCK(txr);
@@ -2832,6 +2847,13 @@
m_freem(txbuf->m_head);
txbuf->m_head = NULL;
}
+#ifdef DEV_NETMAP
+ if (slot) {
+ netmap_load_map(txr->txtag, txbuf->map,
+ NMB(slot), adapter->rx_mbuf_sz);
+ slot++;
+ }
+#endif /* DEV_NETMAP */
/* Clear the EOP index */
txbuf->eop_index = -1;
}
@@ -3310,6 +3332,20 @@
mtx_assert(&txr->tx_mtx, MA_OWNED);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ struct netmap_adapter *na = NA(ifp);
+
+ selwakeuppri(&na->tx_rings[txr->me].si, PI_NET);
+ IXGBE_TX_UNLOCK(txr);
+ IXGBE_CORE_LOCK(adapter);
+ selwakeuppri(&na->tx_rings[na->num_queues + 1].si, PI_NET);
+ IXGBE_CORE_UNLOCK(adapter);
+ IXGBE_TX_LOCK(txr); // the caller is supposed to own the lock
+ return (FALSE);
+ }
+#endif /* DEV_NETMAP */
+
if (txr->tx_avail == adapter->num_tx_desc) {
txr->queue_status = IXGBE_QUEUE_IDLE;
return FALSE;
@@ -3698,6 +3734,10 @@
bus_dma_segment_t pseg[1], hseg[1];
struct lro_ctrl *lro = &rxr->lro;
int rsize, nsegs, error = 0;
+#ifdef DEV_NETMAP
+ struct netmap_slot *slot = netmap_reset(NA(rxr->adapter->ifp),
+ NR_RX, rxr->me, 0);
+#endif /* DEV_NETMAP */
adapter = rxr->adapter;
ifp = adapter->ifp;
@@ -3721,6 +3761,18 @@
struct mbuf *mh, *mp;
rxbuf = &rxr->rx_buffers[j];
+#ifdef DEV_NETMAP
+ if (slot) {
+ netmap_load_map(rxr->ptag,
+ rxbuf->pmap, NMB(slot),
+ adapter->rx_mbuf_sz);
+ /* Update descriptor */
+ rxr->rx_base[j].read.pkt_addr =
+ htole64(vtophys(NMB(slot)));
+ slot++;
+ continue;
+ }
+#endif /* DEV_NETMAP */
/*
** Don't allocate mbufs if not
** doing header split, its wasteful
@@ -4148,6 +4200,18 @@
IXGBE_RX_LOCK(rxr);
+#ifdef DEV_NETMAP
+ if (ifp->if_capenable & IFCAP_NETMAP) {
+ struct netmap_adapter *na = NA(ifp);
+
+ selwakeuppri(&na->rx_rings[rxr->me].si, PI_NET);
+ IXGBE_RX_UNLOCK(rxr);
+ IXGBE_CORE_LOCK(adapter);
+ selwakeuppri(&na->rx_rings[na->num_queues + 1].si, PI_NET);
+ IXGBE_CORE_UNLOCK(adapter);
+ return (0);
+ }
+#endif /* DEV_NETMAP */
for (i = rxr->next_to_check; count != 0;) {
struct mbuf *sendmp, *mh, *mp;
u32 rsc, ptype;

sys/dev/netmap/if_em_netmap.h (new file, 383 lines)

@@ -0,0 +1,383 @@
/*
* Copyright (C) 2011 Matteo Landi, Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: if_em_netmap.h 9662 2011-11-16 13:18:06Z luigi $
*
* netmap changes for if_em.
*/
#include <net/netmap.h>
#include <sys/selinfo.h>
#include <vm/vm.h>
#include <vm/pmap.h> /* vtophys ? */
#include <dev/netmap/netmap_kern.h>
static void em_netmap_block_tasks(struct adapter *);
static void em_netmap_unblock_tasks(struct adapter *);
static int em_netmap_reg(struct ifnet *, int onoff);
static int em_netmap_txsync(void *, u_int, int);
static int em_netmap_rxsync(void *, u_int, int);
static void em_netmap_lock_wrapper(void *, int, u_int);
static void
em_netmap_attach(struct adapter *adapter)
{
struct netmap_adapter na;
bzero(&na, sizeof(na));
na.ifp = adapter->ifp;
na.separate_locks = 1;
na.num_tx_desc = adapter->num_tx_desc;
na.num_rx_desc = adapter->num_rx_desc;
na.nm_txsync = em_netmap_txsync;
na.nm_rxsync = em_netmap_rxsync;
na.nm_lock = em_netmap_lock_wrapper;
na.nm_register = em_netmap_reg;
/*
* adapter->rx_mbuf_sz is set by SIOCSETMTU, but in netmap mode
* we allocate the buffers on the first register. So we must
* disallow a SIOCSETMTU when if_capenable & IFCAP_NETMAP is set.
*/
na.buff_size = MCLBYTES;
netmap_attach(&na, adapter->num_queues);
}
/*
* wrapper to export locks to the generic code
*/
static void
em_netmap_lock_wrapper(void *_a, int what, u_int queueid)
{
struct adapter *adapter = _a;
ASSERT(queueid < adapter->num_queues);
switch (what) {
case NETMAP_CORE_LOCK:
EM_CORE_LOCK(adapter);
break;
case NETMAP_CORE_UNLOCK:
EM_CORE_UNLOCK(adapter);
break;
case NETMAP_TX_LOCK:
EM_TX_LOCK(&adapter->tx_rings[queueid]);
break;
case NETMAP_TX_UNLOCK:
EM_TX_UNLOCK(&adapter->tx_rings[queueid]);
break;
case NETMAP_RX_LOCK:
EM_RX_LOCK(&adapter->rx_rings[queueid]);
break;
case NETMAP_RX_UNLOCK:
EM_RX_UNLOCK(&adapter->rx_rings[queueid]);
break;
}
}
static void
em_netmap_block_tasks(struct adapter *adapter)
{
if (adapter->msix > 1) { /* MSIX */
int i;
struct tx_ring *txr = adapter->tx_rings;
struct rx_ring *rxr = adapter->rx_rings;
for (i = 0; i < adapter->num_queues; i++, txr++, rxr++) {
taskqueue_block(txr->tq);
taskqueue_drain(txr->tq, &txr->tx_task);
taskqueue_block(rxr->tq);
taskqueue_drain(rxr->tq, &rxr->rx_task);
}
} else { /* legacy */
taskqueue_block(adapter->tq);
taskqueue_drain(adapter->tq, &adapter->link_task);
taskqueue_drain(adapter->tq, &adapter->que_task);
}
}
static void
em_netmap_unblock_tasks(struct adapter *adapter)
{
if (adapter->msix > 1) {
struct tx_ring *txr = adapter->tx_rings;
struct rx_ring *rxr = adapter->rx_rings;
int i;
for (i = 0; i < adapter->num_queues; i++) {
taskqueue_unblock(txr->tq);
taskqueue_unblock(rxr->tq);
}
} else { /* legacy */
taskqueue_unblock(adapter->tq);
}
}
/*
* register-unregister routine
*/
static int
em_netmap_reg(struct ifnet *ifp, int onoff)
{
struct adapter *adapter = ifp->if_softc;
struct netmap_adapter *na = NA(ifp);
int error = 0;
if (na == NULL)
return EINVAL; /* no netmap support here */
em_disable_intr(adapter);
/* Tell the stack that the interface is no longer active */
ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
em_netmap_block_tasks(adapter);
if (onoff) {
ifp->if_capenable |= IFCAP_NETMAP;
/* save if_transmit for later restore.
* XXX also if_start and if_qflush ?
*/
na->if_transmit = ifp->if_transmit;
ifp->if_transmit = netmap_start;
em_init_locked(adapter);
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) == 0) {
error = ENOMEM;
goto fail;
}
} else {
fail:
/* restore if_transmit */
ifp->if_transmit = na->if_transmit;
ifp->if_capenable &= ~IFCAP_NETMAP;
em_init_locked(adapter); /* also enable intr */
}
em_netmap_unblock_tasks(adapter);
return (error);
}
/*
* Reconcile hardware and user view of the transmit ring, see
* ixgbe.c for details.
*/
static int
em_netmap_txsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct tx_ring *txr = &adapter->tx_rings[ring_nr];
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->tx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
/* generate an interrupt approximately every half ring */
int report_frequency = kring->nkr_num_slots >> 1;
k = ring->cur;
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
EM_TX_LOCK(txr);
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
/* record completed transmissions TODO
*
* instead of using TDH, we could read the transmitted status bit.
*/
j = E1000_READ_REG(&adapter->hw, E1000_TDH(ring_nr));
if (j >= kring->nkr_num_slots) { /* XXX can happen */
D("TDH wrap %d", j);
j -= kring->nkr_num_slots;
}
int delta = j - txr->next_to_clean;
if (delta) {
/* new transmissions were completed, increment
ring->nr_hwavail. */
if (delta < 0)
delta += kring->nkr_num_slots;
txr->next_to_clean = j;
kring->nr_hwavail += delta;
}
/* update avail to what the hardware knows */
ring->avail = kring->nr_hwavail;
j = kring->nr_hwcur;
if (j != k) { /* we have packets to send */
n = 0;
while (j != k) {
struct netmap_slot *slot = &ring->slot[j];
struct e1000_tx_desc *curr = &txr->tx_base[j];
struct em_buffer *txbuf = &txr->tx_buffers[j];
int flags = ((slot->flags & NS_REPORT) ||
j == 0 || j == report_frequency) ?
E1000_TXD_CMD_RS : 0;
void *addr = NMB(slot);
int len = slot->len;
if (addr == netmap_buffer_base || len > NETMAP_BUF_SIZE) {
if (do_lock)
EM_TX_UNLOCK(txr);
return netmap_ring_reinit(kring);
}
slot->flags &= ~NS_REPORT;
curr->upper.data = 0;
curr->lower.data =
htole32(
adapter->txd_cmd |
(E1000_TXD_CMD_EOP | flags) |
slot->len);
if (slot->flags & NS_BUF_CHANGED) {
curr->buffer_addr = htole64(vtophys(addr));
/* buffer has changed, unload and reload map */
netmap_reload_map(txr->txtag, txbuf->map,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(txr->txtag, txbuf->map,
BUS_DMASYNC_PREWRITE);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwcur = ring->cur;
/* decrease avail by number of sent packets */
ring->avail -= n;
kring->nr_hwavail = ring->avail;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
E1000_WRITE_REG(&adapter->hw, E1000_TDT(txr->me),
ring->cur);
}
if (do_lock)
EM_TX_UNLOCK(txr);
return 0;
}
/*
* Reconcile kernel and user view of the receive ring, see ixgbe.c
*/
static int
em_netmap_rxsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct rx_ring *rxr = &adapter->rx_rings[ring_nr];
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->rx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
k = ring->cur;
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
EM_RX_LOCK(rxr);
/* XXX check sync modes */
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
/* acknowledge all the received packets. */
j = rxr->next_to_check;
for (n = 0; ; n++) {
struct e1000_rx_desc *curr = &rxr->rx_base[j];
if ((curr->status & E1000_RXD_STAT_DD) == 0)
break;
ring->slot[j].len = le16toh(curr->length);
bus_dmamap_sync(rxr->tag, rxr->rx_buffers[j].map,
BUS_DMASYNC_POSTREAD);
j = (j == lim) ? 0 : j + 1;
}
if (n) {
rxr->next_to_check = j;
kring->nr_hwavail += n;
}
/* skip past packets that userspace has already processed:
* making them available for reception.
* advance nr_hwcur and issue a bus_dmamap_sync on the
* buffers so it is safe to write to them.
* Also increase nr_hwavail
*/
j = kring->nr_hwcur;
if (j != k) { /* userspace has read some packets. */
n = 0;
while (j != k) {
struct netmap_slot *slot = &ring->slot[j];
struct e1000_rx_desc *curr = &rxr->rx_base[j];
struct em_buffer *rxbuf = &rxr->rx_buffers[j];
void *addr = NMB(slot);
if (addr == netmap_buffer_base) { /* bad buf */
if (do_lock)
EM_RX_UNLOCK(rxr);
return netmap_ring_reinit(kring);
}
curr->status = 0;
if (slot->flags & NS_BUF_CHANGED) {
curr->buffer_addr = htole64(vtophys(addr));
/* buffer has changed, unload and reload map */
netmap_reload_map(rxr->rxtag, rxbuf->map,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(rxr->rxtag, rxbuf->map,
BUS_DMASYNC_PREREAD);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwavail -= n;
kring->nr_hwcur = ring->cur;
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
* IMPORTANT: we must leave one free slot in the ring,
* so move j back by one unit
*/
j = (j == 0) ? lim : j - 1;
E1000_WRITE_REG(&adapter->hw, E1000_RDT(rxr->me), j);
}
/* tell userspace that there are new packets */
ring->avail = kring->nr_hwavail ;
if (do_lock)
EM_RX_UNLOCK(rxr);
return 0;
}

sys/dev/netmap/if_igb_netmap.h (new file, 378 lines)

@@ -0,0 +1,378 @@
/*
* Copyright (C) 2011 Universita` di Pisa. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: if_igb_netmap.h 9662 2011-11-16 13:18:06Z luigi $
*
* netmap modifications for igb
* contributed by Ahmed Kooli
*/
#include <net/netmap.h>
#include <sys/selinfo.h>
#include <vm/vm.h>
#include <vm/pmap.h> /* vtophys ? */
#include <dev/netmap/netmap_kern.h>
static int igb_netmap_reg(struct ifnet *, int onoff);
static int igb_netmap_txsync(void *, u_int, int);
static int igb_netmap_rxsync(void *, u_int, int);
static void igb_netmap_lock_wrapper(void *, int, u_int);
static void
igb_netmap_attach(struct adapter *adapter)
{
struct netmap_adapter na;
bzero(&na, sizeof(na));
na.ifp = adapter->ifp;
na.separate_locks = 1;
na.num_tx_desc = adapter->num_tx_desc;
na.num_rx_desc = adapter->num_rx_desc;
na.nm_txsync = igb_netmap_txsync;
na.nm_rxsync = igb_netmap_rxsync;
na.nm_lock = igb_netmap_lock_wrapper;
na.nm_register = igb_netmap_reg;
/*
* adapter->rx_mbuf_sz is set by SIOCSETMTU, but in netmap mode
* we allocate the buffers on the first register. So we must
* disallow a SIOCSETMTU when if_capenable & IFCAP_NETMAP is set.
*/
na.buff_size = MCLBYTES;
netmap_attach(&na, adapter->num_queues);
}
/*
* wrapper to export locks to the generic code
*/
static void
igb_netmap_lock_wrapper(void *_a, int what, u_int queueid)
{
struct adapter *adapter = _a;
ASSERT(queueid < adapter->num_queues);
switch (what) {
case NETMAP_CORE_LOCK:
IGB_CORE_LOCK(adapter);
break;
case NETMAP_CORE_UNLOCK:
IGB_CORE_UNLOCK(adapter);
break;
case NETMAP_TX_LOCK:
IGB_TX_LOCK(&adapter->tx_rings[queueid]);
break;
case NETMAP_TX_UNLOCK:
IGB_TX_UNLOCK(&adapter->tx_rings[queueid]);
break;
case NETMAP_RX_LOCK:
IGB_RX_LOCK(&adapter->rx_rings[queueid]);
break;
case NETMAP_RX_UNLOCK:
IGB_RX_UNLOCK(&adapter->rx_rings[queueid]);
break;
}
}
/*
* support for netmap register/unregister. We are already under core lock.
* Only called on the first init or the last unregister.
*/
static int
igb_netmap_reg(struct ifnet *ifp, int onoff)
{
struct adapter *adapter = ifp->if_softc;
struct netmap_adapter *na = NA(ifp);
int error = 0;
if (!na)
return EINVAL;
igb_disable_intr(adapter);
/* Tell the stack that the interface is no longer active */
ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
if (onoff) {
ifp->if_capenable |= IFCAP_NETMAP;
/* save if_transmit to restore it later */
na->if_transmit = ifp->if_transmit;
ifp->if_transmit = netmap_start;
igb_init_locked(adapter);
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) == 0) {
error = ENOMEM;
goto fail;
}
} else {
fail:
/* restore if_transmit */
ifp->if_transmit = na->if_transmit;
ifp->if_capenable &= ~IFCAP_NETMAP;
igb_init_locked(adapter); /* also enables intr */
}
return (error);
}
/*
* Reconcile kernel and user view of the transmit ring.
*
* Userspace has filled tx slots up to cur (excluded).
* The last unused slot previously known to the kernel was nr_hwcur,
* and the last interrupt reported nr_hwavail slots available
* (using the special value -1 to indicate idle transmit ring).
* The function must first update avail to what the kernel
* knows, subtract the newly used slots (cur - nr_hwcur)
* from both avail and nr_hwavail, and set nr_hwcur = cur
* issuing a dmamap_sync on all slots.
*
* Check parameters in the struct netmap_ring.
* We don't use avail, only check for bogus values.
* Make sure cur is valid, and same goes for buffer indexes and lengths.
* To avoid races, read the values once, and never use those from
* the ring afterwards.
*/
static int
igb_netmap_txsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct tx_ring *txr = &adapter->tx_rings[ring_nr];
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->tx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
/* generate an interrupt approximately every half ring */
int report_frequency = kring->nkr_num_slots >> 1;
k = ring->cur; /* ring is not protected by any lock */
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
IGB_TX_LOCK(txr);
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
/* record completed transmissions. TODO
*
* Instead of reading from the TDH register, we could try to check
* the status bit of the transmitted descriptors.
*/
j = E1000_READ_REG(&adapter->hw, E1000_TDH(ring_nr));
if (j >= kring->nkr_num_slots) /* XXX can it happen ? */
j -= kring->nkr_num_slots;
int delta = j - txr->next_to_clean;
if (delta) {
/* new tx were completed */
if (delta < 0)
delta += kring->nkr_num_slots;
txr->next_to_clean = j;
kring->nr_hwavail += delta;
}
/* update avail to what the hardware knows */
ring->avail = kring->nr_hwavail;
j = kring->nr_hwcur;
if (j != k) { /* we have new packets to send */
u32 olinfo_status = 0;
n = 0;
/* 82575 needs the queue index added */
if (adapter->hw.mac.type == e1000_82575)
olinfo_status |= txr->me << 4;
while (j != k) {
struct netmap_slot *slot = &ring->slot[j];
struct igb_tx_buffer *txbuf = &txr->tx_buffers[j];
union e1000_adv_tx_desc *curr =
(union e1000_adv_tx_desc *)&txr->tx_base[j];
void *addr = NMB(slot);
int flags = ((slot->flags & NS_REPORT) ||
j == 0 || j == report_frequency) ?
E1000_ADVTXD_DCMD_RS : 0;
int len = slot->len;
if (addr == netmap_buffer_base || len > NETMAP_BUF_SIZE) {
if (do_lock)
IGB_TX_UNLOCK(txr);
return netmap_ring_reinit(kring);
}
slot->flags &= ~NS_REPORT;
curr->read.buffer_addr = htole64(vtophys(addr));
curr->read.olinfo_status =
htole32(olinfo_status |
(len<< E1000_ADVTXD_PAYLEN_SHIFT));
curr->read.cmd_type_len =
htole32(len | E1000_ADVTXD_DTYP_DATA |
E1000_ADVTXD_DCMD_IFCS |
E1000_ADVTXD_DCMD_DEXT |
E1000_ADVTXD_DCMD_EOP | flags);
if (slot->flags & NS_BUF_CHANGED) {
/* buffer has changed, unload and reload map */
netmap_reload_map(txr->txtag, txbuf->map,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(txr->txtag, txbuf->map,
BUS_DMASYNC_PREWRITE);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwcur = k;
/* decrease avail by number of sent packets */
ring->avail -= n;
kring->nr_hwavail = ring->avail;
/* Set the watchdog */
txr->queue_status = IGB_QUEUE_WORKING;
txr->watchdog_time = ticks;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
E1000_WRITE_REG(&adapter->hw, E1000_TDT(txr->me), k);
}
if (do_lock)
IGB_TX_UNLOCK(txr);
return 0;
}
/*
* Reconcile kernel and user view of the receive ring.
*
* Userspace has read rx slots up to cur (excluded).
* The last unread slot previously known to the kernel was nr_hwcur,
* and the last interrupt reported nr_hwavail slots available.
* We must subtract the newly consumed slots (cur - nr_hwcur)
* from nr_hwavail, clearing the descriptors for the next
* read, tell the hardware that they are available,
* and set nr_hwcur = cur and avail = nr_hwavail.
* issuing a dmamap_sync on all slots.
*/
static int
igb_netmap_rxsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct rx_ring *rxr = &adapter->rx_rings[ring_nr];
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->rx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
k = ring->cur; /* ring is not protected by any lock */
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
IGB_RX_LOCK(rxr);
/* Sync the ring. */
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
j = rxr->next_to_check;
for (n = 0; ; n++) {
union e1000_adv_rx_desc *curr = &rxr->rx_base[j];
uint32_t staterr = le32toh(curr->wb.upper.status_error);
if ((staterr & E1000_RXD_STAT_DD) == 0)
break;
ring->slot[j].len = le16toh(curr->wb.upper.length);
bus_dmamap_sync(rxr->ptag,
rxr->rx_buffers[j].pmap, BUS_DMASYNC_POSTREAD);
j = (j == lim) ? 0 : j + 1;
}
if (n) {
rxr->next_to_check = j;
kring->nr_hwavail += n;
if (kring->nr_hwavail >= lim - 10) {
ND("rx ring %d almost full %d", ring_nr, kring->nr_hwavail);
}
}
/* skip past packets that userspace has already processed,
* making them available for reception.
* advance nr_hwcur and issue a bus_dmamap_sync on the
* buffers so it is safe to write to them.
* Also increase nr_hwavail
*/
j = kring->nr_hwcur;
if (j != k) { /* userspace has read some packets. */
n = 0;
while (j != k) {
struct netmap_slot *slot = ring->slot + j;
union e1000_adv_rx_desc *curr = &rxr->rx_base[j];
struct igb_rx_buf *rxbuf = rxr->rx_buffers + j;
void *addr = NMB(slot);
if (addr == netmap_buffer_base) { /* bad buf */
if (do_lock)
IGB_RX_UNLOCK(rxr);
return netmap_ring_reinit(kring);
}
curr->wb.upper.status_error = 0;
curr->read.pkt_addr = htole64(vtophys(addr));
if (slot->flags & NS_BUF_CHANGED) {
netmap_reload_map(rxr->ptag, rxbuf->pmap,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_PREREAD);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwavail -= n;
kring->nr_hwcur = ring->cur;
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/* IMPORTANT: we must leave one free slot in the ring,
* so move j back by one unit
*/
j = (j == 0) ? lim : j - 1;
E1000_WRITE_REG(&adapter->hw, E1000_RDT(rxr->me), j);
}
/* tell userspace that there are new packets */
ring->avail = kring->nr_hwavail ;
if (do_lock)
IGB_RX_UNLOCK(rxr);
return 0;
}

sys/dev/netmap/if_lem_netmap.h (new file, 344 lines)

@@ -0,0 +1,344 @@
/*
* Copyright (C) 2011 Matteo Landi, Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: if_lem_netmap.h 9662 2011-11-16 13:18:06Z luigi $
*
* netmap support for if_lem.c
*/
#include <net/netmap.h>
#include <sys/selinfo.h>
#include <vm/vm.h>
#include <vm/pmap.h> /* vtophys ? */
#include <dev/netmap/netmap_kern.h>
static int lem_netmap_reg(struct ifnet *, int onoff);
static int lem_netmap_txsync(void *, u_int, int);
static int lem_netmap_rxsync(void *, u_int, int);
static void lem_netmap_lock_wrapper(void *, int, u_int);
SYSCTL_NODE(_dev, OID_AUTO, lem, CTLFLAG_RW, 0, "lem card");
static void
lem_netmap_attach(struct adapter *adapter)
{
struct netmap_adapter na;
bzero(&na, sizeof(na));
na.ifp = adapter->ifp;
na.separate_locks = 1;
na.num_tx_desc = adapter->num_tx_desc;
na.num_rx_desc = adapter->num_rx_desc;
na.nm_txsync = lem_netmap_txsync;
na.nm_rxsync = lem_netmap_rxsync;
na.nm_lock = lem_netmap_lock_wrapper;
na.nm_register = lem_netmap_reg;
na.buff_size = MCLBYTES;
netmap_attach(&na, 1);
}
static void
lem_netmap_lock_wrapper(void *_a, int what, u_int ringid)
{
struct adapter *adapter = _a;
/* only one ring here so ignore the ringid */
switch (what) {
case NETMAP_CORE_LOCK:
EM_CORE_LOCK(adapter);
break;
case NETMAP_CORE_UNLOCK:
EM_CORE_UNLOCK(adapter);
break;
case NETMAP_TX_LOCK:
EM_TX_LOCK(adapter);
break;
case NETMAP_TX_UNLOCK:
EM_TX_UNLOCK(adapter);
break;
case NETMAP_RX_LOCK:
EM_RX_LOCK(adapter);
break;
case NETMAP_RX_UNLOCK:
EM_RX_UNLOCK(adapter);
break;
}
}
/*
* Reconcile kernel and user view of the transmit ring. see ixgbe.c
*/
static int
lem_netmap_txsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->tx_rings[0];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
/* generate an interrupt approximately every half ring */
int report_frequency = kring->nkr_num_slots >> 1;
k = ring->cur;
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
EM_TX_LOCK(adapter);
bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
/* record completed transmissions TODO
*
* instead of using TDH, we could read the transmitted status bit.
*/
j = E1000_READ_REG(&adapter->hw, E1000_TDH(0));
if (j >= kring->nkr_num_slots) { /* can it happen ? */
D("bad TDH %d", j);
j -= kring->nkr_num_slots;
}
int delta = j - adapter->next_tx_to_clean;
if (delta) {
if (delta < 0)
delta += kring->nkr_num_slots;
adapter->next_tx_to_clean = j;
kring->nr_hwavail += delta;
}
/* update avail to what the hardware knows */
ring->avail = kring->nr_hwavail;
j = kring->nr_hwcur;
if (j != k) { /* we have new packets to send */
n = 0;
while (j != k) {
struct netmap_slot *slot = &ring->slot[j];
struct e1000_tx_desc *curr = &adapter->tx_desc_base[j];
struct em_buffer *txbuf = &adapter->tx_buffer_area[j];
void *addr = NMB(slot);
int flags = ((slot->flags & NS_REPORT) ||
j == 0 || j == report_frequency) ?
E1000_TXD_CMD_RS : 0;
int len = slot->len;
if (addr == netmap_buffer_base || len > NETMAP_BUF_SIZE) {
if (do_lock)
EM_TX_UNLOCK(adapter);
return netmap_ring_reinit(kring);
}
curr->upper.data = 0;
/* always interrupt. XXX make it conditional */
curr->lower.data =
htole32( adapter->txd_cmd | len |
(E1000_TXD_CMD_EOP | flags) );
if (slot->flags & NS_BUF_CHANGED) {
curr->buffer_addr = htole64(vtophys(addr));
/* buffer has changed, unload and reload map */
netmap_reload_map(adapter->txtag, txbuf->map,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(adapter->txtag, txbuf->map,
BUS_DMASYNC_PREWRITE);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwcur = ring->cur;
/* decrease avail by number of sent packets */
ring->avail -= n;
kring->nr_hwavail = ring->avail;
bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), ring->cur);
}
if (do_lock)
EM_TX_UNLOCK(adapter);
return 0;
}
/*
* Reconcile kernel and user view of the receive ring. see ixgbe.c
*/
static int
lem_netmap_rxsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->rx_rings[0];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
k = ring->cur;
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
EM_RX_LOCK(adapter);
/* XXX check sync modes */
bus_dmamap_sync(adapter->rxdma.dma_tag, adapter->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
/* acknowledge all the received packets. */
j = adapter->next_rx_desc_to_check;
for (n = 0; ; n++) {
struct e1000_rx_desc *curr = &adapter->rx_desc_base[j];
int len = le16toh(adapter->rx_desc_base[j].length) - 4; // CRC
if ((curr->status & E1000_RXD_STAT_DD) == 0)
break;
if (len < 0) {
D("bogus pkt size at %d", j);
len = 0;
}
ring->slot[j].len = len;
bus_dmamap_sync(adapter->rxtag, adapter->rx_buffer_area[j].map,
BUS_DMASYNC_POSTREAD);
j = (j == lim) ? 0 : j + 1;
}
if (n) {
adapter->next_rx_desc_to_check = j;
kring->nr_hwavail += n;
}
/* skip past packets that userspace has already processed,
* making them available for reception. We don't need to set
* the length as it is the same for all slots.
*/
j = kring->nr_hwcur;
if (j != k) { /* userspace has read some packets. */
n = 0;
while (j != k) {
struct netmap_slot *slot = &ring->slot[j];
struct e1000_rx_desc *curr = &adapter->rx_desc_base[j];
struct em_buffer *rxbuf = &adapter->rx_buffer_area[j];
void *addr = NMB(slot);
if (addr == netmap_buffer_base) { /* bad buf */
if (do_lock)
EM_RX_UNLOCK(adapter);
return netmap_ring_reinit(kring);
}
curr = &adapter->rx_desc_base[j];
curr->status = 0;
if (slot->flags & NS_BUF_CHANGED) {
curr->buffer_addr = htole64(vtophys(addr));
/* buffer has changed, unload and reload map */
netmap_reload_map(adapter->rxtag, rxbuf->map,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(adapter->rxtag, rxbuf->map,
BUS_DMASYNC_PREREAD);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwavail -= n;
kring->nr_hwcur = ring->cur;
bus_dmamap_sync(adapter->rxdma.dma_tag, adapter->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
* IMPORTANT: we must leave one free slot in the ring,
* so move j back by one unit
*/
j = (j == 0) ? lim : j - 1;
E1000_WRITE_REG(&adapter->hw, E1000_RDT(0), j);
}
/* tell userspace that there are new packets */
ring->avail = kring->nr_hwavail ;
if (do_lock)
EM_RX_UNLOCK(adapter);
return 0;
}
/*
* Register/unregister routine
*/
static int
lem_netmap_reg(struct ifnet *ifp, int onoff)
{
struct adapter *adapter = ifp->if_softc;
struct netmap_adapter *na = NA(ifp);
int error = 0;
if (!na)
return EINVAL;
lem_disable_intr(adapter);
/* Tell the stack that the interface is no longer active */
ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
/* lem_netmap_block_tasks(adapter); */
#ifndef EM_LEGACY_IRQ
taskqueue_block(adapter->tq);
taskqueue_drain(adapter->tq, &adapter->rxtx_task);
taskqueue_drain(adapter->tq, &adapter->link_task);
#endif /* !EM_LEGACY_IRQ */
if (onoff) {
ifp->if_capenable |= IFCAP_NETMAP;
/* save if_transmit to restore it when exiting.
* XXX what about if_start and if_qflush ?
*/
na->if_transmit = ifp->if_transmit;
ifp->if_transmit = netmap_start;
lem_init_locked(adapter);
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) == 0) {
error = ENOMEM;
goto fail;
}
} else {
fail:
/* restore non-netmap mode */
ifp->if_transmit = na->if_transmit;
ifp->if_capenable &= ~IFCAP_NETMAP;
lem_init_locked(adapter); /* also enables intr */
}
#ifndef EM_LEGACY_IRQ
taskqueue_unblock(adapter->tq);
#endif /* !EM_LEGACY_IRQ */
return (error);
}

sys/dev/netmap/if_re_netmap.h Normal file

@ -0,0 +1,415 @@
/*
* Copyright (C) 2011 Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: if_re_netmap.h 9662 2011-11-16 13:18:06Z luigi $
*
* netmap support for if_re
*/
#include <net/netmap.h>
#include <sys/selinfo.h>
#include <vm/vm.h>
#include <vm/pmap.h> /* vtophys ? */
#include <dev/netmap/netmap_kern.h>
static int re_netmap_reg(struct ifnet *, int onoff);
static int re_netmap_txsync(void *, u_int, int);
static int re_netmap_rxsync(void *, u_int, int);
static void re_netmap_lock_wrapper(void *, int, u_int);
static void
re_netmap_attach(struct rl_softc *sc)
{
struct netmap_adapter na;
bzero(&na, sizeof(na));
na.ifp = sc->rl_ifp;
na.separate_locks = 0;
na.num_tx_desc = sc->rl_ldata.rl_tx_desc_cnt;
na.num_rx_desc = sc->rl_ldata.rl_rx_desc_cnt;
na.nm_txsync = re_netmap_txsync;
na.nm_rxsync = re_netmap_rxsync;
na.nm_lock = re_netmap_lock_wrapper;
na.nm_register = re_netmap_reg;
na.buff_size = MCLBYTES;
netmap_attach(&na, 1);
}
/*
* wrapper to export locks to the generic code
* We should not use the tx/rx locks
*/
static void
re_netmap_lock_wrapper(void *_a, int what, u_int queueid)
{
struct rl_softc *adapter = _a;
switch (what) {
case NETMAP_CORE_LOCK:
RL_LOCK(adapter);
break;
case NETMAP_CORE_UNLOCK:
RL_UNLOCK(adapter);
break;
case NETMAP_TX_LOCK:
case NETMAP_RX_LOCK:
case NETMAP_TX_UNLOCK:
case NETMAP_RX_UNLOCK:
D("invalid lock call %d, no tx/rx locks here", what);
break;
}
}
/*
* Support for netmap register/unregister. We are already under core lock;
* only called on the first register or the last unregister.
*/
static int
re_netmap_reg(struct ifnet *ifp, int onoff)
{
struct rl_softc *adapter = ifp->if_softc;
struct netmap_adapter *na = NA(ifp);
int error = 0;
if (!na)
return EINVAL;
/* Tell the stack that the interface is no longer active */
ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
re_stop(adapter);
if (onoff) {
ifp->if_capenable |= IFCAP_NETMAP;
/* save if_transmit and restore it */
na->if_transmit = ifp->if_transmit;
/* XXX if_start and if_qflush ??? */
ifp->if_transmit = netmap_start;
re_init_locked(adapter);
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) == 0) {
error = ENOMEM;
goto fail;
}
} else {
fail:
/* restore if_transmit */
ifp->if_transmit = na->if_transmit;
ifp->if_capenable &= ~IFCAP_NETMAP;
re_init_locked(adapter); /* also enables intr */
}
return (error);
}
/*
* Reconcile kernel and user view of the transmit ring.
*
* Userspace has filled tx slots up to cur (excluded).
* The last unused slot previously known to the kernel was nr_hwcur,
* and the last interrupt reported nr_hwavail slots available
* (using the special value -1 to indicate idle transmit ring).
* The function must first update avail to what the kernel
* knows (translating the -1 to nkr_num_slots - 1),
* subtract the newly used slots (cur - nr_hwcur)
* from both avail and nr_hwavail, and set nr_hwcur = cur
* issuing a dmamap_sync on all slots.
*/
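/*
* Worked example (illustrative numbers, not from the code): with
* nkr_num_slots = 256, nr_hwcur = 10 and userspace advancing cur to 20,
* the sync consumes cur - nr_hwcur = 10 slots, so avail and nr_hwavail
* both drop by 10 and nr_hwcur becomes 20. If nr_hwavail was -1 (idle
* ring), avail is first translated to nkr_num_slots - 1 = 255 before
* the subtraction.
*/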
static int
re_netmap_txsync(void *a, u_int ring_nr, int do_lock)
{
struct rl_softc *sc = a;
struct rl_txdesc *txd = sc->rl_ldata.rl_tx_desc;
struct netmap_adapter *na = NA(sc->rl_ifp);
struct netmap_kring *kring = &na->tx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
k = ring->cur;
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
RL_LOCK(sc);
/* Sync the TX descriptor list */
bus_dmamap_sync(sc->rl_ldata.rl_tx_list_tag,
sc->rl_ldata.rl_tx_list_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
/* record completed transmissions */
for (n = 0, j = sc->rl_ldata.rl_tx_considx;
j != sc->rl_ldata.rl_tx_prodidx;
n++, j = RL_TX_DESC_NXT(sc, j)) {
uint32_t cmdstat =
le32toh(sc->rl_ldata.rl_tx_list[j].rl_cmdstat);
if (cmdstat & RL_TDESC_STAT_OWN)
break;
}
if (n > 0) {
sc->rl_ldata.rl_tx_considx = j;
sc->rl_ldata.rl_tx_free += n;
kring->nr_hwavail += n;
}
/* update avail to what the hardware knows */
ring->avail = kring->nr_hwavail;
/* we trust prodidx, not hwcur */
j = kring->nr_hwcur = sc->rl_ldata.rl_tx_prodidx;
if (j != k) { /* we have new packets to send */
n = 0;
while (j != k) {
struct netmap_slot *slot = &ring->slot[j];
struct rl_desc *desc = &sc->rl_ldata.rl_tx_list[j];
int cmd = slot->len | RL_TDESC_CMD_EOF |
RL_TDESC_CMD_OWN | RL_TDESC_CMD_SOF;
void *addr = NMB(slot);
int len = slot->len;
if (addr == netmap_buffer_base || len > NETMAP_BUF_SIZE) {
if (do_lock)
RL_UNLOCK(sc);
return netmap_ring_reinit(kring);
}
if (j == lim) /* mark end of ring */
cmd |= RL_TDESC_CMD_EOR;
if (slot->flags & NS_BUF_CHANGED) {
uint64_t paddr = vtophys(addr);
desc->rl_bufaddr_lo = htole32(RL_ADDR_LO(paddr));
desc->rl_bufaddr_hi = htole32(RL_ADDR_HI(paddr));
/* buffer has changed, unload and reload map */
netmap_reload_map(sc->rl_ldata.rl_tx_mtag,
txd[j].tx_dmamap, addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
slot->flags &= ~NS_REPORT;
desc->rl_cmdstat = htole32(cmd);
bus_dmamap_sync(sc->rl_ldata.rl_tx_mtag,
txd[j].tx_dmamap, BUS_DMASYNC_PREWRITE);
j = (j == lim) ? 0 : j + 1;
n++;
}
sc->rl_ldata.rl_tx_prodidx = kring->nr_hwcur = ring->cur;
/* decrease avail by number of sent packets */
ring->avail -= n;
kring->nr_hwavail = ring->avail;
bus_dmamap_sync(sc->rl_ldata.rl_tx_list_tag,
sc->rl_ldata.rl_tx_list_map,
BUS_DMASYNC_PREWRITE|BUS_DMASYNC_PREREAD);
/* start ? */
CSR_WRITE_1(sc, sc->rl_txstart, RL_TXSTART_START);
}
if (do_lock)
RL_UNLOCK(sc);
return 0;
}
/*
* Reconcile kernel and user view of the receive ring.
*
* Userspace has read rx slots up to cur (excluded).
* The last unread slot previously known to the kernel was nr_hwcur,
* and the last interrupt reported nr_hwavail slots available.
* We must subtract the newly consumed slots (cur - nr_hwcur)
* from nr_hwavail, clearing the descriptors for the next
* read, tell the hardware that they are available,
* and set nr_hwcur = cur and avail = nr_hwavail.
* issuing a dmamap_sync on all slots.
*/
static int
re_netmap_rxsync(void *a, u_int ring_nr, int do_lock)
{
struct rl_softc *sc = a;
struct rl_rxdesc *rxd = sc->rl_ldata.rl_rx_desc;
struct netmap_adapter *na = NA(sc->rl_ifp);
struct netmap_kring *kring = &na->rx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
k = ring->cur;
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
RL_LOCK(sc);
/* XXX check sync modes */
bus_dmamap_sync(sc->rl_ldata.rl_rx_list_tag,
sc->rl_ldata.rl_rx_list_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
/*
* The device uses all the buffers in the ring, so we need
* another termination condition in addition to RL_RDESC_STAT_OWN
* being cleared (all buffers could have it cleared). The easiest
* is to limit the amount of data reported up to 'lim'.
*/
j = sc->rl_ldata.rl_rx_prodidx;
for (n = kring->nr_hwavail; n < lim ; n++) {
struct rl_desc *cur_rx = &sc->rl_ldata.rl_rx_list[j];
uint32_t rxstat = le32toh(cur_rx->rl_cmdstat);
uint32_t total_len;
if ((rxstat & RL_RDESC_STAT_OWN) != 0)
break;
total_len = rxstat & sc->rl_rxlenmask;
/* XXX subtract crc */
total_len = (total_len < 4) ? 0 : total_len - 4;
kring->ring->slot[j].len = total_len;
/* sync was in re_newbuf() */
bus_dmamap_sync(sc->rl_ldata.rl_rx_mtag,
rxd[j].rx_dmamap, BUS_DMASYNC_POSTREAD);
j = RL_RX_DESC_NXT(sc, j);
}
if (n != kring->nr_hwavail) {
sc->rl_ldata.rl_rx_prodidx = j;
sc->rl_ifp->if_ipackets += n - kring->nr_hwavail;
kring->nr_hwavail = n;
}
/* skip past packets that userspace has already processed,
* making them available for reception.
* advance nr_hwcur and issue a bus_dmamap_sync on the
* buffers so it is safe to write to them.
* Also increase nr_hwavail
*/
j = kring->nr_hwcur;
if (j != k) { /* userspace has read some packets. */
n = 0;
while (j != k) {
struct netmap_slot *slot = ring->slot + j;
struct rl_desc *desc = &sc->rl_ldata.rl_rx_list[j];
int cmd = na->buff_size | RL_RDESC_CMD_OWN;
void *addr = NMB(slot);
if (addr == netmap_buffer_base) { /* bad buf */
if (do_lock)
RL_UNLOCK(sc);
return netmap_ring_reinit(kring);
}
if (j == lim) /* mark end of ring */
cmd |= RL_RDESC_CMD_EOR;
desc->rl_cmdstat = htole32(cmd);
slot->flags &= ~NS_REPORT;
if (slot->flags & NS_BUF_CHANGED) {
uint64_t paddr = vtophys(addr);
desc->rl_bufaddr_lo = htole32(RL_ADDR_LO(paddr));
desc->rl_bufaddr_hi = htole32(RL_ADDR_HI(paddr));
netmap_reload_map(sc->rl_ldata.rl_rx_mtag,
rxd[j].rx_dmamap, addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(sc->rl_ldata.rl_rx_mtag,
rxd[j].rx_dmamap, BUS_DMASYNC_PREREAD);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwavail -= n;
kring->nr_hwcur = k;
/* Flush the RX DMA ring */
bus_dmamap_sync(sc->rl_ldata.rl_rx_list_tag,
sc->rl_ldata.rl_rx_list_map,
BUS_DMASYNC_PREWRITE|BUS_DMASYNC_PREREAD);
}
/* tell userspace that there are new packets */
ring->avail = kring->nr_hwavail;
if (do_lock)
RL_UNLOCK(sc);
return 0;
}
static void
re_netmap_tx_init(struct rl_softc *sc)
{
struct rl_txdesc *txd;
struct rl_desc *desc;
int i;
struct netmap_adapter *na = NA(sc->rl_ifp);
struct netmap_slot *slot = netmap_reset(na, NR_TX, 0, 0);
/* slot is NULL if we are not in netmap mode */
if (!slot)
return;
/* in netmap mode, overwrite addresses and maps */
txd = sc->rl_ldata.rl_tx_desc;
desc = sc->rl_ldata.rl_tx_list;
for (i = 0; i < sc->rl_ldata.rl_tx_desc_cnt; i++) {
void *addr = NMB(slot+i);
uint64_t paddr = vtophys(addr);
desc[i].rl_bufaddr_lo = htole32(RL_ADDR_LO(paddr));
desc[i].rl_bufaddr_hi = htole32(RL_ADDR_HI(paddr));
netmap_load_map(sc->rl_ldata.rl_tx_mtag,
txd[i].tx_dmamap, addr, na->buff_size);
}
}
static void
re_netmap_rx_init(struct rl_softc *sc)
{
/* slot is NULL if we are not in netmap mode */
struct netmap_adapter *na = NA(sc->rl_ifp);
struct netmap_slot *slot = netmap_reset(na, NR_RX, 0, 0);
struct rl_desc *desc = sc->rl_ldata.rl_rx_list;
uint32_t cmdstat;
int i;
if (!slot)
return;
for (i = 0; i < sc->rl_ldata.rl_rx_desc_cnt; i++) {
void *addr = NMB(slot+i);
uint64_t paddr = vtophys(addr);
desc[i].rl_bufaddr_lo = htole32(RL_ADDR_LO(paddr));
desc[i].rl_bufaddr_hi = htole32(RL_ADDR_HI(paddr));
cmdstat = slot[i].len = na->buff_size; // XXX
if (i == sc->rl_ldata.rl_rx_desc_cnt - 1)
cmdstat |= RL_RDESC_CMD_EOR;
desc[i].rl_cmdstat = htole32(cmdstat | RL_RDESC_CMD_OWN);
netmap_reload_map(sc->rl_ldata.rl_rx_mtag,
sc->rl_ldata.rl_rx_desc[i].rx_dmamap,
addr, na->buff_size);
}
}
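The header above supplies only the netmap glue; the hook points live in the if_re driver patch shipped in sys/dev/netmap/head.diff, not in this listing. A hedged sketch of where the calls are expected to land (the re(4) call sites are assumptions, for illustration only):

/*
 * Hypothetical call sites in the if_re driver (illustration only):
 *
 *	re_attach()       ->  re_netmap_attach(sc);
 *	                      (after the ifnet is fully configured)
 *	re_init_locked()  ->  re_netmap_tx_init(sc);
 *	                      re_netmap_rx_init(sc);
 *	                      (so netmap can overwrite buffer
 *	                       addresses and dma maps on the rings)
 *	re_detach()       ->  netmap_detach(sc->rl_ifp);
 */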

sys/dev/netmap/ixgbe_netmap.h Normal file

@ -0,0 +1,376 @@
/*
* Copyright (C) 2011 Matteo Landi, Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: ixgbe_netmap.h 9662 2011-11-16 13:18:06Z luigi $
*
* netmap modifications for ixgbe
*/
#include <net/netmap.h>
#include <sys/selinfo.h>
// #include <vm/vm.h>
// #include <vm/pmap.h> /* vtophys ? */
#include <dev/netmap/netmap_kern.h>
static int ixgbe_netmap_reg(struct ifnet *, int onoff);
static int ixgbe_netmap_txsync(void *, u_int, int);
static int ixgbe_netmap_rxsync(void *, u_int, int);
static void ixgbe_netmap_lock_wrapper(void *, int, u_int);
SYSCTL_NODE(_dev, OID_AUTO, ixgbe, CTLFLAG_RW, 0, "ixgbe card");
static void
ixgbe_netmap_attach(struct adapter *adapter)
{
struct netmap_adapter na;
bzero(&na, sizeof(na));
na.ifp = adapter->ifp;
na.separate_locks = 1;
na.num_tx_desc = adapter->num_tx_desc;
na.num_rx_desc = adapter->num_rx_desc;
na.nm_txsync = ixgbe_netmap_txsync;
na.nm_rxsync = ixgbe_netmap_rxsync;
na.nm_lock = ixgbe_netmap_lock_wrapper;
na.nm_register = ixgbe_netmap_reg;
/*
* adapter->rx_mbuf_sz is set by SIOCSIFMTU, but in netmap mode
* we allocate the buffers on the first register. So we must
* disallow a SIOCSIFMTU when if_capenable & IFCAP_NETMAP is set.
*/
na.buff_size = MCLBYTES;
netmap_attach(&na, adapter->num_queues);
}
/*
* wrapper to export locks to the generic code
*/
static void
ixgbe_netmap_lock_wrapper(void *_a, int what, u_int queueid)
{
struct adapter *adapter = _a;
ASSERT(queueid < adapter->num_queues);
switch (what) {
case NETMAP_CORE_LOCK:
IXGBE_CORE_LOCK(adapter);
break;
case NETMAP_CORE_UNLOCK:
IXGBE_CORE_UNLOCK(adapter);
break;
case NETMAP_TX_LOCK:
IXGBE_TX_LOCK(&adapter->tx_rings[queueid]);
break;
case NETMAP_TX_UNLOCK:
IXGBE_TX_UNLOCK(&adapter->tx_rings[queueid]);
break;
case NETMAP_RX_LOCK:
IXGBE_RX_LOCK(&adapter->rx_rings[queueid]);
break;
case NETMAP_RX_UNLOCK:
IXGBE_RX_UNLOCK(&adapter->rx_rings[queueid]);
break;
}
}
/*
* Support for netmap register/unregister. We are already under core lock;
* only called on the first init or the last unregister.
*/
static int
ixgbe_netmap_reg(struct ifnet *ifp, int onoff)
{
struct adapter *adapter = ifp->if_softc;
struct netmap_adapter *na = NA(ifp);
int error = 0;
if (!na)
return EINVAL;
ixgbe_disable_intr(adapter);
/* Tell the stack that the interface is no longer active */
ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
if (onoff) {
ifp->if_capenable |= IFCAP_NETMAP;
/* save if_transmit to restore it later */
na->if_transmit = ifp->if_transmit;
ifp->if_transmit = netmap_start;
ixgbe_init_locked(adapter);
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) == 0) {
error = ENOMEM;
goto fail;
}
} else {
fail:
/* restore if_transmit */
ifp->if_transmit = na->if_transmit;
ifp->if_capenable &= ~IFCAP_NETMAP;
ixgbe_init_locked(adapter); /* also enables intr */
}
return (error);
}
/*
* Reconcile kernel and user view of the transmit ring.
*
* Userspace has filled tx slots up to cur (excluded).
* The last unused slot previously known to the kernel was nr_hwcur,
* and the last interrupt reported nr_hwavail slots available
* (using the special value -1 to indicate idle transmit ring).
* The function must first update avail to what the kernel
* knows, subtract the newly used slots (cur - nr_hwcur)
* from both avail and nr_hwavail, and set nr_hwcur = cur
* issuing a dmamap_sync on all slots.
*
* Check parameters in the struct netmap_ring.
* We don't use avail, only check for bogus values.
* Make sure cur is valid, and the same goes for buffer indexes and lengths.
* To avoid races, read the values once, and never use those from
* the ring afterwards.
*/
static int
ixgbe_netmap_txsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct tx_ring *txr = &adapter->tx_rings[ring_nr];
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->tx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n = 0, lim = kring->nkr_num_slots - 1;
/* generate an interrupt approximately every half ring */
int report_frequency = kring->nkr_num_slots >> 1;
k = ring->cur; /* ring is not protected by any lock */
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
IXGBE_TX_LOCK(txr);
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
/* update avail to what the hardware knows */
ring->avail = kring->nr_hwavail;
j = kring->nr_hwcur;
if (j != k) { /* we have new packets to send */
while (j != k) {
struct netmap_slot *slot = &ring->slot[j];
struct ixgbe_tx_buf *txbuf = &txr->tx_buffers[j];
union ixgbe_adv_tx_desc *curr = &txr->tx_base[j];
void *addr = NMB(slot);
int flags = ((slot->flags & NS_REPORT) ||
j == 0 || j == report_frequency) ?
IXGBE_TXD_CMD_RS : 0;
int len = slot->len;
if (addr == netmap_buffer_base || len > NETMAP_BUF_SIZE) {
if (do_lock)
IXGBE_TX_UNLOCK(txr);
return netmap_ring_reinit(kring);
}
slot->flags &= ~NS_REPORT;
curr->read.buffer_addr = htole64(vtophys(addr));
curr->read.olinfo_status = 0;
curr->read.cmd_type_len =
htole32(txr->txd_cmd | len |
(IXGBE_ADVTXD_DTYP_DATA |
IXGBE_ADVTXD_DCMD_IFCS |
IXGBE_TXD_CMD_EOP | flags) );
if (slot->flags & NS_BUF_CHANGED) {
/* buffer has changed, unload and reload map */
netmap_reload_map(txr->txtag, txbuf->map,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(txr->txtag, txbuf->map,
BUS_DMASYNC_PREWRITE);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwcur = k;
/* decrease avail by number of sent packets */
ring->avail -= n;
kring->nr_hwavail = ring->avail;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
IXGBE_WRITE_REG(&adapter->hw, IXGBE_TDT(txr->me), k);
}
if (n == 0 || kring->nr_hwavail < 1) {
/* record completed transmissions. TODO
*
* The datasheet discourages the use of TDH to find out the
* number of sent packets; the right way to do so is to check
* the DD bit inside the status of a packet descriptor. On the
* other hand, we avoid setting the `report status' bit for
* *all* outgoing packets (a kind of interrupt mitigation),
* so the DD bit is not guaranteed to be set for all
* the packets: that's why, for the moment, we continue to use
* TDH.
*/
j = IXGBE_READ_REG(&adapter->hw, IXGBE_TDH(ring_nr));
if (j >= kring->nkr_num_slots) { /* XXX can happen */
D("TDH wrap %d", j);
j -= kring->nkr_num_slots;
}
int delta = j - txr->next_to_clean;
if (delta) {
/* new transmissions were completed, increment
kring->nr_hwavail. */
if (delta < 0)
delta += kring->nkr_num_slots;
txr->next_to_clean = j;
kring->nr_hwavail += delta;
ring->avail = kring->nr_hwavail;
}
}
if (do_lock)
IXGBE_TX_UNLOCK(txr);
return 0;
}
/*
* Reconcile kernel and user view of the receive ring.
*
* Userspace has read rx slots up to cur (excluded).
* The last unread slot previously known to the kernel was nr_hwcur,
* and the last interrupt reported nr_hwavail slots available.
* We must subtract the newly consumed slots (cur - nr_hwcur)
* from nr_hwavail, clearing the descriptors for the next
* read, tell the hardware that they are available,
* and set nr_hwcur = cur and avail = nr_hwavail.
* issuing a dmamap_sync on all slots.
*/
static int
ixgbe_netmap_rxsync(void *a, u_int ring_nr, int do_lock)
{
struct adapter *adapter = a;
struct rx_ring *rxr = &adapter->rx_rings[ring_nr];
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->rx_rings[ring_nr];
struct netmap_ring *ring = kring->ring;
int j, k, n, lim = kring->nkr_num_slots - 1;
k = ring->cur; /* ring is not protected by any lock */
if ( (kring->nr_kflags & NR_REINIT) || k > lim)
return netmap_ring_reinit(kring);
if (do_lock)
IXGBE_RX_LOCK(rxr);
/* XXX check sync modes */
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
j = rxr->next_to_check;
for (n = 0; ; n++) {
union ixgbe_adv_rx_desc *curr = &rxr->rx_base[j];
uint32_t staterr = le32toh(curr->wb.upper.status_error);
if ((staterr & IXGBE_RXD_STAT_DD) == 0)
break;
ring->slot[j].len = le16toh(curr->wb.upper.length);
bus_dmamap_sync(rxr->ptag,
rxr->rx_buffers[j].pmap, BUS_DMASYNC_POSTREAD);
j = (j == lim) ? 0 : j + 1;
}
if (n) {
rxr->next_to_check = j;
kring->nr_hwavail += n;
if (kring->nr_hwavail >= lim - 10) {
ND("rx ring %d almost full %d", ring_nr, kring->nr_hwavail);
}
}
/* skip past packets that userspace has already processed,
* making them available for reception.
* advance nr_hwcur and issue a bus_dmamap_sync on the
* buffers so it is safe to write to them.
* Also increase nr_hwavail
*/
j = kring->nr_hwcur;
if (j != k) { /* userspace has read some packets. */
n = 0;
while (j != k) {
struct netmap_slot *slot = ring->slot + j;
union ixgbe_adv_rx_desc *curr = &rxr->rx_base[j];
struct ixgbe_rx_buf *rxbuf = rxr->rx_buffers + j;
void *addr = NMB(slot);
if (addr == netmap_buffer_base) { /* bad buf */
if (do_lock)
IXGBE_RX_UNLOCK(rxr);
return netmap_ring_reinit(kring);
}
curr->wb.upper.status_error = 0;
curr->read.pkt_addr = htole64(vtophys(addr));
if (slot->flags & NS_BUF_CHANGED) {
netmap_reload_map(rxr->ptag, rxbuf->pmap,
addr, na->buff_size);
slot->flags &= ~NS_BUF_CHANGED;
}
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_PREREAD);
j = (j == lim) ? 0 : j + 1;
n++;
}
kring->nr_hwavail -= n;
kring->nr_hwcur = ring->cur;
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/* IMPORTANT: we must leave one free slot in the ring,
* so move j back by one unit
*/
j = (j == 0) ? lim : j - 1;
IXGBE_WRITE_REG(&adapter->hw, IXGBE_RDT(rxr->me), j);
}
/* tell userspace that there are new packets */
ring->avail = kring->nr_hwavail;
if (do_lock)
IXGBE_RX_UNLOCK(rxr);
return 0;
}

sys/dev/netmap/netmap.c Normal file

File diff suppressed because it is too large.

sys/dev/netmap/netmap_kern.h Normal file

@ -0,0 +1,221 @@
/*
* Copyright (C) 2011 Matteo Landi, Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: netmap_kern.h 9662 2011-11-16 13:18:06Z luigi $
*
* The header contains the definitions of constants and function
* prototypes used only in kernelspace.
*/
#ifndef _NET_NETMAP_KERN_H_
#define _NET_NETMAP_KERN_H_
#ifdef MALLOC_DECLARE
MALLOC_DECLARE(M_NETMAP);
#endif
#define ND(format, ...)
#define D(format, ...) \
do { \
struct timeval __xxts; \
microtime(&__xxts); \
printf("%03d.%06d %s [%d] " format "\n",\
(int)__xxts.tv_sec % 1000, (int)__xxts.tv_usec, \
__FUNCTION__, __LINE__, ##__VA_ARGS__); \
} while (0)
struct netmap_adapter;
/*
* private, kernel view of a ring.
*
* XXX 20110627-todo
* The index in the NIC and netmap ring is offset by nkr_hwofs slots.
* This is so that, on a reset, buffers owned by userspace are not
* modified by the kernel. In particular:
* RX rings: the next empty buffer (hwcur + hwavail + hwofs) coincides
* with the next empty buffer as known by the hardware (next_to_check or so).
* TX rings: hwcur + hwofs coincides with next_to_send
*/
struct netmap_kring {
struct netmap_ring *ring;
u_int nr_hwcur;
int nr_hwavail;
u_int nr_kflags;
u_int nkr_num_slots;
u_int nkr_hwofs; /* offset between NIC and netmap ring */
struct netmap_adapter *na; // debugging
struct selinfo si; /* poll/select wait queue */
};
/*
* This struct is part of and extends the 'struct adapter' (or
* equivalent) device descriptor. It contains all fields needed to
* support netmap operation.
*/
struct netmap_adapter {
int refcount; /* number of user-space descriptors using this
interface, which is equal to the number of
struct netmap_if objs in the mapped region. */
int separate_locks; /* set if the interface supports different
locks for rx, tx and core. */
u_int num_queues; /* number of tx/rx queue pairs: this is
a duplicate field needed to simplify the
signature of ``netmap_detach``. */
u_int num_tx_desc; /* number of descriptors in each queue */
u_int num_rx_desc;
u_int buff_size;
u_int flags; /* NR_REINIT */
/* tx_rings and rx_rings are private but allocated
* as a contiguous chunk of memory. Each array has
* N+1 entries, for the adapter queues and for the host queue.
*/
struct netmap_kring *tx_rings; /* array of TX rings. */
struct netmap_kring *rx_rings; /* array of RX rings. */
/* copy of if_qflush and if_transmit pointers, to intercept
* packets from the network stack when netmap is active.
* XXX probably if_qflush is not necessary.
*/
void (*if_qflush)(struct ifnet *);
int (*if_transmit)(struct ifnet *, struct mbuf *);
/* references to the ifnet and device routines, used by
* the generic netmap functions.
*/
struct ifnet *ifp; /* adapter is ifp->if_softc */
int (*nm_register)(struct ifnet *, int onoff);
void (*nm_lock)(void *, int what, u_int ringid);
int (*nm_txsync)(void *, u_int ring, int lock);
int (*nm_rxsync)(void *, u_int ring, int lock);
};
/*
* The combination of "enable" (ifp->if_capenable & IFCAP_NETMAP)
* and refcount gives the status of the interface, namely:
*
* enable refcount Status
*
* FALSE 0 normal operation
* FALSE != 0 -- (impossible)
* TRUE 1 netmap mode
* TRUE 0 being deleted.
*/
#define NETMAP_DELETING(_na) ( ((_na)->refcount == 0) && \
( (_na)->ifp->if_capenable & IFCAP_NETMAP) )
/*
* parameters for (*nm_lock)(adapter, what, index)
*/
enum {
NETMAP_NO_LOCK = 0,
NETMAP_CORE_LOCK, NETMAP_CORE_UNLOCK,
NETMAP_TX_LOCK, NETMAP_TX_UNLOCK,
NETMAP_RX_LOCK, NETMAP_RX_UNLOCK,
};
/*
* The following are support routines used by individual drivers to
* support netmap operation.
*
* netmap_attach() initializes a struct netmap_adapter, allocating the
* struct netmap_ring's and the struct selinfo.
*
* netmap_detach() frees the memory allocated by netmap_attach().
*
* netmap_start() replaces the if_transmit routine of the interface,
* and is used to intercept packets coming from the stack.
*
* netmap_load_map/netmap_reload_map are helper routines to set/reset
* the dmamap for a packet buffer
*
* netmap_reset() is a helper routine to be called in the driver
* when reinitializing a ring.
*/
int netmap_attach(struct netmap_adapter *, int);
void netmap_detach(struct ifnet *);
int netmap_start(struct ifnet *, struct mbuf *);
enum txrx { NR_RX = 0, NR_TX = 1 };
struct netmap_slot *netmap_reset(struct netmap_adapter *na,
enum txrx tx, int n, u_int new_cur);
void netmap_load_map(bus_dma_tag_t tag, bus_dmamap_t map,
void *buf, bus_size_t buflen);
void netmap_reload_map(bus_dma_tag_t tag, bus_dmamap_t map,
void *buf, bus_size_t buflen);
int netmap_ring_reinit(struct netmap_kring *);
/*
* XXX eventually, get rid of netmap_total_buffers and netmap_buffer_base
* in favour of the structure
*/
// struct netmap_buf_pool;
// extern struct netmap_buf_pool nm_buf_pool;
extern u_int netmap_total_buffers;
extern char *netmap_buffer_base;
extern int netmap_verbose; // XXX debugging
enum { /* verbose flags */
NM_VERB_ON = 1, /* generic verbose */
NM_VERB_HOST = 0x2, /* verbose host stack */
NM_VERB_RXSYNC = 0x10, /* verbose on rxsync/txsync */
NM_VERB_TXSYNC = 0x20,
NM_VERB_RXINTR = 0x100, /* verbose on rx/tx intr (driver) */
NM_VERB_TXINTR = 0x200,
NM_VERB_NIC_RXSYNC = 0x1000, /* verbose on nic rxsync/txsync */
NM_VERB_NIC_TXSYNC = 0x2000,
};
/*
* return a pointer to the struct netmap adapter from the ifp
*/
#define NA(_ifp) ((struct netmap_adapter *)(_ifp)->if_pspare[0])
/*
* return the address of a buffer.
* XXX this is a special version with hardwired 2k bufs
* On error return netmap_buffer_base which is detected as a bad pointer.
*/
static inline char *
NMB(struct netmap_slot *slot)
{
uint32_t i = slot->buf_idx;
return (i >= netmap_total_buffers) ? netmap_buffer_base :
#if NETMAP_BUF_SIZE == 2048
netmap_buffer_base + (i << 11);
#else
netmap_buffer_base + (i * NETMAP_BUF_SIZE);
#endif
}
#endif /* _NET_NETMAP_KERN_H_ */
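The per-driver patches also need an interrupt-side hook so that poll/select sleepers are woken through the kring's selinfo. A hedged sketch of that pattern, built from the structures above (the concrete form for each driver lives in head.diff; the function and variable names here are illustrative):

/* hypothetical fragment of a driver RX interrupt handler */
static void
drv_netmap_rx_intr(struct ifnet *ifp, u_int ring_nr)
{
	struct netmap_adapter *na;

	if (ifp->if_capenable & IFCAP_NETMAP) {
		na = NA(ifp);
		/* wake userspace poll/select instead of queueing mbufs */
		selwakeuppri(&na->rx_rings[ring_nr].si, PI_NET);
	}
}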

sys/net/netmap.h Normal file

@ -0,0 +1,281 @@
/*
* Copyright (C) 2011 Matteo Landi, Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the
* distribution.
*
* 3. Neither the name of the authors nor the names of their contributors
* may be used to endorse or promote products derived from this
* software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY MATTEO LANDI AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL MATTEO LANDI OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: netmap.h 9662 2011-11-16 13:18:06Z luigi $
*
* This header contains the definitions of the constants and the
* structures needed by the ``netmap'' module, both kernel and
* userspace.
*/
#ifndef _NET_NETMAP_H_
#define _NET_NETMAP_H_
/*
* --- Netmap data structures ---
*
* The data structures used by netmap are shown below. Those in
* capital letters are in an mmap()ed area shared with userspace,
* while others are private to the kernel.
* Shared structures do not contain pointers but only relative
* offsets, so that addressing is portable between kernel and userspace.
*
* The 'softc' of each interface is extended with a struct netmap_adapter
* containing information to support netmap operation. In addition to
* the fixed fields, it has two pointers to reach the arrays of
* 'struct netmap_kring' which in turn reaches the various
* struct netmap_ring, shared with userspace.
 softc
+----------------+
| standard fields|
| if_pspare[0] ----------+
+----------------+       |
                         |
+----------------+<------+
|(netmap_adapter)|
|                |                            netmap_kring
| tx_rings *-------------------------------->+-------------+
|                |      netmap_kring         | ring    *--------> ...
| rx_rings *-------->+--------------+        | nr_hwcur    |
+----------------+   | ring    *------+      | nr_hwavail  |
                     | nr_hwcur     | |      | selinfo     |
                     | nr_hwavail   | |      +-------------+
                     | selinfo      | |      |     ...     |
                     +--------------+ |     (na_num_rings+1 entries)
                     |     ....     | |      |             |
                    (na_num_rings+1 entries) +-------------+
                     |              | |
                     +--------------+ |
                                      |       NETMAP_RING
                                      +----->+-------------+
                                            /| cur         |
   NETMAP_IF (nifp, one per file desc.)    / | avail       |
    +---------------+                     /  | buf_ofs     |
    | ni_num_queues |                    /   +=============+
    |               |                   /    | buf_idx     | slot[0]
    |               |                  /     | len, flags  |
    |               |                 /      +-------------+
    +===============+                /       | buf_idx     | slot[1]
    | txring_ofs[0] | (rel.to nifp)-'        | len, flags  |
    | txring_ofs[1] |                        +-------------+
      (num_rings+1 entries)                   (nr_num_slots entries)
    | txring_ofs[n] |                        | buf_idx     | slot[n-1]
    +---------------+                        | len, flags  |
    | rxring_ofs[0] |                        +-------------+
    | rxring_ofs[1] |
      (num_rings+1 entries)
    | rxring_ofs[n] |
    +---------------+
* The NETMAP_RING is the shadow ring that mirrors the NIC rings.
* Each slot has the index of a buffer, its length and some flags.
* In user space, the buffer address is computed as
* (char *)ring + buf_ofs + index*NETMAP_BUF_SIZE
* In the kernel, buffers do not necessarily need to be contiguous,
* and the virtual and physical addresses are derived through
* a lookup table. When userspace wants to use a different buffer
* in a location, it must set the NS_BUF_CHANGED flag to make
* sure that the kernel updates the hardware ring and
* other fields (bus_dmamap, etc.) as needed.
*
* Normally the driver is not requested to report the result of
* transmissions (this can dramatically speed up operation).
* However the user may request to report completion by setting
* NS_REPORT.
*/
struct netmap_slot {
uint32_t buf_idx; /* buffer index */
uint16_t len; /* packet length, to be copied to/from the hw ring */
uint16_t flags; /* buf changed, etc. */
#define NS_BUF_CHANGED 0x0001 /* must resync the map, buffer changed */
#define NS_REPORT 0x0002 /* ask the hardware to report results
* e.g. by generating an interrupt
*/
};
/*
* Netmap representation of a TX or RX ring (also known as "queue").
* This is a queue implemented as a fixed-size circular array.
* At the software level, two fields are important: avail and cur.
*
* In TX rings:
* avail indicates the number of slots available for transmission.
* It is decremented by the application when it appends a
* packet, and set to nr_hwavail (see below) on a
* NIOCTXSYNC to reflect the actual state of the queue
* (keeping track of completed transmissions).
* cur indicates the empty slot to use for the next packet
* to send (i.e. the "tail" of the queue).
* It is incremented by the application.
*
* The kernel side of netmap uses two additional fields in its own
* private ring structure, netmap_kring:
* nr_hwcur is a copy of nr_cur on an NIOCTXSYNC.
* nr_hwavail is the number of slots known as available by the
* hardware. It is updated on an INTR (inc by the
* number of packets sent) and on a NIOCTXSYNC
* (decrease by nr_cur - nr_hwcur)
* A special case, nr_hwavail is -1 if the transmit
* side is idle (no pending transmits).
*
* In RX rings:
* avail is the number of packets available (possibly 0).
* It is decremented by the software when it consumes
* a packet, and set to nr_hwavail on a NIOCRXSYNC
* cur indicates the first slot that contains a packet
* (the "head" of the queue).
* It is incremented by the software when it consumes
* a packet.
*
* The kernel side of netmap uses two additional fields in the kring:
* nr_hwcur is a copy of nr_cur on an NIOCRXSYNC
* nr_hwavail is the number of packets available. It is updated
* on INTR (inc by the number of new packets arrived)
* and on NIOCRXSYNC (decreased by nr_cur - nr_hwcur).
*
* DATA OWNERSHIP/LOCKING:
* The netmap_ring is owned by the user program; the kernel only
* accesses or modifies it in the upper half, during a system call.
*
* The netmap_kring is only modified by the upper half of the kernel.
*/
struct netmap_ring {
/*
* nr_buf_base_ofs is meant to be used through macros.
* It contains the offset of the buffer region from this
* descriptor.
*/
const ssize_t buf_ofs;
const uint32_t num_slots; /* number of slots in the ring. */
uint32_t avail; /* number of usable slots */
uint32_t cur; /* 'current' r/w position */
const uint16_t nr_buf_size;
uint16_t flags;
/*
* When a ring is reinitialized, the kernel sets kflags.
* On exit from a syscall, if the flag is found set, we
* also reinitialize the nr_* variables. The kflag is then
* unconditionally copied to nr_flags and cleared.
*/
#define NR_REINIT 0x0001 /* ring reinitialized! */
#define NR_TIMESTAMP 0x0002 /* set timestamp on *sync() */
struct timeval ts; /* time of last *sync() */
/* the slots follow. This struct has variable size */
struct netmap_slot slot[0]; /* array of slots. */
};
/*
* Netmap representation of an interface and its queue(s).
* There is one netmap_if for each file descriptor on which we want
* to select/poll. We assume that each interface has the same number
* of receive and transmit queues.
* select/poll operates on one or all pairs depending on the value of
* nmr_queueid passed on the ioctl.
*/
struct netmap_if {
char ni_name[IFNAMSIZ]; /* name of the interface. */
const u_int ni_version; /* API version, currently unused */
const u_int ni_num_queues; /* number of queue pairs (TX/RX). */
const u_int ni_rx_queues; /* if zero, use ni_num_queues */
/*
* the following array contains the offset of
* each netmap ring from this structure. The first num_queues+1
* refer to the tx rings, the next n+1 refer to the rx rings.
* The area is filled up by the kernel on NIOCREG,
* and then only read by userspace code.
* entries 0..ni_num_queues-1 indicate the hardware queues,
* entry ni_num_queues is the queue from/to the stack.
*/
const ssize_t ring_ofs[0];
};
#ifndef IFCAP_NETMAP /* this should go in net/if.h */
#define IFCAP_NETMAP 0x100000
#endif
#ifndef NIOCREGIF
/*
* ioctl names and related fields
*
* NIOCGINFO takes a struct ifreq, the interface name is the input,
* the outputs are number of queues and number of descriptor
* for each queue (useful to set number of threads etc.).
*
* NIOCREGIF takes an interface name within a struct ifreq,
* and activates netmap mode on the interface (if possible).
*
* NIOCUNREGIF unregisters the interface associated to the fd.
*
* NIOCTXSYNC, NIOCRXSYNC synchronize tx or rx queues,
* whose identity is set in NIOCREGIF through nr_ringid
*/
/*
* struct nmreq overlays a struct ifreq
*/
struct nmreq {
char nr_name[IFNAMSIZ];
uint32_t nr_version; /* API version (unused) */
uint32_t nr_offset; /* nifp offset in the shared region */
uint32_t nr_memsize; /* size of the shared region */
uint32_t nr_numslots; /* descriptors per queue */
uint16_t nr_numrings;
uint16_t nr_ringid; /* ring(s) we care about */
#define NETMAP_HW_RING 0x4000 /* low bits indicate one hw ring */
#define NETMAP_SW_RING 0x2000 /* we process the sw ring */
#define NETMAP_NO_TX_POLL 0x1000 /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK 0xfff /* the ring number */
};
/*
* default buf size is 2048, but it may make sense to have
* it shorter for better cache usage.
*/
#define NETMAP_BUF_SIZE (2048)
#define NIOCGINFO _IOWR('i', 145, struct nmreq) /* return IF info */
#define NIOCREGIF _IOWR('i', 146, struct nmreq) /* interface register */
#define NIOCUNREGIF _IO('i', 147) /* interface unregister */
#define NIOCTXSYNC _IO('i', 148) /* sync tx queues */
#define NIOCRXSYNC _IO('i', 149) /* sync rx queues */
#endif /* !NIOCREGIF */
#endif /* _NET_NETMAP_H_ */
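To make the ioctl sequence above concrete, here is a minimal userspace setup sketch (error handling is abbreviated and the helper name is ours; see tools/tools/netmap for complete programs):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <net/if.h>
#include <net/netmap.h>
#include <net/netmap_user.h>

/* open /dev/netmap, attach to 'ifname', map the shared region */
static int
netmap_setup(const char *ifname, struct netmap_if **nifpp)
{
	struct nmreq req;
	char *mem;
	int fd;

	fd = open("/dev/netmap", O_RDWR);
	if (fd < 0)
		return (-1);
	bzero(&req, sizeof(req));
	strncpy(req.nr_name, ifname, sizeof(req.nr_name));
	if (ioctl(fd, NIOCGINFO, &req) ||	/* memsize, ring counts */
	    ioctl(fd, NIOCREGIF, &req))		/* switch to netmap mode */
		return (-1);
	mem = mmap(0, req.nr_memsize, PROT_WRITE | PROT_READ,
	    MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		return (-1);
	*nifpp = NETMAP_IF(mem, req.nr_offset);
	return (fd);
}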

sys/net/netmap_user.h Normal file

@ -0,0 +1,98 @@
/*
* Copyright (C) 2011 Matteo Landi, Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the
* distribution.
*
* 3. Neither the name of the authors nor the names of their contributors
* may be used to endorse or promote products derived from this
* software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY MATTEO LANDI AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL MATTEO LANDI OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/
/*
* $FreeBSD$
* $Id: netmap_user.h 9495 2011-10-18 15:28:23Z luigi $
*
* This header contains the macros used to manipulate netmap structures
* and packets in userspace. See netmap(4) for more information.
*
* The address of the struct netmap_if, say nifp, is determined
* by the value returned from ioctl(.., NIOCREGIF, ...) and the mmap
* region:
* ioctl(fd, NIOCREGIF, &req);
* mem = mmap(0, ... );
* nifp = NETMAP_IF(mem, req.nr_offset);
* (so simple, we could just do it manually)
*
* From there:
* struct netmap_ring *NETMAP_TXRING(nifp, index)
* struct netmap_ring *NETMAP_RXRING(nifp, index)
* we can access ring->cur, ring->avail, ring->flags
*
* ring->slot[i] gives us the i-th slot (we can access
* directly len, flags, buf_idx)
*
* char *buf = NETMAP_BUF(ring, index) returns a pointer to
* the i-th buffer
*
* Since rings are circular, we have macros to compute the next index
* i = NETMAP_RING_NEXT(ring, i);
*/
#ifndef _NET_NETMAP_USER_H_
#define _NET_NETMAP_USER_H_
#define NETMAP_IF(b, o) (struct netmap_if *)((char *)(b) + (o))
#define NETMAP_TXRING(nifp, index) \
((struct netmap_ring *)((char *)(nifp) + \
(nifp)->ring_ofs[index] ) )
#define NETMAP_RXRING(nifp, index) \
((struct netmap_ring *)((char *)(nifp) + \
(nifp)->ring_ofs[index + (nifp)->ni_num_queues+1] ) )
#if NETMAP_BUF_SIZE != 2048
#error cannot handle odd size
#define NETMAP_BUF(ring, index) \
((char *)(ring) + (ring)->buf_ofs + ((index)*NETMAP_BUF_SIZE))
#else
#define NETMAP_BUF(ring, index) \
((char *)(ring) + (ring)->buf_ofs + ((index)<<11))
#endif
#define NETMAP_RING_NEXT(r, i) \
((i)+1 == (r)->num_slots ? 0 : (i) + 1 )
/*
* Return 1 if the given tx ring is empty.
*
* @r netmap_ring descriptor pointer.
* Special case, a negative value in hwavail indicates that the
* transmit queue is idle.
* XXX revise
*/
#define NETMAP_TX_RING_EMPTY(r) ((r)->avail >= (r)->num_slots - 1)
#endif /* _NET_NETMAP_USER_H_ */
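A short sketch of the receive path built from these macros; fd and nifp come from the registration sequence shown for netmap.h, and consume() is a hypothetical application hook:

/* drain whatever is pending on hardware rx ring 0 */
struct netmap_ring *ring = NETMAP_RXRING(nifp, 0);
u_int i = ring->cur;
u_int todo = ring->avail;

while (todo-- > 0) {
	struct netmap_slot *slot = &ring->slot[i];
	char *buf = NETMAP_BUF(ring, slot->buf_idx);

	consume(buf, slot->len);	/* hypothetical consumer */
	i = NETMAP_RING_NEXT(ring, i);
}
ring->avail = 0;		/* everything consumed ... */
ring->cur = i;			/* ... up to here */
ioctl(fd, NIOCRXSYNC, NULL);	/* hand the slots back to the kernel */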

tools/tools/README

@ -50,6 +50,7 @@ mfc Merge a directory from HEAD to a branch where it does not
mid Create a Message-ID database for mailing lists.
mwl Tools specific to the Marvell 88W8363 support
ncpus Count the number of processors
netmap Test applications for netmap(4)
notescheck Check for missing devices and options in NOTES files.
npe Tools specific to the Intel IXP4XXX NPE device
nxge A diagnostic tool for the nxge(4) driver

tools/tools/netmap/Makefile Normal file

@ -0,0 +1,25 @@
#
# $FreeBSD$
#
# For multiple programs using a single source file each,
# we can just define 'progs' and create custom targets.
PROGS = pkt-gen bridge testpcap libnetmap.so
CLEANFILES = $(PROGS) pcap.o
NO_MAN=
CFLAGS += -Werror -Wall -nostdinc -I/usr/include -I../../../sys
CFLAGS += -Wextra
LDFLAGS += -lpthread -lpcap
.include <bsd.prog.mk>
.include <bsd.lib.mk>
all: $(PROGS)
testpcap: pcap.c libnetmap.so
$(CC) $(CFLAGS) -L. -lnetmap -o ${.TARGET} pcap.c
libnetmap.so: pcap.c
$(CC) $(CFLAGS) -fpic -c ${.ALLSRC}
$(CC) -shared -o ${.TARGET} ${.ALLSRC:.c=.o}

tools/tools/netmap/README Normal file

@ -0,0 +1,11 @@
$FreeBSD$
This directory contains examples that use netmap
pkt-gen a packet sink/source using the netmap API
bridge a two-port jumper wire, also using the native API
testpcap a jumper wire using libnetmap (or libpcap)
click* various click examples

tools/tools/netmap/bridge.c Normal file

@ -0,0 +1,456 @@
/*
* (C) 2011 Luigi Rizzo, Matteo Landi
*
* BSD license
*
* A netmap client to bridge two network interfaces
* (or one interface and the host stack).
*
* $FreeBSD$
*/
#include <errno.h>
#include <signal.h> /* signal */
#include <stdlib.h>
#include <stdio.h>
#include <string.h> /* strcmp */
#include <fcntl.h> /* open */
#include <unistd.h> /* close */
#include <sys/endian.h> /* le64toh */
#include <sys/mman.h> /* PROT_* */
#include <sys/ioctl.h> /* ioctl */
#include <machine/param.h>
#include <sys/poll.h>
#include <sys/socket.h> /* sockaddr.. */
#include <arpa/inet.h> /* ntohs */
#include <net/if.h> /* ifreq */
#include <net/ethernet.h>
#include <net/netmap.h>
#include <net/netmap_user.h>
#include <netinet/in.h> /* sockaddr_in */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
int verbose = 0;
/* debug support */
#define ND(format, ...) {}
#define D(format, ...) do { \
if (!verbose) break; \
struct timeval _xxts; \
gettimeofday(&_xxts, NULL); \
fprintf(stderr, "%03d.%06d %s [%d] " format "\n", \
(int)_xxts.tv_sec %1000, (int)_xxts.tv_usec, \
__FUNCTION__, __LINE__, ##__VA_ARGS__); \
} while (0)
char *version = "$Id: bridge.c 9642 2011-11-07 21:39:47Z luigi $";
static int do_abort = 0;
/*
* info on a ring we handle
*/
struct my_ring {
const char *ifname;
int fd;
char *mem; /* userspace mmap address */
u_int memsize;
u_int queueid;
u_int begin, end; /* first..last+1 rings to check */
struct netmap_if *nifp;
struct netmap_ring *tx, *rx; /* shortcuts */
uint32_t if_flags;
uint32_t if_reqcap;
uint32_t if_curcap;
};
static void
sigint_h(__unused int sig)
{
do_abort = 1;
signal(SIGINT, SIG_DFL);
}
static int
do_ioctl(struct my_ring *me, int what)
{
struct ifreq ifr;
int error;
bzero(&ifr, sizeof(ifr));
strncpy(ifr.ifr_name, me->ifname, sizeof(ifr.ifr_name));
switch (what) {
case SIOCSIFFLAGS:
ifr.ifr_flagshigh = me->if_flags >> 16;
ifr.ifr_flags = me->if_flags & 0xffff;
break;
case SIOCSIFCAP:
ifr.ifr_reqcap = me->if_reqcap;
ifr.ifr_curcap = me->if_curcap;
break;
}
error = ioctl(me->fd, what, &ifr);
if (error) {
D("ioctl error %d", what);
return error;
}
switch (what) {
case SIOCGIFFLAGS:
me->if_flags = (ifr.ifr_flagshigh << 16) |
(0xffff & ifr.ifr_flags);
if (verbose)
D("flags are 0x%x", me->if_flags);
break;
case SIOCGIFCAP:
me->if_reqcap = ifr.ifr_reqcap;
me->if_curcap = ifr.ifr_curcap;
if (verbose)
D("curcap are 0x%x", me->if_curcap);
break;
}
return 0;
}
/*
* Open a device. If me->mem is NULL, do the mmap.
*/
static int
netmap_open(struct my_ring *me, int ringid)
{
int fd, err, l;
struct nmreq req;
me->fd = fd = open("/dev/netmap", O_RDWR);
if (fd < 0) {
D("Unable to open /dev/netmap");
return (-1);
}
bzero(&req, sizeof(req));
strncpy(req.nr_name, me->ifname, sizeof(req.nr_name));
req.nr_ringid = ringid;
err = ioctl(fd, NIOCGINFO, &req);
if (err) {
D("cannot get info on %s", me->ifname);
goto error;
}
me->memsize = l = req.nr_memsize;
if (verbose)
D("memsize is %d MB", l>>20);
err = ioctl(fd, NIOCREGIF, &req);
if (err) {
D("Unable to register %s", me->ifname);
goto error;
}
if (me->mem == NULL) {
me->mem = mmap(0, l, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
if (me->mem == MAP_FAILED) {
D("Unable to mmap");
me->mem = NULL;
goto error;
}
}
me->nifp = NETMAP_IF(me->mem, req.nr_offset);
me->queueid = ringid;
if (ringid & NETMAP_SW_RING) {
me->begin = req.nr_numrings;
me->end = me->begin + 1;
} else if (ringid & NETMAP_HW_RING) {
me->begin = ringid & NETMAP_RING_MASK;
me->end = me->begin + 1;
} else {
me->begin = 0;
me->end = req.nr_numrings;
}
me->tx = NETMAP_TXRING(me->nifp, me->begin);
me->rx = NETMAP_RXRING(me->nifp, me->begin);
return (0);
error:
close(me->fd);
return -1;
}
static int
netmap_close(struct my_ring *me)
{
D("");
if (me->mem)
munmap(me->mem, me->memsize);
ioctl(me->fd, NIOCUNREGIF, NULL);
close(me->fd);
return (0);
}
/*
* move up to 'limit' pkts from rxring to txring swapping buffers.
*/
static int
process_rings(struct netmap_ring *rxring, struct netmap_ring *txring,
u_int limit, const char *msg)
{
u_int j, k, m = 0;
/* print a warning if any of the ring flags is set (e.g. NR_REINIT) */
if (rxring->flags || txring->flags)
D("%s rxflags %x txflags %x",
msg, rxring->flags, txring->flags);
j = rxring->cur; /* RX */
k = txring->cur; /* TX */
if (rxring->avail < limit)
limit = rxring->avail;
if (txring->avail < limit)
limit = txring->avail;
m = limit;
while (limit-- > 0) {
struct netmap_slot *rs = &rxring->slot[j];
struct netmap_slot *ts = &txring->slot[k];
uint32_t pkt;
/* swap packets */
if (ts->buf_idx < 2 || rs->buf_idx < 2) {
D("wrong index rx[%d] = %d -> tx[%d] = %d",
j, rs->buf_idx, k, ts->buf_idx);
sleep(2);
}
pkt = ts->buf_idx;
ts->buf_idx = rs->buf_idx;
rs->buf_idx = pkt;
/* copy the packet length. */
if (rs->len < 14 || rs->len > 2048)
D("wrong len %d rx[%d] -> tx[%d]", rs->len, j, k);
else if (verbose > 1)
D("send len %d rx[%d] -> tx[%d]", rs->len, j, k);
ts->len = rs->len;
/* report the buffer change. */
ts->flags |= NS_BUF_CHANGED;
rs->flags |= NS_BUF_CHANGED;
j = NETMAP_RING_NEXT(rxring, j);
k = NETMAP_RING_NEXT(txring, k);
}
rxring->avail -= m;
txring->avail -= m;
rxring->cur = j;
txring->cur = k;
if (verbose && m > 0)
D("sent %d packets to %p", m, txring);
return (m);
}
/* move packets from source to destination */
static int
move(struct my_ring *src, struct my_ring *dst, u_int limit)
{
struct netmap_ring *txring, *rxring;
u_int m = 0, si = src->begin, di = dst->begin;
const char *msg = (src->queueid & NETMAP_SW_RING) ?
"host->net" : "net->host";
while (si < src->end && di < dst->end) {
rxring = NETMAP_RXRING(src->nifp, si);
txring = NETMAP_TXRING(dst->nifp, di);
ND("txring %p rxring %p", txring, rxring);
if (rxring->avail == 0) {
si++;
continue;
}
if (txring->avail == 0) {
di++;
continue;
}
m += process_rings(rxring, txring, limit, msg);
}
return (m);
}
/*
* How many packets are on this set of queues?
*/
static int
howmany(struct my_ring *me, int tx)
{
u_int i, tot = 0;
ND("me %p begin %d end %d", me, me->begin, me->end);
for (i = me->begin; i < me->end; i++) {
struct netmap_ring *ring = tx ?
NETMAP_TXRING(me->nifp, i) : NETMAP_RXRING(me->nifp, i);
tot += ring->avail;
}
if (0 && verbose && tot && !tx)
D("ring %s %s %s has %d avail at %d",
me->ifname, tx ? "tx": "rx",
me->end > me->nifp->ni_num_queues ?
"host":"net",
tot, NETMAP_TXRING(me->nifp, me->begin)->cur);
return tot;
}
/*
* bridge [-v] if1 [if2]
*
* If only one name, or the two interfaces are the same,
* bridges userland and the adapter. Otherwise bridge
* two intefaces.
*/
int
main(int argc, char **argv)
{
struct pollfd pollfd[2];
int i;
u_int burst = 1024;
struct my_ring me[2];
fprintf(stderr, "%s %s built %s %s\n",
argv[0], version, __DATE__, __TIME__);
bzero(me, sizeof(me));
while (argc > 1 && !strcmp(argv[1], "-v")) {
verbose++;
argv++;
argc--;
}
if (argc < 2 || argc > 4) {
D("Usage: %s IFNAME1 [IFNAME2 [BURST]]", argv[0]);
return (1);
}
/* setup netmap interface #1. */
me[0].ifname = argv[1];
if (argc == 2 || !strcmp(argv[1], argv[2])) {
D("same interface, endpoint 0 goes to host");
i = NETMAP_SW_RING;
me[1].ifname = argv[1];
} else {
/* two different interfaces. Take all rings on if1 */
i = 0; // all hw rings
me[1].ifname = argv[2];
}
if (netmap_open(me, i))
return (1);
me[1].mem = me[0].mem; /* copy the pointer, so only one mmap */
if (netmap_open(me+1, 0))
return (1);
/* if bridging two interfaces, set promisc mode */
if (i != NETMAP_SW_RING) {
do_ioctl(me, SIOCGIFFLAGS);
if ((me[0].if_flags & IFF_UP) == 0) {
D("%s is down, bringing up...", me[0].ifname);
me[0].if_flags |= IFF_UP;
}
me[0].if_flags |= IFF_PPROMISC;
do_ioctl(me, SIOCSIFFLAGS);
do_ioctl(me+1, SIOCGIFFLAGS);
me[1].if_flags |= IFF_PPROMISC;
do_ioctl(me+1, SIOCSIFFLAGS);
/* also disable checksums etc. */
do_ioctl(me, SIOCGIFCAP);
me[0].if_reqcap = me[0].if_curcap;
me[0].if_reqcap &= ~(IFCAP_HWCSUM | IFCAP_TSO | IFCAP_TOE);
do_ioctl(me+0, SIOCSIFCAP);
}
do_ioctl(me+1, SIOCGIFFLAGS);
if ((me[1].if_flags & IFF_UP) == 0) {
D("%s is down, bringing up...", me[1].ifname);
me[1].if_flags |= IFF_UP;
}
do_ioctl(me+1, SIOCSIFFLAGS);
do_ioctl(me+1, SIOCGIFCAP);
me[1].if_reqcap = me[1].if_curcap;
me[1].if_reqcap &= ~(IFCAP_HWCSUM | IFCAP_TSO | IFCAP_TOE);
do_ioctl(me+1, SIOCSIFCAP);
if (argc > 3)
burst = atoi(argv[3]); /* packets burst size. */
/* setup poll(2) variables. */
memset(pollfd, 0, sizeof(pollfd));
for (i = 0; i < 2; i++) {
pollfd[i].fd = me[i].fd;
pollfd[i].events = (POLLIN);
}
D("Wait 2 secs for link to come up...");
sleep(2);
D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.",
me[0].ifname, me[0].queueid, me[0].nifp->ni_num_queues,
me[1].ifname, me[1].queueid, me[1].nifp->ni_num_queues);
/* main loop */
signal(SIGINT, sigint_h);
while (!do_abort) {
int n0, n1, ret;
pollfd[0].events = pollfd[1].events = 0;
pollfd[0].revents = pollfd[1].revents = 0;
n0 = howmany(me, 0);
n1 = howmany(me + 1, 0);
if (n0)
pollfd[1].events |= POLLOUT;
else
pollfd[0].events |= POLLIN;
if (n1)
pollfd[0].events |= POLLOUT;
else
pollfd[1].events |= POLLIN;
ret = poll(pollfd, 2, 2500);
if (ret <= 0 || verbose)
D("poll %s [0] ev %x %x rx %d@%d tx %d,"
" [1] ev %x %x rx %d@%d tx %d",
ret <= 0 ? "timeout" : "ok",
pollfd[0].events,
pollfd[0].revents,
howmany(me, 0),
me[0].rx->cur,
howmany(me, 1),
pollfd[1].events,
pollfd[1].revents,
howmany(me+1, 0),
me[1].rx->cur,
howmany(me+1, 1)
);
if (ret < 0)
continue;
if (pollfd[0].revents & POLLERR) {
D("error on fd0, rxcur %d@%d",
me[0].rx->avail, me[0].rx->cur);
}
if (pollfd[1].revents & POLLERR) {
D("error on fd1, rxcur %d@%d",
me[1].rx->avail, me[1].rx->cur);
}
if (pollfd[0].revents & POLLOUT) {
move(me + 1, me, burst);
// XXX we don't need the ioctl
// ioctl(me[0].fd, NIOCTXSYNC, NULL);
}
if (pollfd[1].revents & POLLOUT) {
move(me, me + 1, burst);
// XXX we don't need the ioctl
// ioctl(me[1].fd, NIOCTXSYNC, NULL);
}
}
D("exiting");
netmap_close(me + 1);
netmap_close(me + 0);
return (0);
}
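For example (interface names are placeholders): "bridge -v em0 em1 512" bridges two NICs with bursts of up to 512 packets per poll iteration, while "bridge em0" jumpers em0 and the host stack through its NETMAP_SW_RING.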


@ -0,0 +1,19 @@
//
// $FreeBSD$
//
// A sample test configuration for click
//
//
// create a switch
myswitch :: EtherSwitch;
// two input devices
c0 :: FromDevice(ix0, PROMISC true);
c1 :: FromDevice(ix1, PROMISC true);
// and now pass packets around
c0[0] -> [0]myswitch[0] -> Queue(10000) -> ToDevice(ix0);
c1[0] -> [1]myswitch[1] -> Queue(10000) -> ToDevice(ix1);

tools/tools/netmap/pcap.c Normal file

@ -0,0 +1,761 @@
/*
* (C) 2011 Luigi Rizzo
*
* BSD license
*
* A simple library that maps some pcap functions onto netmap
* This is not 100% complete but enough to let tcpdump, trafshow
* and other apps work.
*
* $FreeBSD$
*/
#include <errno.h>
#include <signal.h> /* signal */
#include <stdlib.h>
#include <stdio.h>
#include <string.h> /* strcmp */
#include <fcntl.h> /* open */
#include <unistd.h> /* close */
#include <sys/endian.h> /* le64toh */
#include <sys/mman.h> /* PROT_* */
#include <sys/ioctl.h> /* ioctl */
#include <machine/param.h>
#include <sys/poll.h>
#include <sys/socket.h> /* sockaddr.. */
#include <arpa/inet.h> /* ntohs */
#include <net/if.h> /* ifreq */
#include <net/ethernet.h>
#include <net/netmap.h>
#include <net/netmap_user.h>
#include <netinet/in.h> /* sockaddr_in */
#include <sys/socket.h>
#include <ifaddrs.h>
#define MIN(a, b) ((a) < (b) ? (a) : (b))
char *version = "$Id$";
int verbose = 0;
/* debug support */
#define ND(format, ...) do {} while (0)
#define D(format, ...) do { \
if (verbose) \
fprintf(stderr, "--- %s [%d] " format "\n", \
__FUNCTION__, __LINE__, ##__VA_ARGS__); \
} while (0)
/*
* We redefine here a number of structures that are in pcap.h
* so we can compile this file without the system header.
*/
#ifndef PCAP_ERRBUF_SIZE
#define PCAP_ERRBUF_SIZE 128
/*
* Each packet is accompanied by a header including the timestamp,
* captured size and actual size.
*/
struct pcap_pkthdr {
struct timeval ts; /* time stamp */
uint32_t caplen; /* length of portion present */
uint32_t len; /* length this packet (off wire) */
};
typedef struct pcap_if pcap_if_t;
/*
* Representation of an interface address.
*/
struct pcap_addr {
struct pcap_addr *next;
struct sockaddr *addr; /* address */
struct sockaddr *netmask; /* netmask for the above */
struct sockaddr *broadaddr; /* broadcast addr for the above */
struct sockaddr *dstaddr; /* P2P dest. address for the above */
};
struct pcap_if {
struct pcap_if *next;
char *name; /* name to hand to "pcap_open_live()" */
char *description; /* textual description of interface, or NULL */
struct pcap_addr *addresses;
uint32_t flags; /* PCAP_IF_ interface flags */
};
/*
* We do not support stats (yet)
*/
struct pcap_stat {
u_int ps_recv; /* number of packets received */
u_int ps_drop; /* number of packets dropped */
u_int ps_ifdrop; /* drops by interface XXX not yet supported */
#ifdef WIN32
u_int bs_capt; /* number of packets that reach the app. */
#endif /* WIN32 */
};
typedef void pcap_t;
typedef enum {
PCAP_D_INOUT = 0,
PCAP_D_IN,
PCAP_D_OUT
} pcap_direction_t;
typedef void (*pcap_handler)(u_char *user,
const struct pcap_pkthdr *h, const u_char *bytes);
char errbuf[PCAP_ERRBUF_SIZE];
pcap_t *pcap_open_live(const char *device, int snaplen,
int promisc, int to_ms, char *errbuf);
int pcap_findalldevs(pcap_if_t **alldevsp, char *errbuf);
void pcap_close(pcap_t *p);
int pcap_get_selectable_fd(pcap_t *p);
int pcap_dispatch(pcap_t *p, int cnt, pcap_handler callback, u_char *user);
int pcap_setnonblock(pcap_t *p, int nonblock, char *errbuf);
int pcap_setdirection(pcap_t *p, pcap_direction_t d);
char *pcap_lookupdev(char *errbuf);
int pcap_inject(pcap_t *p, const void *buf, size_t size);
int pcap_fileno(pcap_t *p);
struct eproto {
const char *s;
u_short p;
};
#endif /* !PCAP_ERRBUF_SIZE */
#ifdef __PIC__
/*
* build as a shared library
*/
char pcap_version[] = "libnetmap version 0.3";
/*
* Our equivalent of pcap_t
*/
struct my_ring {
struct nmreq nmr;
int fd;
char *mem; /* userspace mmap address */
u_int memsize;
u_int queueid;
u_int begin, end; /* first..last+1 rings to check */
struct netmap_if *nifp;
int snaplen;
char *errbuf;
int promisc;
int to_ms;
struct pcap_pkthdr hdr;
uint32_t if_flags;
uint32_t if_reqcap;
uint32_t if_curcap;
struct pcap_stat st;
char msg[PCAP_ERRBUF_SIZE];
};
static int
do_ioctl(struct my_ring *me, int what)
{
struct ifreq ifr;
int error;
bzero(&ifr, sizeof(ifr));
strncpy(ifr.ifr_name, me->nmr.nr_name, sizeof(ifr.ifr_name));
switch (what) {
case SIOCSIFFLAGS:
D("call SIOCSIFFLAGS 0x%x", me->if_flags);
ifr.ifr_flagshigh = (me->if_flags >> 16) & 0xffff;
ifr.ifr_flags = me->if_flags & 0xffff;
break;
case SIOCSIFCAP:
ifr.ifr_reqcap = me->if_reqcap;
ifr.ifr_curcap = me->if_curcap;
break;
}
error = ioctl(me->fd, what, &ifr);
if (error) {
D("ioctl 0x%x error %d", what, error);
return error;
}
switch (what) {
case SIOCSIFFLAGS:
case SIOCGIFFLAGS:
me->if_flags = (ifr.ifr_flagshigh << 16) |
(0xffff & ifr.ifr_flags);
D("flags are L 0x%x H 0x%x 0x%x",
(uint16_t)ifr.ifr_flags,
(uint16_t)ifr.ifr_flagshigh, me->if_flags);
break;
case SIOCGIFCAP:
me->if_reqcap = ifr.ifr_reqcap;
me->if_curcap = ifr.ifr_curcap;
D("curcap are 0x%x", me->if_curcap);
break;
}
return 0;
}
/*
 * Open a device: register the interface and, if me->mem is NULL,
 * mmap the shared netmap memory region.
*/
static int
netmap_open(struct my_ring *me, int ringid)
{
int fd, err, l;
u_int i;
struct nmreq req;
me->fd = fd = open("/dev/netmap", O_RDWR);
if (fd < 0) {
D("Unable to open /dev/netmap");
return (-1);
}
bzero(&req, sizeof(req));
strncpy(req.nr_name, me->nmr.nr_name, sizeof(req.nr_name));
req.nr_ringid = ringid;
err = ioctl(fd, NIOCGINFO, &req);
if (err) {
D("cannot get info on %s", me->nmr.nr_name);
goto error;
}
me->memsize = l = req.nr_memsize;
ND("memsize is %d MB", l>>20);
err = ioctl(fd, NIOCREGIF, &req);
if (err) {
D("Unable to register %s", me->nmr.nr_name);
goto error;
}
if (me->mem == NULL) {
me->mem = mmap(0, l, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
if (me->mem == MAP_FAILED) {
D("Unable to mmap");
me->mem = NULL;
goto error;
}
}
me->nifp = NETMAP_IF(me->mem, req.nr_offset);
me->queueid = ringid;
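	/*
	 * Choose which rings to scan: the host (software) ring sits
	 * right after the nr_numrings hardware rings; NETMAP_HW_RING
	 * binds a single hardware ring; by default scan all hardware
	 * rings.
	 */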
if (ringid & NETMAP_SW_RING) {
me->begin = req.nr_numrings;
me->end = me->begin + 1;
} else if (ringid & NETMAP_HW_RING) {
me->begin = ringid & NETMAP_RING_MASK;
me->end = me->begin + 1;
} else {
me->begin = 0;
me->end = req.nr_numrings;
}
/* request timestamps for packets */
for (i = me->begin; i < me->end; i++) {
struct netmap_ring *ring = NETMAP_RXRING(me->nifp, i);
ring->flags = NR_TIMESTAMP;
}
//me->tx = NETMAP_TXRING(me->nifp, 0);
return (0);
error:
close(me->fd);
return -1;
}
/*
 * tcpdump expects the following functions to exist even though
 * it probably never calls them.
*/
struct eproto eproto_db[] = {
{ "ip", ETHERTYPE_IP },
{ "arp", ETHERTYPE_ARP },
{ (char *)0, 0 }
};
int
pcap_findalldevs(pcap_if_t **alldevsp, __unused char *errbuf)
{
struct ifaddrs *i_head, *i;
pcap_if_t *top = NULL, *cur;
struct pcap_addr *tail = NULL;
int l;
D("listing all devs");
*alldevsp = NULL;
i_head = NULL;
if (getifaddrs(&i_head)) {
D("cannot get if addresses");
return -1;
}
for (i = i_head; i; i = i->ifa_next) {
//struct ifaddrs *ifa;
struct pcap_addr *pca;
//struct sockaddr *sa;
D("got interface %s", i->ifa_name);
if (!top || strcmp(top->name, i->ifa_name)) {
/* new interface */
l = sizeof(*top) + strlen(i->ifa_name) + 1;
cur = calloc(1, l);
if (cur == NULL) {
D("no space for if descriptor");
continue;
}
cur->name = (char *)(cur + 1);
//cur->flags = i->ifa_flags;
strcpy(cur->name, i->ifa_name);
cur->description = NULL;
cur->next = top;
top = cur;
tail = NULL;
}
/* now deal with addresses */
D("%s addr family %d len %d %s %s",
top->name,
i->ifa_addr->sa_family, i->ifa_addr->sa_len,
i->ifa_netmask ? "Netmask" : "",
i->ifa_broadaddr ? "Broadcast" : "");
l = sizeof(struct pcap_addr) +
(i->ifa_addr ? i->ifa_addr->sa_len:0) +
(i->ifa_netmask ? i->ifa_netmask->sa_len:0) +
(i->ifa_broadaddr? i->ifa_broadaddr->sa_len:0);
pca = calloc(1, l);
if (pca == NULL) {
D("no space for if addr");
continue;
}
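		/*
		 * The sockaddrs are packed into the same allocation,
		 * right after the pcap_addr itself; SA_NEXT() steps
		 * past a variable-length sockaddr using its sa_len.
		 */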
#define SA_NEXT(x) ((struct sockaddr *)((char *)(x) + (x)->sa_len))
pca->addr = (struct sockaddr *)(pca + 1);
bcopy(i->ifa_addr, pca->addr, i->ifa_addr->sa_len);
if (i->ifa_netmask) {
pca->netmask = SA_NEXT(pca->addr);
bcopy(i->ifa_netmask, pca->netmask, i->ifa_netmask->sa_len);
if (i->ifa_broadaddr) {
pca->broadaddr = SA_NEXT(pca->netmask);
bcopy(i->ifa_broadaddr, pca->broadaddr, i->ifa_broadaddr->sa_len);
}
}
if (tail == NULL) {
top->addresses = pca;
} else {
tail->next = pca;
}
tail = pca;
}
freeifaddrs(i_head);
*alldevsp = top;
return 0;
}
void pcap_freealldevs(__unused pcap_if_t *alldevs)
{
D("unimplemented");
}
char *
pcap_lookupdev(char *buf)
{
D("%s", buf);
strcpy(buf, "/dev/netmap");
return buf;
}
pcap_t *
pcap_create(const char *source, char *errbuf)
{
D("src %s (call open liveted)", source);
return pcap_open_live(source, 0, 1, 100, errbuf);
}
int
pcap_activate(pcap_t *p)
{
D("pcap %p running", p);
return 0;
}
int
pcap_can_set_rfmon(__unused pcap_t *p)
{
D("");
return 0; /* no we can't */
}
int
pcap_set_snaplen(pcap_t *p, int snaplen)
{
struct my_ring *me = p;
D("len %d", snaplen);
me->snaplen = snaplen;
return 0;
}
int
pcap_snapshot(pcap_t *p)
{
struct my_ring *me = p;
D("len %d", me->snaplen);
return me->snaplen;
}
int
pcap_lookupnet(const char *device, uint32_t *netp,
uint32_t *maskp, __unused char *errbuf)
{
D("device %s", device);
inet_aton("10.0.0.255", (struct in_addr *)netp);
inet_aton("255.255.255.0",(struct in_addr *) maskp);
return 0;
}
int
pcap_set_promisc(pcap_t *p, int promisc)
{
struct my_ring *me = p;
D("promisc %d", promisc);
if (do_ioctl(me, SIOCGIFFLAGS))
D("SIOCGIFFLAGS failed");
if (promisc) {
me->if_flags |= IFF_PPROMISC;
} else {
me->if_flags &= ~IFF_PPROMISC;
}
if (do_ioctl(me, SIOCSIFFLAGS))
D("SIOCSIFFLAGS failed");
return 0;
}
int
pcap_set_timeout(pcap_t *p, int to_ms)
{
struct my_ring *me = p;
D("%d ms", to_ms);
me->to_ms = to_ms;
return 0;
}
struct bpf_program;
int
pcap_compile(__unused pcap_t *p, __unused struct bpf_program *fp,
const char *str, __unused int optimize, __unused uint32_t netmask)
{
D("%s", str);
return 0;
}
int
pcap_setfilter(__unused pcap_t *p, __unused struct bpf_program *fp)
{
D("");
return 0;
}
int
pcap_datalink(__unused pcap_t *p)
{
D("");
	return 1;	/* 1 == DLT_EN10MB, ethernet */
}
const char *
pcap_datalink_val_to_name(int dlt)
{
D("%d", dlt);
return "DLT_EN10MB";
}
const char *
pcap_datalink_val_to_description(int dlt)
{
D("%d", dlt);
return "Ethernet link";
}
struct pcap_stat;
int
pcap_stats(pcap_t *p, struct pcap_stat *ps)
{
struct my_ring *me = p;
ND("");
me->st.ps_recv += 10;
*ps = me->st;
sprintf(me->msg, "stats not supported");
return -1;
}
char *
pcap_geterr(pcap_t *p)
{
struct my_ring *me = p;
D("");
return me->msg;
}
pcap_t *
pcap_open_live(const char *device, __unused int snaplen,
int promisc, int to_ms, __unused char *errbuf)
{
struct my_ring *me;
D("request to open %s", device);
me = calloc(1, sizeof(*me));
if (me == NULL) {
D("failed to allocate struct for %s", device);
return NULL;
}
strncpy(me->nmr.nr_name, device, sizeof(me->nmr.nr_name));
if (netmap_open(me, 0)) {
D("error opening %s", device);
free(me);
return NULL;
}
me->to_ms = to_ms;
if (do_ioctl(me, SIOCGIFFLAGS))
D("SIOCGIFFLAGS failed");
if (promisc) {
me->if_flags |= IFF_PPROMISC;
if (do_ioctl(me, SIOCSIFFLAGS))
D("SIOCSIFFLAGS failed");
}
if (do_ioctl(me, SIOCGIFCAP))
D("SIOCGIFCAP failed");
me->if_reqcap &= ~(IFCAP_HWCSUM | IFCAP_TSO | IFCAP_TOE);
if (do_ioctl(me, SIOCSIFCAP))
D("SIOCSIFCAP failed");
return (pcap_t *)me;
}
void
pcap_close(pcap_t *p)
{
struct my_ring *me = p;
D("");
if (!me)
return;
if (me->mem)
munmap(me->mem, me->memsize);
/* restore original flags ? */
ioctl(me->fd, NIOCUNREGIF, NULL);
close(me->fd);
bzero(me, sizeof(*me));
free(me);
}
int
pcap_fileno(pcap_t *p)
{
struct my_ring *me = p;
D("returns %d", me->fd);
return me->fd;
}
int
pcap_get_selectable_fd(pcap_t *p)
{
struct my_ring *me = p;
ND("");
return me->fd;
}
int
pcap_setnonblock(__unused pcap_t *p, int nonblock, __unused char *errbuf)
{
D("mode is %d", nonblock);
return 0; /* ignore */
}
int
pcap_setdirection(__unused pcap_t *p, __unused pcap_direction_t d)
{
D("");
return 0; /* ignore */
}
int
pcap_dispatch(pcap_t *p, int cnt, pcap_handler callback, u_char *user)
{
struct my_ring *me = p;
int got = 0;
u_int si;
ND("cnt %d", cnt);
/* scan all rings */
for (si = me->begin; si < me->end; si++) {
struct netmap_ring *ring = NETMAP_RXRING(me->nifp, si);
ND("ring has %d pkts", ring->avail);
if (ring->avail == 0)
continue;
me->hdr.ts = ring->ts;
while ((cnt == -1 || cnt != got) && ring->avail > 0) {
u_int i = ring->cur;
u_int idx = ring->slot[i].buf_idx;
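			/*
			 * buffer indices below 2 never belong to a user
			 * ring, so this flags a corrupted slot: complain
			 * and stall so the message is noticed.
			 */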
if (idx < 2) {
D("%s bogus RX index %d at offset %d",
me->nifp->ni_name, idx, i);
sleep(2);
}
u_char *buf = (u_char *)NETMAP_BUF(ring, idx);
me->hdr.len = me->hdr.caplen = ring->slot[i].len;
// D("call %p len %d", p, me->hdr.len);
callback(user, &me->hdr, buf);
ring->cur = NETMAP_RING_NEXT(ring, i);
ring->avail--;
got++;
}
}
return got;
}
int
pcap_inject(pcap_t *p, const void *buf, size_t size)
{
struct my_ring *me = p;
u_int si;
ND("cnt %d", cnt);
/* scan all rings */
for (si = me->begin; si < me->end; si++) {
struct netmap_ring *ring = NETMAP_TXRING(me->nifp, si);
ND("ring has %d pkts", ring->avail);
if (ring->avail == 0)
continue;
u_int i = ring->cur;
u_int idx = ring->slot[i].buf_idx;
if (idx < 2) {
D("%s bogus TX index %d at offset %d",
me->nifp->ni_name, idx, i);
sleep(2);
}
u_char *dst = (u_char *)NETMAP_BUF(ring, idx);
ring->slot[i].len = size;
bcopy(buf, dst, size);
ring->cur = NETMAP_RING_NEXT(ring, i);
ring->avail--;
// if (ring->avail == 0) ioctl(me->fd, NIOCTXSYNC, NULL);
return size;
}
errno = ENOBUFS;
return -1;
}
int
pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user)
{
struct my_ring *me = p;
struct pollfd fds[1];
int i;
ND("cnt %d", cnt);
memset(fds, 0, sizeof(fds));
fds[0].fd = me->fd;
fds[0].events = (POLLIN);
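	/* block on the netmap fd, handing each batch to pcap_dispatch()
	 * until cnt packets have been delivered (cnt == -1: forever) */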
while (cnt == -1 || cnt > 0) {
if (poll(fds, 1, me->to_ms) <= 0) {
D("poll error/timeout");
continue;
}
i = pcap_dispatch(p, cnt, callback, user);
if (cnt > 0)
cnt -= i;
}
return 0;
}
#endif /* __PIC__ */
#ifndef __PIC__
void do_send(u_char *user, const struct pcap_pkthdr *h, const u_char *buf)
{
pcap_inject((pcap_t *)user, buf, h->caplen);
}
/*
* a simple pcap test program, bridge between two interfaces.
*/
int
main(int argc, char **argv)
{
pcap_t *p0, *p1;
int burst = 1024;
struct pollfd pollfd[2];
fprintf(stderr, "%s %s built %s %s\n",
argv[0], version, __DATE__, __TIME__);
while (argc > 1 && !strcmp(argv[1], "-v")) {
verbose++;
argv++;
argc--;
}
if (argc < 3 || argc > 4 || !strcmp(argv[1], argv[2])) {
D("Usage: %s IFNAME1 IFNAME2 [BURST]", argv[0]);
return (1);
}
if (argc > 3)
burst = atoi(argv[3]);
p0 = pcap_open_live(argv[1], 0, 1, 100, NULL);
p1 = pcap_open_live(argv[2], 0, 1, 100, NULL);
D("%s", version);
D("open returns %p %p", p0, p1);
if (!p0 || !p1)
return(1);
bzero(pollfd, sizeof(pollfd));
pollfd[0].fd = pcap_fileno(p0);
pollfd[1].fd = pcap_fileno(p1);
pollfd[0].events = pollfd[1].events = POLLIN;
for (;;) {
		/* clear revents; poll() overwrites them on each call anyway */
pollfd[0].revents = pollfd[1].revents = 0;
int ret = poll(pollfd, 2, 1000);
if (ret <= 0 || verbose)
D("poll %s [0] ev %x %x [1] ev %x %x",
ret <= 0 ? "timeout" : "ok",
pollfd[0].events,
pollfd[0].revents,
pollfd[1].events,
pollfd[1].revents);
if (ret < 0)
continue;
if (pollfd[0].revents & POLLIN)
pcap_dispatch(p0, burst, do_send, p1);
if (pollfd[1].revents & POLLIN)
pcap_dispatch(p1, burst, do_send, p0);
}
return (0);
}
#endif /* !__PIC__ */
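
/*
 * Build sketch (not part of the committed build glue; the output
 * names below are placeholders):
 * as a shared, LD_PRELOAD-able libpcap substitute:
 *	cc -O2 -fPIC -shared -o libnetmap-pcap.so pcap.c
 *	LD_PRELOAD=./libnetmap-pcap.so tcpdump -ni ix0
 * as the standalone bridge test, linked with the real libpcap:
 *	cc -O2 -o pcap-bridge pcap.c -lpcap
 *	./pcap-bridge ix0 ix1
 */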

1021
tools/tools/netmap/pkt-gen.c Normal file

File diff suppressed because it is too large