Import the current version of netmap, aligned with the one on github.
This commit, long overdue, contains contributions in the last 2 years from Stefano Garzarella, Giuseppe Lettieri, Vincenzo Maffione, including:
+ fixes on monitor ports
+ the 'ptnet' virtual device driver, and ptnetmap backend, for high speed virtual passthrough on VMs (bhyve fixes in an upcoming commit)
+ improved emulated netmap mode
+ more robust error handling
+ removal of stale code
+ various fixes to code and documentation (some mixup between RX and TX parameters, and private and public variables)

We also include an additional tool, nmreplay, which is functionally equivalent to tcpreplay but operating on netmap ports.

parent 63f6b1a75a
commit 37e3a6d349
@@ -33,10 +33,10 @@
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.Pp
.br
.Nm VALE
.Nd a fast VirtuAl Local Ethernet using the netmap API
.Pp
.br
.Nm netmap pipes
.Nd a shared memory packet transport channel
.Sh SYNOPSIS
@@ -44,28 +44,49 @@
.Sh DESCRIPTION
.Nm
is a framework for extremely fast and efficient packet I/O
for both userspace and kernel clients.
for userspace and kernel clients, and for Virtual Machines.
It runs on
.Fx
and Linux, and includes
.Nm VALE ,
a very fast and modular in-kernel software switch/dataplane,
and
.Nm netmap pipes ,
a shared memory packet transport channel.
All these are accessed interchangeably with the same API.
Linux and some versions of Windows, and supports a variety of
.Nm netmap ports ,
including
.Bl -tag -width XXXX
.It Nm physical NIC ports
to access individual queues of network interfaces;
.It Nm host ports
to inject packets into the host stack;
.It Nm VALE ports
implementing a very fast and modular in-kernel software switch/dataplane;
.It Nm netmap pipes
a shared memory packet transport channel;
.It Nm netmap monitors
a mechanism similar to
.Xr bpf
to capture traffic
.El
.Pp
.Nm ,
.Nm VALE
and
.Nm netmap pipes
are at least one order of magnitude faster than
All these
.Nm netmap ports
are accessed interchangeably with the same API,
and are at least one order of magnitude faster than
standard OS mechanisms
(sockets, bpf, tun/tap interfaces, native switches, pipes),
reaching 14.88 million packets per second (Mpps)
with much less than one core on a 10 Gbit NIC,
about 20 Mpps per core for VALE ports,
and over 100 Mpps for netmap pipes.
(sockets, bpf, tun/tap interfaces, native switches, pipes).
With suitably fast hardware (NICs, PCIe buses, CPUs),
packet I/O using
.Nm
on supported NICs
reaches 14.88 million packets per second (Mpps)
with much less than one core on 10 Gbit/s NICs;
35-40 Mpps on 40 Gbit/s NICs (limited by the hardware);
about 20 Mpps per core for VALE ports;
and over 100 Mpps for
.Nm netmap pipes.
NICs without native
.Nm
support can still use the API in emulated mode,
which uses unmodified device drivers and is 3-5 times faster than
.Xr bpf
or raw sockets.
.Pp
Userspace clients can dynamically switch NICs into
.Nm
@@ -73,8 +94,10 @@ mode and send and receive raw packets through
memory mapped buffers.
Similarly,
.Nm VALE
switch instances and ports, and
switch instances and ports,
.Nm netmap pipes
and
.Nm netmap monitors
can be created dynamically,
providing high speed packet I/O between processes,
virtual machines, NICs and the host stack.
@@ -89,17 +112,17 @@ and standard OS mechanisms such as
.Xr epoll 2 ,
and
.Xr kqueue 2 .
.Nm VALE
and
.Nm netmap pipes
All types of
.Nm netmap ports
and the
.Nm VALE switch
are implemented by a single kernel module, which also emulates the
.Nm
API over standard drivers for devices without native
.Nm
support.
API over standard drivers.
For best performance,
.Nm
requires explicit support in device drivers.
requires native support in device drivers.
A list of such devices is at the end of this document.
.Pp
In the rest of this (long) manual page we document
various aspects of the
@@ -116,7 +139,7 @@ which can be connected to a physical interface
to the host stack,
or to a
.Nm VALE
switch).
switch.
Ports use preallocated circular queues of buffers
.Em ( rings )
residing in an mmapped region.
@@ -166,16 +189,18 @@ has multiple modes of operation controlled by the
.Vt struct nmreq
argument.
.Va arg.nr_name
specifies the port name, as follows:
specifies the netmap port name, as follows:
.Bl -tag -width XXXX
.It Dv OS network interface name (e.g. 'em0', 'eth1', ... )
the data path of the NIC is disconnected from the host stack,
and the file descriptor is bound to the NIC (one or all queues),
or to the host stack;
.It Dv valeXXX:YYY (arbitrary XXX and YYY)
the file descriptor is bound to port YYY of a VALE switch called XXX,
both dynamically created if necessary.
The string cannot exceed IFNAMSIZ characters, and YYY cannot
.It Dv valeSSS:PPP
the file descriptor is bound to port PPP of VALE switch SSS.
Switch instances and ports are dynamically created if necessary.
.br
Both SSS and PPP have the form [0-9a-zA-Z_]+ , the string
cannot exceed IFNAMSIZ characters, and PPP cannot
be the name of any existing OS network interface.
.El
.Pp
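The `valeSSS:PPP` rule added in this hunk can be checked mechanically. The sketch below is illustrative only and is not netmap's own parser: `vale_name_ok()` and `MODEL_IFNAMSIZ` are hypothetical names, the value 16 is an assumption (it matches IFNAMSIZ on FreeBSD and Linux), and we assume the length limit must leave room for the terminating NUL. It does not check that PPP is not an existing OS interface, and it rejects suffixes such as the `{3` pipe suffix used in the pkt-gen examples of this manual page.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumed value: IFNAMSIZ is 16 on both FreeBSD and Linux. */
#define MODEL_IFNAMSIZ 16

/* One character of SSS or PPP: [0-9a-zA-Z_] (C locale). */
static bool
vale_name_char(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* Consume a run of [0-9a-zA-Z_] starting at *pp; false if the run is empty. */
static bool
vale_name_run(const char **pp)
{
	const char *start = *pp;

	while (vale_name_char(**pp))
		(*pp)++;
	return *pp != start;
}

/* Hypothetical check for "valeSSS:PPP": true if the whole string matches
 * and fits, with its NUL terminator, in MODEL_IFNAMSIZ bytes. */
static bool
vale_name_ok(const char *name)
{
	const char *p = name;

	assert(name != NULL);
	if (strlen(name) >= MODEL_IFNAMSIZ || strncmp(p, "vale", 4) != 0)
		return false;
	p += 4;
	if (!vale_name_run(&p) || *p != ':')	/* SSS, then the separator */
		return false;
	p++;
	return vale_name_run(&p) && *p == '\0';	/* PPP, then end of string */
}
```

With these rules `vale1:b` is accepted, while `vale:x` is rejected by this sketch because SSS is empty.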
@@ -312,9 +337,6 @@ one slot is always kept empty.
The ring size
.Va ( num_slots )
should not be assumed to be a power of two.
.br
(NOTE: older versions of netmap used head/count format to indicate
the content of a ring).
.Pp
.Va head
is the first slot available to userspace;
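The ring rules in this hunk (one slot always kept empty, `num_slots` not necessarily a power of two, `head` as the first slot available to userspace) translate into index arithmetic along these lines. This is a sketch: `struct ring_idx`, `ring_next_idx()` and `ring_avail()` are hypothetical, and treating `tail` as the first slot reserved to the kernel follows the netmap(4) ring description rather than anything shown in this hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model holding only the indices discussed above. */
struct ring_idx {
	uint32_t num_slots;	/* NOT necessarily a power of two */
	uint32_t head;		/* first slot available to userspace */
	uint32_t tail;		/* first slot reserved to the kernel (assumed) */
};

/* Next index with an explicit wrap; masking with (num_slots - 1)
 * would be wrong when num_slots is not a power of two. */
static uint32_t
ring_next_idx(const struct ring_idx *r, uint32_t i)
{
	assert(i < r->num_slots);
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Slots currently owned by userspace: [head, tail), modulo num_slots. */
static uint32_t
ring_avail(const struct ring_idx *r)
{
	return (r->tail >= r->head) ? r->tail - r->head
	    : r->tail + r->num_slots - r->head;
}
```

In this model `head == tail` reads as "no slots available", so a completely full ring could not be told apart from an empty one. One common reason for keeping one slot empty, and the one assumed here, is to avoid exactly that ambiguity; as a result `ring_avail()` never exceeds `num_slots - 1`.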
@@ -585,6 +607,15 @@ it from the host stack.
Multiple file descriptors can be bound to the same port,
with proper synchronization left to the user.
.Pp
The recommended way to bind a file descriptor to a port is
to use function
.Va nm_open(..)
(see
.Xr LIBRARIES )
which parses names to access specific port types and
enable features.
In the following we document the main features.
.Pp
.Dv NIOCREGIF can also bind a file descriptor to one endpoint of a
.Em netmap pipe ,
consisting of two netmap ports with a crossover connection.
@@ -734,7 +765,7 @@ similar to
binds a file descriptor to a port.
.Bl -tag -width XX
.It Va ifname
is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a
is a port name, in the form "netmap:PPP" for a NIC and "valeSSS:PPP" for a
.Nm VALE
port.
.It Va req
@@ -774,28 +805,39 @@ similar to pcap_next(), fetches the next packet
natively supports the following devices:
.Pp
On FreeBSD:
.Xr cxgbe 4 ,
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr ixl 4 ,
.Xr lem 4 ,
.Xr re 4 .
.Pp
On Linux
.Xr e1000 4 ,
.Xr e1000e 4 ,
.Xr i40e 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr mlx4 4 ,
.Xr forcedeth 4 ,
.Xr r8169 4 .
.Pp
NICs without native support can still be used in
.Nm
mode through emulation.
Performance is inferior to native netmap
mode but still significantly higher than sockets, and approaching
mode but still significantly higher than various raw socket types
(bpf, PF_PACKET, etc.).
Note that for slow devices (such as 1 Gbit/s and slower NICs,
or several 10 Gbit/s NICs whose hardware is unable
that of in-kernel solutions such as Linux's
.Xr pktgen .
When emulation is in use, packet sniffer programs such as tcpdump
could see received packets before they are diverted by netmap. This behaviour
is not intentional, being just an artifact of the implementation of emulation.
Note that in case the netmap application subsequently moves packets received
from the emulated adapter onto the host RX ring, the sniffer will intercept
those packets again, since the packets are injected to the host stack as they
were received by the network interface.
.Pp
Emulation is also available for devices with native netmap support,
which can be used for testing or performance comparison.
@@ -812,8 +854,12 @@ and module parameters on Linux
.Bl -tag -width indent
.It Va dev.netmap.admode: 0
Controls the use of native or emulated adapter mode.
0 uses the best available option, 1 forces native and
fails if not available, 2 forces emulated hence never fails.
.br
0 uses the best available option;
.br
1 forces native mode and fails if not available;
.br
2 forces emulated hence never fails.
.It Va dev.netmap.generic_ringsize: 1024
Ring size used for emulated netmap mode
.It Va dev.netmap.generic_mit: 100000
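The three `dev.netmap.admode` values documented in this hunk amount to a small decision table. The function below is a hypothetical illustration of that policy, not netmap's code: the enum and function names are invented, and only the documented values 0, 1 and 2 are handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the admode policy. */
enum nm_mode_choice {
	MODE_NATIVE,	/* use the driver's native netmap support */
	MODE_EMULATED,	/* use emulated netmap mode */
	MODE_FAIL	/* refuse to bind the port */
};

static enum nm_mode_choice
choose_adapter_mode(int admode, bool has_native)
{
	assert(admode >= 0 && admode <= 2);	/* documented values only */
	if (admode == 2)	/* forces emulated, hence never fails */
		return MODE_EMULATED;
	if (has_native)		/* 0 or 1: native when available */
		return MODE_NATIVE;
	/* no native support: 1 fails, 0 falls back to emulation */
	return admode == 1 ? MODE_FAIL : MODE_EMULATED;
}
```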
@@ -861,9 +907,9 @@ performance.
uses
.Xr select 2 ,
.Xr poll 2 ,
.Xr epoll
.Xr epoll 2
and
.Xr kqueue
.Xr kqueue 2
to wake up processes when significant events occur, and
.Xr mmap 2
to map memory.
@@ -1015,8 +1061,8 @@ e.g. running the following in two different terminals:
.Dl pkt-gen -i vale1:b -f tx # sender
The same example can be used to test netmap pipes, by simply
changing port names, e.g.
.Dl pkt-gen -i vale:x{3 -f rx # receiver on the master side
.Dl pkt-gen -i vale:x}3 -f tx # sender on the slave side
.Dl pkt-gen -i vale2:x{3 -f rx # receiver on the master side
.Dl pkt-gen -i vale2:x}3 -f tx # sender on the slave side
.Pp
The following command attaches an interface and the host stack
to a switch:
@@ -2187,6 +2187,7 @@ dev/nand/nfc_if.m optional nand
dev/ncr/ncr.c optional ncr pci
dev/ncv/ncr53c500.c optional ncv
dev/ncv/ncr53c500_pccard.c optional ncv pccard
dev/netmap/if_ptnet.c optional netmap
dev/netmap/netmap.c optional netmap
dev/netmap/netmap_freebsd.c optional netmap
dev/netmap/netmap_generic.c optional netmap
@@ -2195,6 +2196,7 @@ dev/netmap/netmap_mem2.c optional netmap
dev/netmap/netmap_monitor.c optional netmap
dev/netmap/netmap_offloadings.c optional netmap
dev/netmap/netmap_pipe.c optional netmap
dev/netmap/netmap_pt.c optional netmap
dev/netmap/netmap_vale.c optional netmap
# compile-with "${NORMAL_C} -Wconversion -Wextra"
dev/nfsmb/nfsmb.c optional nfsmb pci
@@ -59,7 +59,7 @@ extern int ixl_rx_miss, ixl_rx_miss_bufs, ixl_crcstrip;
/*
* device-specific sysctl variables:
*
* ixl_crcstrip: 0: keep CRC in rx frames (default), 1: strip it.
* ixl_crcstrip: 0: NIC keeps CRC in rx frames, 1: NIC strips it (default).
* During regular operations the CRC is stripped, but on some
* hardware reception of frames not multiple of 64 is slower,
* so using crcstrip=0 helps in benchmarks.
@@ -73,7 +73,7 @@ SYSCTL_DECL(_dev_netmap);
*/
#if 0
SYSCTL_INT(_dev_netmap, OID_AUTO, ixl_crcstrip,
CTLFLAG_RW, &ixl_crcstrip, 1, "strip CRC on rx frames");
CTLFLAG_RW, &ixl_crcstrip, 1, "NIC strips CRC on rx frames");
#endif
SYSCTL_INT(_dev_netmap, OID_AUTO, ixl_rx_miss,
CTLFLAG_RW, &ixl_rx_miss, 0, "potentially missed rx intr");
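The reworded `ixl_crcstrip` comment makes clear who removes the 4-byte Ethernet CRC: with 0 the NIC leaves it in the received frame, with 1 the NIC strips it. A driver that lets the NIC keep the CRC therefore has to subtract it when filling the netmap slot length, as the lem rxsync code in this commit does unconditionally with `le16toh(curr->length) - 4`. The helper below is a hypothetical sketch of that adjustment; its name, the constant and the short-frame guard are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Length of the Ethernet frame check sequence (CRC), in bytes. */
#define MODEL_ETHER_CRC_LEN 4

/* Hypothetical helper: frame length to store in a netmap slot, given the
 * length the NIC wrote in the rx descriptor and the crcstrip setting. */
static uint16_t
rx_slot_len(uint16_t desc_len, int nic_strips_crc)
{
	assert(nic_strips_crc == 0 || nic_strips_crc == 1);
	if (nic_strips_crc)
		return desc_len;	/* CRC already removed by the NIC */
	if (desc_len < MODEL_ETHER_CRC_LEN)
		return 0;		/* malformed descriptor, guard only */
	return (uint16_t)(desc_len - MODEL_ETHER_CRC_LEN);
}
```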
@@ -81,6 +81,22 @@ lem_netmap_reg(struct netmap_adapter *na, int onoff)
}


static void
lem_netmap_intr(struct netmap_adapter *na, int onoff)
{
struct ifnet *ifp = na->ifp;
struct adapter *adapter = ifp->if_softc;

EM_CORE_LOCK(adapter);
if (onoff) {
lem_enable_intr(adapter);
} else {
lem_disable_intr(adapter);
}
EM_CORE_UNLOCK(adapter);
}


/*
* Reconcile kernel and user view of the transmit ring.
*/
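This hunk adds an `nm_intr` callback to lem (ixgbe gets the same treatment later in this commit): the callback receives `onoff` and enables or disables the NIC interrupts while holding the driver's core lock, and `lem_netmap_attach()` stores it in `na.nm_intr`. The user-space mock below only shows the shape of that contract. `struct mock_softc`, `struct mock_adapter` and the boolean lock flag are hypothetical stand-ins for the driver softc, `struct netmap_adapter` and `EM_CORE_LOCK`/`EM_CORE_UNLOCK`; treating the lock as non-recursive is an assumption of the mock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver softc. */
struct mock_softc {
	bool lock_held;		/* models EM_CORE_LOCK/EM_CORE_UNLOCK */
	bool intr_enabled;	/* models the NIC interrupt mask */
};

/* Hypothetical stand-in for struct netmap_adapter. */
struct mock_adapter {
	struct mock_softc *sc;
	void (*nm_intr)(struct mock_adapter *, int onoff);
};

/* Same shape as lem_netmap_intr(): toggle interrupts under the core lock. */
static void
mock_netmap_intr(struct mock_adapter *na, int onoff)
{
	struct mock_softc *sc = na->sc;

	assert(!sc->lock_held);		/* mock lock is non-recursive */
	sc->lock_held = true;		/* EM_CORE_LOCK(adapter) */
	sc->intr_enabled = (onoff != 0);
	sc->lock_held = false;		/* EM_CORE_UNLOCK(adapter) */
}

/* Same idea as "na.nm_intr = lem_netmap_intr;" in lem_netmap_attach(). */
static void
mock_netmap_attach(struct mock_adapter *na, struct mock_softc *sc)
{
	na->sc = sc;
	na->nm_intr = mock_netmap_intr;
}
```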
@@ -99,10 +115,6 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)

/* device-specific */
struct adapter *adapter = ifp->if_softc;
#ifdef NIC_PARAVIRT
struct paravirt_csb *csb = adapter->csb;
uint64_t *csbd = (uint64_t *)(csb + 1);
#endif /* NIC_PARAVIRT */

bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
@@ -113,19 +125,6 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)

nm_i = kring->nr_hwcur;
if (nm_i != head) { /* we have new packets to send */
#ifdef NIC_PARAVIRT
int do_kick = 0;
uint64_t t = 0; // timestamp
int n = head - nm_i;
if (n < 0)
n += lim + 1;
if (csb) {
t = rdtsc(); /* last timestamp */
csbd[16] += t - csbd[0]; /* total Wg */
csbd[17] += n; /* Wg count */
csbd[0] = t;
}
#endif /* NIC_PARAVIRT */
nic_i = netmap_idx_k2n(kring, nm_i);
while (nm_i != head) {
struct netmap_slot *slot = &ring->slot[nm_i];
@@ -166,38 +165,8 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)
bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);

#ifdef NIC_PARAVIRT
/* set unconditionally, then also kick if needed */
if (csb) {
t = rdtsc();
if (csb->host_need_txkick == 2) {
/* can compute an update of delta */
int64_t delta = t - csbd[3];
if (delta < 0)
delta = -delta;
if (csbd[8] == 0 || delta < csbd[8]) {
csbd[8] = delta;
csbd[9]++;
}
csbd[10]++;
}
csb->guest_tdt = nic_i;
csbd[18] += t - csbd[0]; // total wp
csbd[19] += n;
}
if (!csb || !csb->guest_csb_on || (csb->host_need_txkick & 1))
do_kick = 1;
if (do_kick)
#endif /* NIC_PARAVIRT */
/* (re)start the tx unit up to slot nic_i (excluded) */
E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), nic_i);
#ifdef NIC_PARAVIRT
if (do_kick) {
uint64_t t1 = rdtsc();
csbd[20] += t1 - t; // total Np
csbd[21]++;
}
#endif /* NIC_PARAVIRT */
}

/*
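The txsync hunks above start from `kring->nr_hwcur`, convert it to a NIC descriptor index with `netmap_idx_k2n()`, walk the slots up to `head`, and finally write the resulting `nic_i` to TDT, which the comment describes as the slot where the tx unit stops (excluded). The self-contained model below reproduces only that index bookkeeping. The struct, the function names and the offset convention (`nic_i = nm_i - hwofs`, modulo the ring size) are assumptions for illustration; the real helpers live in netmap's kernel headers and may differ in detail.

```c
#include <assert.h>

/* Hypothetical kring model: netmap and NIC indices differ by a fixed
 * offset modulo num_slots (assumed convention: nic = nm - hwofs). */
struct kring_model {
	int num_slots;
	int hwofs;		/* assumed |hwofs| < num_slots */
};

/* Bring idx, assumed to lie in [-n, 2n), back into [0, n). */
static int
wrap_idx(int idx, int n)
{
	assert(idx >= -n && idx < 2 * n);
	if (idx < 0)
		return idx + n;
	return idx < n ? idx : idx - n;
}

static int
model_k2n(const struct kring_model *kr, int nm_i)	/* netmap -> NIC */
{
	return wrap_idx(nm_i - kr->hwofs, kr->num_slots);
}

static int
model_n2k(const struct kring_model *kr, int nic_i)	/* NIC -> netmap */
{
	return wrap_idx(nic_i + kr->hwofs, kr->num_slots);
}

/* Walk the slots userspace handed over, [hwcur, head), advancing the
 * netmap and NIC indices in lockstep, and return the value to write
 * to TDT: one past the last descriptor filled. */
static int
model_txsync_tdt(const struct kring_model *kr, int hwcur, int head)
{
	int lim = kr->num_slots - 1;
	int nm_i = hwcur;
	int nic_i = model_k2n(kr, nm_i);

	while (nm_i != head) {
		/* ... a driver fills descriptor nic_i from slot nm_i here ... */
		nm_i = (nm_i == lim) ? 0 : nm_i + 1;
		nic_i = (nic_i == lim) ? 0 : nic_i + 1;
	}
	return nic_i;
}
```

Because both indices advance together with the same wrap, the returned value always equals `model_k2n(kr, head)` in this model, which is a convenient invariant to test.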
@@ -206,93 +175,6 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)
if (ticks != kring->last_reclaim || flags & NAF_FORCE_RECLAIM || nm_kr_txempty(kring)) {
kring->last_reclaim = ticks;
/* record completed transmissions using TDH */
#ifdef NIC_PARAVIRT
/* host updates tdh unconditionally, and we have
* no side effects on reads, so we can read from there
* instead of exiting.
*/
if (csb) {
static int drain = 0, nodrain=0, good = 0, bad = 0, fail = 0;
u_int x = adapter->next_tx_to_clean;
csbd[19]++; // XXX count reclaims
nic_i = csb->host_tdh;
if (csb->guest_csb_on) {
if (nic_i == x) {
bad++;
csbd[24]++; // failed reclaims
/* no progress, request kick and retry */
csb->guest_need_txkick = 1;
mb(); // XXX barrier
nic_i = csb->host_tdh;
} else {
good++;
}
if (nic_i != x) {
csb->guest_need_txkick = 2;
if (nic_i == csb->guest_tdt)
drain++;
else
nodrain++;
#if 1
if (netmap_adaptive_io) {
/* new mechanism: last half ring (or so)
* released one slot at a time.
* This effectively makes the system spin.
*
* Take next_to_clean + 1 as a reference.
* tdh must be ahead or equal
* On entry, the logical order is
* x < tdh = nic_i
* We first push tdh up to avoid wraps.
* The limit is tdh-ll (half ring).
* if tdh-256 < x we report x;
* else we report tdh-256
*/
u_int tdh = nic_i;
u_int ll = csbd[15];
u_int delta = lim/8;
if (netmap_adaptive_io == 2 || ll > delta)
csbd[15] = ll = delta;
else if (netmap_adaptive_io == 1 && ll > 1) {
csbd[15]--;
}

if (nic_i >= kring->nkr_num_slots) {
RD(5, "bad nic_i %d on input", nic_i);
}
x = nm_next(x, lim);
if (tdh < x)
tdh += lim + 1;
if (tdh <= x + ll) {
nic_i = x;
csbd[25]++; //report n + 1;
} else {
tdh = nic_i;
if (tdh < ll)
tdh += lim + 1;
nic_i = tdh - ll;
csbd[26]++; // report tdh - ll
}
}
#endif
} else {
/* we stop, count whether we are idle or not */
int bh_active = csb->host_need_txkick & 2 ? 4 : 0;
csbd[27+ csb->host_need_txkick]++;
if (netmap_adaptive_io == 1) {
if (bh_active && csbd[15] > 1)
csbd[15]--;
else if (!bh_active && csbd[15] < lim/2)
csbd[15]++;
}
bad--;
fail++;
}
}
RD(1, "drain %d nodrain %d good %d retry %d fail %d",
drain, nodrain, good, bad, fail);
} else
#endif /* !NIC_PARAVIRT */
nic_i = E1000_READ_REG(&adapter->hw, E1000_TDH(0));
if (nic_i >= kring->nkr_num_slots) { /* XXX can it happen ? */
D("TDH wrap %d", nic_i);
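On the reclaim side the driver reads TDH, the NIC's transmit head, folds an out-of-range value back into the ring (the defensive `TDH wrap` check, which also appears in the ixgbe hunk of this commit), and converts it back to a netmap index. The sketch below assumes that `nr_hwtail` is then set one slot before the converted TDH; those lines are outside this hunk, so treat that step, the offset convention and all names as assumptions rather than the driver's actual code.

```c
#include <assert.h>

/* Hypothetical model of the TDH-based reclaim step: returns the new
 * hwtail, in netmap index space, for a raw TDH value. */
static int
model_reclaim_hwtail(int tdh, int num_slots, int hwofs)
{
	int nm_i;

	assert(num_slots > 0 && tdh >= 0 && tdh < 2 * num_slots);
	assert(hwofs > -num_slots && hwofs < num_slots);
	if (tdh >= num_slots)		/* "TDH wrap": should not happen */
		tdh -= num_slots;
	nm_i = tdh + hwofs;		/* NIC -> netmap, assumed convention */
	if (nm_i < 0)
		nm_i += num_slots;
	else if (nm_i >= num_slots)
		nm_i -= num_slots;
	/* descriptors before TDH are done; hwtail stays one slot behind */
	return nm_i == 0 ? num_slots - 1 : nm_i - 1;
}
```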
@@ -324,21 +206,10 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)

/* device-specific */
struct adapter *adapter = ifp->if_softc;
#ifdef NIC_PARAVIRT
struct paravirt_csb *csb = adapter->csb;
uint32_t csb_mode = csb && csb->guest_csb_on;
uint32_t do_host_rxkick = 0;
#endif /* NIC_PARAVIRT */

if (head > lim)
return netmap_ring_reinit(kring);

#ifdef NIC_PARAVIRT
if (csb_mode) {
force_update = 1;
csb->guest_need_rxkick = 0;
}
#endif /* NIC_PARAVIRT */
/* XXX check sync modes */
bus_dmamap_sync(adapter->rxdma.dma_tag, adapter->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
@@ -357,23 +228,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
uint32_t staterr = le32toh(curr->status);
int len;

#ifdef NIC_PARAVIRT
if (csb_mode) {
if ((staterr & E1000_RXD_STAT_DD) == 0) {
/* don't bother to retry if more than 1 pkt */
if (n > 1)
break;
csb->guest_need_rxkick = 1;
wmb();
staterr = le32toh(curr->status);
if ((staterr & E1000_RXD_STAT_DD) == 0) {
break;
} else { /* we are good */
csb->guest_need_rxkick = 0;
}
}
} else
#endif /* NIC_PARAVIRT */
if ((staterr & E1000_RXD_STAT_DD) == 0)
break;
len = le16toh(curr->length) - 4; // CRC
@@ -390,18 +244,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
nic_i = nm_next(nic_i, lim);
}
if (n) { /* update the state variables */
#ifdef NIC_PARAVIRT
if (csb_mode) {
if (n > 1) {
/* leave one spare buffer so we avoid rxkicks */
nm_i = nm_prev(nm_i, lim);
nic_i = nm_prev(nic_i, lim);
n--;
} else {
csb->guest_need_rxkick = 1;
}
}
#endif /* NIC_PARAVIRT */
ND("%d new packets at nic %d nm %d tail %d",
n,
adapter->next_rx_desc_to_check,
@@ -440,10 +282,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
curr->status = 0;
bus_dmamap_sync(adapter->rxtag, rxbuf->map,
BUS_DMASYNC_PREREAD);
#ifdef NIC_PARAVIRT
if (csb_mode && csb->host_rxkick_at == nic_i)
do_host_rxkick = 1;
#endif /* NIC_PARAVIRT */
nm_i = nm_next(nm_i, lim);
nic_i = nm_next(nic_i, lim);
}
@@ -455,12 +293,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
* so move nic_i back by one unit
*/
nic_i = nm_prev(nic_i, lim);
#ifdef NIC_PARAVIRT
/* set unconditionally, then also kick if needed */
if (csb)
csb->guest_rdt = nic_i;
if (!csb_mode || do_host_rxkick)
#endif /* NIC_PARAVIRT */
E1000_WRITE_REG(&adapter->hw, E1000_RDT(0), nic_i);
}

@@ -486,6 +318,7 @@ lem_netmap_attach(struct adapter *adapter)
na.nm_rxsync = lem_netmap_rxsync;
na.nm_register = lem_netmap_reg;
na.num_tx_rings = na.num_rx_rings = 1;
na.nm_intr = lem_netmap_intr;
netmap_attach(&na);
}
@@ -53,7 +53,7 @@ void ixgbe_netmap_attach(struct adapter *adapter);
/*
* device-specific sysctl variables:
*
* ix_crcstrip: 0: keep CRC in rx frames (default), 1: strip it.
* ix_crcstrip: 0: NIC keeps CRC in rx frames (default), 1: NIC strips it.
* During regular operations the CRC is stripped, but on some
* hardware reception of frames not multiple of 64 is slower,
* so using crcstrip=0 helps in benchmarks.
@@ -65,7 +65,7 @@ SYSCTL_DECL(_dev_netmap);
static int ix_rx_miss, ix_rx_miss_bufs;
int ix_crcstrip;
SYSCTL_INT(_dev_netmap, OID_AUTO, ix_crcstrip,
CTLFLAG_RW, &ix_crcstrip, 0, "strip CRC on rx frames");
CTLFLAG_RW, &ix_crcstrip, 0, "NIC strips CRC on rx frames");
SYSCTL_INT(_dev_netmap, OID_AUTO, ix_rx_miss,
CTLFLAG_RW, &ix_rx_miss, 0, "potentially missed rx intr");
SYSCTL_INT(_dev_netmap, OID_AUTO, ix_rx_miss_bufs,
@@ -109,6 +109,20 @@ set_crcstrip(struct ixgbe_hw *hw, int onoff)
IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rxc);
}

static void
ixgbe_netmap_intr(struct netmap_adapter *na, int onoff)
{
struct ifnet *ifp = na->ifp;
struct adapter *adapter = ifp->if_softc;

IXGBE_CORE_LOCK(adapter);
if (onoff) {
ixgbe_enable_intr(adapter); // XXX maybe ixgbe_stop ?
} else {
ixgbe_disable_intr(adapter); // XXX maybe ixgbe_stop ?
}
IXGBE_CORE_UNLOCK(adapter);
}

/*
* Register/unregister. We are already under netmap lock.
@@ -311,7 +325,7 @@ ixgbe_netmap_txsync(struct netmap_kring *kring, int flags)
* good way.
*/
nic_i = IXGBE_READ_REG(&adapter->hw, IXGBE_IS_VF(adapter) ?
IXGBE_VFTDH(kring->ring_id) : IXGBE_TDH(kring->ring_id));
IXGBE_VFTDH(kring->ring_id) : IXGBE_TDH(kring->ring_id));
if (nic_i >= kring->nkr_num_slots) { /* XXX can it happen ? */
D("TDH wrap %d", nic_i);
nic_i -= kring->nkr_num_slots;
@@ -486,6 +500,7 @@ ixgbe_netmap_attach(struct adapter *adapter)
na.nm_rxsync = ixgbe_netmap_rxsync;
na.nm_register = ixgbe_netmap_reg;
na.num_tx_rings = na.num_rx_rings = adapter->num_queues;
na.nm_intr = ixgbe_netmap_intr;
netmap_attach(&na);
}
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,5 +1,6 @@
/*
* Copyright (C) 2013-2014 Vincenzo Maffione. All rights reserved.
* Copyright (C) 2013-2014 Vincenzo Maffione
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -30,6 +31,8 @@

#ifdef linux
#include "bsd_glue.h"
#elif defined (_WIN32)
#include "win_glue.h"
#else /* __FreeBSD__ */
#include <sys/param.h>
#include <sys/lock.h>
@@ -152,12 +155,12 @@ void mbq_safe_purge(struct mbq *q)
}


void mbq_safe_destroy(struct mbq *q)
void mbq_safe_fini(struct mbq *q)
{
mtx_destroy(&q->lock);
}


void mbq_destroy(struct mbq *q)
void mbq_fini(struct mbq *q)
{
}
@@ -1,5 +1,6 @@
/*
* Copyright (C) 2013-2014 Vincenzo Maffione. All rights reserved.
* Copyright (C) 2013-2014 Vincenzo Maffione
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -40,6 +41,8 @@
/* XXX probably rely on a previous definition of SPINLOCK_T */
#ifdef linux
#define SPINLOCK_T safe_spinlock_t
#elif defined (_WIN32)
#define SPINLOCK_T win_spinlock_t
#else
#define SPINLOCK_T struct mtx
#endif
@@ -52,16 +55,21 @@ struct mbq {
SPINLOCK_T lock;
};

/* XXX "destroy" does not match "init" as a name.
* We should also clarify whether init can be used while
/* We should clarify whether init can be used while
* holding a lock, and whether mbq_safe_destroy() is a NOP.
*/
void mbq_init(struct mbq *q);
void mbq_destroy(struct mbq *q);
void mbq_fini(struct mbq *q);
void mbq_enqueue(struct mbq *q, struct mbuf *m);
struct mbuf *mbq_dequeue(struct mbq *q);
void mbq_purge(struct mbq *q);

static inline struct mbuf *
mbq_peek(struct mbq *q)
{
return q->head ? q->head : NULL;
}

static inline void
mbq_lock(struct mbq *q)
{
@@ -76,7 +84,7 @@ mbq_unlock(struct mbq *q)


void mbq_safe_init(struct mbq *q);
void mbq_safe_destroy(struct mbq *q);
void mbq_safe_fini(struct mbq *q);
void mbq_safe_enqueue(struct mbq *q, struct mbuf *m);
struct mbuf *mbq_safe_dequeue(struct mbq *q);
void mbq_safe_purge(struct mbq *q);
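The mbq changes rename `mbq_destroy()`/`mbq_safe_destroy()` to `mbq_fini()`/`mbq_safe_fini()`, so that the teardown names pair with `mbq_init()`, and drop the old XXX remark about the mismatch. For readers unfamiliar with the structure, here is a minimal unlocked FIFO with the same init/enqueue/dequeue/fini shape. It is a sketch, not the netmap implementation: `struct pkt` stands in for `struct mbuf`, and apart from `head` (visible in `mbq_peek()` above) the field names are assumptions. The real `struct mbq` also carries the `SPINLOCK_T lock` that `mbq_safe_fini()` destroys.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf: a link and a tag. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Unlocked FIFO; field names other than head are assumptions. */
struct pktq {
	struct pkt *head;
	struct pkt *tail;
	unsigned int count;
};

static void
pktq_init(struct pktq *q)
{
	q->head = q->tail = NULL;
	q->count = 0;
}

static void
pktq_enqueue(struct pktq *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;	/* append after the current tail */
	else
		q->head = p;		/* queue was empty */
	q->tail = p;
	q->count++;
}

static struct pkt *
pktq_dequeue(struct pktq *q)
{
	struct pkt *p = q->head;

	if (p == NULL)
		return NULL;
	assert(q->count > 0);
	q->head = p->next;
	if (q->head == NULL)
		q->tail = NULL;		/* removed the last element */
	p->next = NULL;
	q->count--;
	return p;
}

/* Counterpart of pktq_init(). Nothing to release in this unlocked model;
 * a locked variant would destroy its lock here, as mbq_safe_fini() does. */
static void
pktq_fini(struct pktq *q)
{
	(void)q;
	assert(q->head == NULL);	/* this sketch expects a drained queue */
}
```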
File diff suppressed because it is too large
@@ -1,5 +1,8 @@
/*
* Copyright (C) 2012-2014 Matteo Landi, Luigi Rizzo, Giuseppe Lettieri. All rights reserved.
* Copyright (C) 2012-2014 Matteo Landi
* Copyright (C) 2012-2016 Luigi Rizzo
* Copyright (C) 2012-2016 Giuseppe Lettieri
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -117,8 +120,11 @@

extern struct netmap_mem_d nm_mem;

void netmap_mem_get_lut(struct netmap_mem_d *, struct netmap_lut *);
int netmap_mem_get_lut(struct netmap_mem_d *, struct netmap_lut *);
vm_paddr_t netmap_mem_ofstophys(struct netmap_mem_d *, vm_ooffset_t);
#ifdef _WIN32
PMDL win32_build_user_vm_map(struct netmap_mem_d* nmd);
#endif
int netmap_mem_finalize(struct netmap_mem_d *, struct netmap_adapter *);
int netmap_mem_init(void);
void netmap_mem_fini(void);
@@ -127,6 +133,7 @@ void netmap_mem_if_delete(struct netmap_adapter *, struct netmap_if *);
int netmap_mem_rings_create(struct netmap_adapter *);
void netmap_mem_rings_delete(struct netmap_adapter *);
void netmap_mem_deref(struct netmap_mem_d *, struct netmap_adapter *);
int netmap_mem2_get_pool_info(struct netmap_mem_d *, u_int, u_int *, u_int *);
int netmap_mem_get_info(struct netmap_mem_d *, u_int *size, u_int *memflags, uint16_t *id);
ssize_t netmap_mem_if_offset(struct netmap_mem_d *, const void *vaddr);
struct netmap_mem_d* netmap_mem_private_new(const char *name,
@@ -157,6 +164,15 @@ void netmap_mem_put(struct netmap_mem_d *);

#endif /* !NM_DEBUG_PUTGET */

#ifdef WITH_PTNETMAP_GUEST
struct netmap_mem_d* netmap_mem_pt_guest_new(struct ifnet *,
unsigned int nifp_offset,
nm_pt_guest_ptctl_t);
struct ptnetmap_memdev;
struct netmap_mem_d* netmap_mem_pt_guest_attach(struct ptnetmap_memdev *, uint16_t);
int netmap_mem_pt_guest_ifp_del(struct netmap_mem_d *, struct ifnet *);
#endif /* WITH_PTNETMAP_GUEST */

#define NETMAP_MEM_PRIVATE 0x2 /* allocator uses private address space */
#define NETMAP_MEM_IO 0x4 /* the underlying memory is mmapped I/O */
@ -1,5 +1,6 @@
|
||||
/*
|
||||
* Copyright (C) 2014 Giuseppe Lettieri. All rights reserved.
|
||||
* Copyright (C) 2014-2016 Giuseppe Lettieri
|
||||
* All rights reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
@ -101,6 +102,8 @@
|
||||
#warning OSX support is only partial
|
||||
#include "osx_glue.h"
|
||||
|
||||
#elif defined(_WIN32)
|
||||
#include "win_glue.h"
|
||||
#else
|
||||
|
||||
#error Unsupported platform
|
||||
@ -151,13 +154,17 @@ netmap_monitor_rxsync(struct netmap_kring *kring, int flags)
|
||||
}
|
||||
|
||||
/* nm_krings_create callbacks for monitors.
|
||||
* We could use the default netmap_hw_krings_zmon, but
|
||||
* we don't need the mbq.
|
||||
*/
|
||||
static int
|
||||
netmap_monitor_krings_create(struct netmap_adapter *na)
|
||||
{
|
||||
return netmap_krings_create(na, 0);
|
||||
int error = netmap_krings_create(na, 0);
|
||||
if (error)
|
||||
return error;
|
||||
/* override the host rings callbacks */
|
||||
na->tx_rings[na->num_tx_rings].nm_sync = netmap_monitor_txsync;
|
||||
na->rx_rings[na->num_rx_rings].nm_sync = netmap_monitor_rxsync;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* nm_krings_delete callback for monitors */
|
||||
@ -186,7 +193,11 @@ nm_monitor_alloc(struct netmap_kring *kring, u_int n)
|
||||
return 0;
|
||||
|
||||
len = sizeof(struct netmap_kring *) * n;
|
||||
#ifndef _WIN32
|
||||
nm = realloc(kring->monitors, len, M_DEVBUF, M_NOWAIT | M_ZERO);
|
||||
#else
|
||||
nm = realloc(kring->monitors, len, sizeof(struct netmap_kring *)*kring->max_monitors);
|
||||
#endif
|
||||
if (nm == NULL)
|
||||
return ENOMEM;
|
||||
|
||||
@ -229,10 +240,10 @@ static int netmap_monitor_parent_notify(struct netmap_kring *, int);
|
||||
static int
|
||||
netmap_monitor_add(struct netmap_kring *mkring, struct netmap_kring *kring, int zcopy)
|
||||
{
|
||||
int error = 0;
|
||||
int error = NM_IRQ_COMPLETED;
|
||||
|
||||
/* sinchronize with concurrently running nm_sync()s */
|
||||
nm_kr_get(kring);
|
||||
nm_kr_stop(kring, NM_KR_LOCKED);
|
||||
/* make sure the monitor array exists and is big enough */
|
||||
error = nm_monitor_alloc(kring, kring->n_monitors + 1);
|
||||
if (error)
|
||||
@ -242,7 +253,7 @@ netmap_monitor_add(struct netmap_kring *mkring, struct netmap_kring *kring, int
|
||||
kring->n_monitors++;
|
||||
if (kring->n_monitors == 1) {
|
||||
/* this is the first monitor, intercept callbacks */
|
||||
D("%s: intercept callbacks on %s", mkring->name, kring->name);
|
||||
ND("%s: intercept callbacks on %s", mkring->name, kring->name);
|
||||
kring->mon_sync = kring->nm_sync;
|
||||
/* zcopy monitors do not override nm_notify(), but
|
||||
* we save the original one regardless, so that
|
||||
@ -265,7 +276,7 @@ netmap_monitor_add(struct netmap_kring *mkring, struct netmap_kring *kring, int
}

out:
nm_kr_put(kring);
nm_kr_start(kring);
return error;
}

@ -277,7 +288,7 @@ static void
netmap_monitor_del(struct netmap_kring *mkring, struct netmap_kring *kring)
{
/* synchronize with concurrently running nm_sync()s */
nm_kr_get(kring);
nm_kr_stop(kring, NM_KR_LOCKED);
kring->n_monitors--;
if (mkring->mon_pos != kring->n_monitors) {
kring->monitors[mkring->mon_pos] = kring->monitors[kring->n_monitors];
@ -286,18 +297,18 @@ netmap_monitor_del(struct netmap_kring *mkring, struct netmap_kring *kring)
kring->monitors[kring->n_monitors] = NULL;
if (kring->n_monitors == 0) {
/* this was the last monitor, restore callbacks and delete monitor array */
D("%s: restoring sync on %s: %p", mkring->name, kring->name, kring->mon_sync);
ND("%s: restoring sync on %s: %p", mkring->name, kring->name, kring->mon_sync);
kring->nm_sync = kring->mon_sync;
kring->mon_sync = NULL;
if (kring->tx == NR_RX) {
D("%s: restoring notify on %s: %p",
ND("%s: restoring notify on %s: %p",
mkring->name, kring->name, kring->mon_notify);
kring->nm_notify = kring->mon_notify;
kring->mon_notify = NULL;
}
nm_monitor_dealloc(kring);
}
nm_kr_put(kring);
nm_kr_start(kring);
}


@ -316,7 +327,7 @@ netmap_monitor_stop(struct netmap_adapter *na)
for_rx_tx(t) {
u_int i;

for (i = 0; i < nma_get_nrings(na, t); i++) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];
u_int j;

@ -360,23 +371,32 @@ netmap_monitor_reg_common(struct netmap_adapter *na, int onoff, int zmon)
for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
kring = &NMR(pna, t)[i];
mkring = &na->rx_rings[i];
netmap_monitor_add(mkring, kring, zmon);
if (nm_kring_pending_on(mkring)) {
netmap_monitor_add(mkring, kring, zmon);
mkring->nr_mode = NKR_NETMAP_ON;
}
}
}
}
na->na_flags |= NAF_NETMAP_ON;
} else {
if (pna == NULL) {
D("%s: parent left netmap mode, nothing to restore", na->name);
return 0;
}
na->na_flags &= ~NAF_NETMAP_ON;
if (na->active_fds == 0)
na->na_flags &= ~NAF_NETMAP_ON;
for_rx_tx(t) {
if (mna->flags & nm_txrx2flag(t)) {
for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
kring = &NMR(pna, t)[i];
mkring = &na->rx_rings[i];
netmap_monitor_del(mkring, kring);
if (nm_kring_pending_off(mkring)) {
mkring->nr_mode = NKR_NETMAP_OFF;
/* we cannot access the parent krings if the parent
* has left netmap mode. This is signaled by a NULL
* pna pointer
*/
if (pna) {
kring = &NMR(pna, t)[i];
netmap_monitor_del(mkring, kring);
}
}
}
}
}
@ -652,17 +672,27 @@ netmap_monitor_parent_rxsync(struct netmap_kring *kring, int flags)
static int
netmap_monitor_parent_notify(struct netmap_kring *kring, int flags)
{
int (*notify)(struct netmap_kring*, int);
ND(5, "%s %x", kring->name, flags);
/* ?xsync callbacks have tryget called by their callers
* (NIOCREGIF and poll()), but here we have to call it
* by ourselves
*/
if (nm_kr_tryget(kring))
goto out;
netmap_monitor_parent_rxsync(kring, NAF_FORCE_READ);
if (nm_kr_tryget(kring, 0, NULL)) {
/* in all cases, just skip the sync */
return NM_IRQ_COMPLETED;
}
if (kring->n_monitors > 0) {
netmap_monitor_parent_rxsync(kring, NAF_FORCE_READ);
notify = kring->mon_notify;
} else {
/* we are no longer monitoring this ring, so both
* mon_sync and mon_notify are NULL
*/
notify = kring->nm_notify;
}
nm_kr_put(kring);
out:
return kring->mon_notify(kring, flags);
return notify(kring, flags);
}


@ -691,18 +721,25 @@ netmap_get_monitor_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
struct nmreq pnmr;
struct netmap_adapter *pna; /* parent adapter */
struct netmap_monitor_adapter *mna;
struct ifnet *ifp = NULL;
int i, error;
enum txrx t;
int zcopy = (nmr->nr_flags & NR_ZCOPY_MON);
char monsuff[10] = "";

if ((nmr->nr_flags & (NR_MONITOR_TX | NR_MONITOR_RX)) == 0) {
if (nmr->nr_flags & NR_ZCOPY_MON) {
/* the flag makes no sense unless you are
* creating a monitor
*/
return EINVAL;
}
ND("not a monitor");
return 0;
}
/* this is a request for a monitor adapter */

D("flags %x", nmr->nr_flags);
ND("flags %x", nmr->nr_flags);

mna = malloc(sizeof(*mna), M_DEVBUF, M_NOWAIT | M_ZERO);
if (mna == NULL) {
@ -716,13 +753,14 @@ netmap_get_monitor_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
* except other monitors.
*/
memcpy(&pnmr, nmr, sizeof(pnmr));
pnmr.nr_flags &= ~(NR_MONITOR_TX | NR_MONITOR_RX);
error = netmap_get_na(&pnmr, &pna, create);
pnmr.nr_flags &= ~(NR_MONITOR_TX | NR_MONITOR_RX | NR_ZCOPY_MON);
error = netmap_get_na(&pnmr, &pna, &ifp, create);
if (error) {
D("parent lookup failed: %d", error);
free(mna, M_DEVBUF);
return error;
}
D("found parent: %s", pna->name);
ND("found parent: %s", pna->name);

if (!nm_netmap_on(pna)) {
/* parent not in netmap mode */
@ -829,19 +867,17 @@ netmap_get_monitor_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
*na = &mna->up;
netmap_adapter_get(*na);

/* write the configuration back */
nmr->nr_tx_rings = mna->up.num_tx_rings;
nmr->nr_rx_rings = mna->up.num_rx_rings;
nmr->nr_tx_slots = mna->up.num_tx_desc;
nmr->nr_rx_slots = mna->up.num_rx_desc;

/* keep the reference to the parent */
D("monitor ok");
ND("monitor ok");

/* drop the reference to the ifp, if any */
if (ifp)
if_rele(ifp);

return 0;

put_out:
netmap_adapter_put(pna);
netmap_unget_na(pna, ifp);
free(mna, M_DEVBUF);
return error;
}
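The monitor hunks above keep a per-kring array of attached monitors. The first netmap_monitor_add() saves the ring's nm_sync in mon_sync and intercepts the callbacks, and netmap_monitor_del() fills the slot of a departing monitor with the last array entry, restoring the saved callback once the count reaches zero. A minimal user-space model of just that bookkeeping is sketched below. struct ring, struct mon, hw_sync(), monitor_sync(), monitor_add() and monitor_del() are hypothetical stand-ins, not netmap's API. Locking (nm_kr_stop()/nm_kr_start()) and the nm_notify() interception are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback type standing in for a kring's nm_sync. */
typedef int (*sync_fn)(void *ring);

struct mon {
	unsigned mon_pos;       /* index of this monitor in ring->monitors */
};

struct ring {
	sync_fn      nm_sync;   /* callback currently installed */
	sync_fn      mon_sync;  /* original callback, saved while monitored */
	struct mon **monitors;  /* attached monitors, n_monitors valid entries */
	unsigned     n_monitors;
	unsigned     max_monitors;
};

/* Example "original" callback and the interposed monitor callback. */
static int hw_sync(void *ring) { (void)ring; return 0; }
static int monitor_sync(void *ring) { (void)ring; return 1; }

/* Attach m to r. Returns 0 on success, -1 if the array cannot grow. */
static int monitor_add(struct ring *r, struct mon *m)
{
	if (r->n_monitors == r->max_monitors) {
		unsigned n = r->max_monitors ? 2 * r->max_monitors : 2;
		struct mon **nm = realloc(r->monitors, n * sizeof(*nm));

		if (nm == NULL)
			return -1;
		r->monitors = nm;
		r->max_monitors = n;
	}
	m->mon_pos = r->n_monitors;
	r->monitors[r->n_monitors++] = m;
	if (r->n_monitors == 1) {
		/* first monitor: save the original callback, then intercept */
		r->mon_sync = r->nm_sync;
		r->nm_sync = monitor_sync;
	}
	return 0;
}

/* Detach m from r in O(1) by moving the last entry into m's slot. */
static void monitor_del(struct ring *r, struct mon *m)
{
	assert(r->n_monitors > 0 && r->monitors[m->mon_pos] == m);
	r->n_monitors--;
	if (m->mon_pos != r->n_monitors) {
		r->monitors[m->mon_pos] = r->monitors[r->n_monitors];
		r->monitors[m->mon_pos]->mon_pos = m->mon_pos;
	}
	r->monitors[r->n_monitors] = NULL;
	if (r->n_monitors == 0) {
		/* last monitor gone: restore the callback, free the array */
		r->nm_sync = r->mon_sync;
		r->mon_sync = NULL;
		free(r->monitors);
		r->monitors = NULL;
		r->max_monitors = 0;
	}
}
```

Moving the last entry into the hole makes removal O(1) at the cost of attach order; the model keeps each moved monitor's mon_pos current so a later removal still finds its slot directly.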
@ -1,5 +1,6 @@
/*
* Copyright (C) 2014 Vincenzo Maffione. All rights reserved.
* Copyright (C) 2014-2015 Vincenzo Maffione
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -31,9 +32,9 @@
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/param.h> /* defines used in kernel.h */
#include <sys/malloc.h> /* types used in module initialization */
#include <sys/kernel.h> /* types used in module initialization */
#include <sys/sockio.h>
#include <sys/malloc.h>
#include <sys/socketvar.h> /* struct socket */
#include <sys/socket.h> /* sockaddrs */
#include <net/if.h>
@ -64,21 +65,21 @@
/* This routine is called by bdg_mismatch_datapath() when it finishes
* accumulating bytes for a segment, in order to fix some fields in the
* segment headers (which still contain the same content as the header
* of the original GSO packet). 'buf' points to the beginning (e.g.
* the ethernet header) of the segment, and 'len' is its length.
* of the original GSO packet). 'pkt' points to the beginning of the IP
* header of the segment, while 'len' is the length of the IP packet.
*/
static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,
u_int segmented_bytes, u_int last_segment,
u_int tcp, u_int iphlen)
static void
gso_fix_segment(uint8_t *pkt, size_t len, u_int ipv4, u_int iphlen, u_int tcp,
u_int idx, u_int segmented_bytes, u_int last_segment)
{
struct nm_iphdr *iph = (struct nm_iphdr *)(buf + 14);
struct nm_ipv6hdr *ip6h = (struct nm_ipv6hdr *)(buf + 14);
struct nm_iphdr *iph = (struct nm_iphdr *)(pkt);
struct nm_ipv6hdr *ip6h = (struct nm_ipv6hdr *)(pkt);
uint16_t *check = NULL;
uint8_t *check_data = NULL;

if (iphlen == 20) {
if (ipv4) {
/* Set the IPv4 "Total Length" field. */
iph->tot_len = htobe16(len-14);
iph->tot_len = htobe16(len);
ND("ip total length %u", be16toh(ip->tot_len));

/* Set the IPv4 "Identification" field. */
@ -87,15 +88,15 @@ static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,

/* Compute and insert the IPv4 header checksum. */
iph->check = 0;
iph->check = nm_csum_ipv4(iph);
iph->check = nm_os_csum_ipv4(iph);
ND("IP csum %x", be16toh(iph->check));
} else {/* if (iphlen == 40) */
} else {
/* Set the IPv6 "Payload Len" field. */
ip6h->payload_len = htobe16(len-14-iphlen);
ip6h->payload_len = htobe16(len-iphlen);
}

if (tcp) {
struct nm_tcphdr *tcph = (struct nm_tcphdr *)(buf + 14 + iphlen);
struct nm_tcphdr *tcph = (struct nm_tcphdr *)(pkt + iphlen);

/* Set the TCP sequence number. */
tcph->seq = htobe32(be32toh(tcph->seq) + segmented_bytes);
@ -110,10 +111,10 @@ static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,
check = &tcph->check;
check_data = (uint8_t *)tcph;
} else { /* UDP */
struct nm_udphdr *udph = (struct nm_udphdr *)(buf + 14 + iphlen);
struct nm_udphdr *udph = (struct nm_udphdr *)(pkt + iphlen);

/* Set the UDP 'Length' field. */
udph->len = htobe16(len-14-iphlen);
udph->len = htobe16(len-iphlen);

check = &udph->check;
check_data = (uint8_t *)udph;
@ -121,48 +122,80 @@ static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,

/* Compute and insert TCP/UDP checksum. */
*check = 0;
if (iphlen == 20)
nm_csum_tcpudp_ipv4(iph, check_data, len-14-iphlen, check);
if (ipv4)
nm_os_csum_tcpudp_ipv4(iph, check_data, len-iphlen, check);
else
nm_csum_tcpudp_ipv6(ip6h, check_data, len-14-iphlen, check);
nm_os_csum_tcpudp_ipv6(ip6h, check_data, len-iphlen, check);

ND("TCP/UDP csum %x", be16toh(*check));
}

static int
vnet_hdr_is_bad(struct nm_vnet_hdr *vh)
{
uint8_t gso_type = vh->gso_type & ~VIRTIO_NET_HDR_GSO_ECN;

return (
(gso_type != VIRTIO_NET_HDR_GSO_NONE &&
gso_type != VIRTIO_NET_HDR_GSO_TCPV4 &&
gso_type != VIRTIO_NET_HDR_GSO_UDP &&
gso_type != VIRTIO_NET_HDR_GSO_TCPV6)
||
(vh->flags & ~(VIRTIO_NET_HDR_F_NEEDS_CSUM
| VIRTIO_NET_HDR_F_DATA_VALID))
);
}

/* The VALE mismatch datapath implementation. */
void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
struct netmap_vp_adapter *dst_na,
struct nm_bdg_fwd *ft_p, struct netmap_ring *ring,
u_int *j, u_int lim, u_int *howmany)
void
bdg_mismatch_datapath(struct netmap_vp_adapter *na,
struct netmap_vp_adapter *dst_na,
const struct nm_bdg_fwd *ft_p,
struct netmap_ring *dst_ring,
u_int *j, u_int lim, u_int *howmany)
{
struct netmap_slot *slot = NULL;
struct netmap_slot *dst_slot = NULL;
struct nm_vnet_hdr *vh = NULL;
/* Number of source slots to process. */
u_int frags = ft_p->ft_frags;
struct nm_bdg_fwd *ft_end = ft_p + frags;
const struct nm_bdg_fwd *ft_end = ft_p + ft_p->ft_frags;

/* Source and destination pointers. */
uint8_t *dst, *src;
size_t src_len, dst_len;

/* Indices and counters for the destination ring. */
u_int j_start = *j;
u_int j_cur = j_start;
u_int dst_slots = 0;

/* If the source port uses the offloadings, while destination doesn't,
* we grab the source virtio-net header and do the offloadings here.
*/
if (na->virt_hdr_len && !dst_na->virt_hdr_len) {
vh = (struct nm_vnet_hdr *)ft_p->ft_buf;
if (unlikely(ft_p == ft_end)) {
RD(3, "No source slots to process");
return;
}

/* Init source and dest pointers. */
src = ft_p->ft_buf;
src_len = ft_p->ft_len;
slot = &ring->slot[*j];
dst = NMB(&dst_na->up, slot);
dst_slot = &dst_ring->slot[j_cur];
dst = NMB(&dst_na->up, dst_slot);
dst_len = src_len;

/* If the source port uses the offloadings, while destination doesn't,
* we grab the source virtio-net header and do the offloadings here.
*/
if (na->up.virt_hdr_len && !dst_na->up.virt_hdr_len) {
vh = (struct nm_vnet_hdr *)src;
/* Initial sanity check on the source virtio-net header. If
* something seems wrong, just drop the packet. */
if (src_len < na->up.virt_hdr_len) {
RD(3, "Short src vnet header, dropping");
return;
}
if (vnet_hdr_is_bad(vh)) {
RD(3, "Bad src vnet header, dropping");
return;
}
}

/* We are processing the first input slot and there is a mismatch
* between source and destination virt_hdr_len (SHL and DHL).
* When a client is using virtio-net headers, the header length
@ -185,14 +218,14 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
* 12 | 0 | doesn't exist
* 12 | 10 | copied from the first 10 bytes of source header
*/
bzero(dst, dst_na->virt_hdr_len);
if (na->virt_hdr_len && dst_na->virt_hdr_len)
bzero(dst, dst_na->up.virt_hdr_len);
if (na->up.virt_hdr_len && dst_na->up.virt_hdr_len)
memcpy(dst, src, sizeof(struct nm_vnet_hdr));
/* Skip the virtio-net headers. */
src += na->virt_hdr_len;
src_len -= na->virt_hdr_len;
dst += dst_na->virt_hdr_len;
dst_len = dst_na->virt_hdr_len + src_len;
src += na->up.virt_hdr_len;
src_len -= na->up.virt_hdr_len;
dst += dst_na->up.virt_hdr_len;
dst_len = dst_na->up.virt_hdr_len + src_len;

/* Here it could be dst_len == 0 (which implies src_len == 0),
* so we avoid passing a zero length fragment.
@ -214,16 +247,27 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
u_int gso_idx = 0;
/* Payload data bytes segmented so far (e.g. TCP data bytes). */
u_int segmented_bytes = 0;
/* Is this an IPv4 or IPv6 GSO packet? */
u_int ipv4 = 0;
/* Length of the IP header (20 if IPv4, 40 if IPv6). */
u_int iphlen = 0;
/* Length of the Ethernet header (18 if 802.1q, otherwise 14). */
u_int ethhlen = 14;
/* Is this a TCP or a UDP GSO packet? */
u_int tcp = ((vh->gso_type & ~VIRTIO_NET_HDR_GSO_ECN)
== VIRTIO_NET_HDR_GSO_UDP) ? 0 : 1;

/* Segment the GSO packet contained in the input slots (frags). */
while (ft_p != ft_end) {
for (;;) {
size_t copy;

if (dst_slots >= *howmany) {
/* We still have work to do, but we've run out of
* dst slots, so we have to drop the packet. */
RD(3, "Not enough slots, dropping GSO packet");
return;
}

/* Grab the GSO header if we don't have it. */
if (!gso_hdr) {
uint16_t ethertype;
@ -231,28 +275,75 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
gso_hdr = src;

/* Look at the 'Ethertype' field to see if this packet
* is IPv4 or IPv6.
*/
ethertype = be16toh(*((uint16_t *)(gso_hdr + 12)));
if (ethertype == 0x0800)
iphlen = 20;
else /* if (ethertype == 0x86DD) */
iphlen = 40;
* is IPv4 or IPv6, taking into account VLAN
* encapsulation. */
for (;;) {
if (src_len < ethhlen) {
RD(3, "Short GSO fragment [eth], dropping");
return;
}
ethertype = be16toh(*((uint16_t *)
(gso_hdr + ethhlen - 2)));
if (ethertype != 0x8100) /* not 802.1q */
break;
ethhlen += 4;
}
switch (ethertype) {
case 0x0800: /* IPv4 */
{
struct nm_iphdr *iph = (struct nm_iphdr *)
(gso_hdr + ethhlen);

if (src_len < ethhlen + 20) {
RD(3, "Short GSO fragment "
"[IPv4], dropping");
return;
}
ipv4 = 1;
iphlen = 4 * (iph->version_ihl & 0x0F);
break;
}
case 0x86DD: /* IPv6 */
ipv4 = 0;
iphlen = 40;
break;
default:
RD(3, "Unsupported ethertype, "
"dropping GSO packet");
return;
}
ND(3, "type=%04x", ethertype);

if (src_len < ethhlen + iphlen) {
RD(3, "Short GSO fragment [IP], dropping");
return;
}

/* Compute gso_hdr_len. For TCP we need to read the
* content of the 'Data Offset' field.
*/
if (tcp) {
struct nm_tcphdr *tcph =
(struct nm_tcphdr *)&gso_hdr[14+iphlen];
struct nm_tcphdr *tcph = (struct nm_tcphdr *)
(gso_hdr + ethhlen + iphlen);

gso_hdr_len = 14 + iphlen + 4*(tcph->doff >> 4);
} else
gso_hdr_len = 14 + iphlen + 8; /* UDP */
if (src_len < ethhlen + iphlen + 20) {
RD(3, "Short GSO fragment "
"[TCP], dropping");
return;
}
gso_hdr_len = ethhlen + iphlen +
4 * (tcph->doff >> 4);
} else {
gso_hdr_len = ethhlen + iphlen + 8; /* UDP */
}

if (src_len < gso_hdr_len) {
RD(3, "Short GSO fragment [TCP/UDP], dropping");
return;
}

ND(3, "gso_hdr_len %u gso_mtu %d", gso_hdr_len,
dst_na->mfs);
dst_na->mfs);

/* Advance source pointers. */
src += gso_hdr_len;
@ -263,7 +354,6 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
break;
src = ft_p->ft_buf;
src_len = ft_p->ft_len;
continue;
}
}

@ -289,25 +379,24 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
/* After raw segmentation, we must fix some header
* fields and compute checksums, in a protocol dependent
* way. */
gso_fix_segment(dst, gso_bytes, gso_idx,
segmented_bytes,
src_len == 0 && ft_p + 1 == ft_end,
tcp, iphlen);
gso_fix_segment(dst + ethhlen, gso_bytes - ethhlen,
ipv4, iphlen, tcp,
gso_idx, segmented_bytes,
src_len == 0 && ft_p + 1 == ft_end);

ND("frame %u completed with %d bytes", gso_idx, (int)gso_bytes);
slot->len = gso_bytes;
slot->flags = 0;
segmented_bytes += gso_bytes - gso_hdr_len;

dst_slot->len = gso_bytes;
dst_slot->flags = 0;
dst_slots++;

/* Next destination slot. */
*j = nm_next(*j, lim);
slot = &ring->slot[*j];
dst = NMB(&dst_na->up, slot);
segmented_bytes += gso_bytes - gso_hdr_len;

gso_bytes = 0;
gso_idx++;

/* Next destination slot. */
j_cur = nm_next(j_cur, lim);
dst_slot = &dst_ring->slot[j_cur];
dst = NMB(&dst_na->up, dst_slot);
}

/* Next input slot. */
@ -342,10 +431,10 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
/* Init/update the packet checksum if needed. */
if (vh && (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
if (!dst_slots)
csum = nm_csum_raw(src + vh->csum_start,
csum = nm_os_csum_raw(src + vh->csum_start,
src_len - vh->csum_start, 0);
else
csum = nm_csum_raw(src, src_len, csum);
csum = nm_os_csum_raw(src, src_len, csum);
}

/* Round to a multiple of 64 */
@ -359,44 +448,43 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
} else {
memcpy(dst, src, (int)src_len);
}
slot->len = dst_len;

dst_slot->len = dst_len;
dst_slots++;

/* Next destination slot. */
*j = nm_next(*j, lim);
slot = &ring->slot[*j];
dst = NMB(&dst_na->up, slot);
j_cur = nm_next(j_cur, lim);
dst_slot = &dst_ring->slot[j_cur];
dst = NMB(&dst_na->up, dst_slot);

/* Next source slot. */
ft_p++;
src = ft_p->ft_buf;
dst_len = src_len = ft_p->ft_len;

}

/* Finalize (fold) the checksum if needed. */
if (check && vh && (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
*check = nm_csum_fold(csum);
*check = nm_os_csum_fold(csum);
}
ND(3, "using %u dst_slots", dst_slots);

/* A second pass on the desitations slots to set the slot flags,
/* A second pass on the destination slots to set the slot flags,
* using the right number of destination slots.
*/
while (j_start != *j) {
slot = &ring->slot[j_start];
slot->flags = (dst_slots << 8)| NS_MOREFRAG;
while (j_start != j_cur) {
dst_slot = &dst_ring->slot[j_start];
dst_slot->flags = (dst_slots << 8)| NS_MOREFRAG;
j_start = nm_next(j_start, lim);
}
/* Clear NS_MOREFRAG flag on last entry. */
slot->flags = (dst_slots << 8);
dst_slot->flags = (dst_slots << 8);
}

/* Update howmany. */
/* Update howmany and j. This is to commit the use of
* those slots in the destination ring. */
if (unlikely(dst_slots > *howmany)) {
dst_slots = *howmany;
D("Slot allocation error: Should never happen");
D("Slot allocation error: This is a bug");
}
*j = j_cur;
*howmany -= dst_slots;
}
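The reworked header parsing in bdg_mismatch_datapath() skips stacked 802.1Q tags before reading the ethertype, derives the IPv4 header length from the IHL field instead of assuming 20 bytes, and checks src_len before each header access. The standalone sketch below performs the same length computation on a flat buffer. gso_header_lengths() and rd_be16() are hypothetical helpers written for illustration, and reading the TCP data offset from byte 12 of the TCP header stands in for the tcph->doff field of struct nm_tcphdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit field without assuming alignment. */
static uint16_t rd_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Compute the Ethernet, IP and total L2-L4 header lengths of a GSO frame
 * of 'len' bytes. Returns 0 on success, or -1 if the frame is too short or
 * is neither IPv4 nor IPv6 (cases where the caller would drop the packet).
 */
static int gso_header_lengths(const uint8_t *pkt, size_t len, int tcp,
    unsigned *ethhlen, unsigned *iphlen, unsigned *hdrlen)
{
	unsigned eth = 14, ip, hl;
	uint16_t ethertype;

	assert(pkt != NULL && ethhlen && iphlen && hdrlen);

	/* Skip stacked 802.1Q tags: each adds 4 bytes before the ethertype. */
	for (;;) {
		if (len < eth)
			return -1;
		ethertype = rd_be16(pkt + eth - 2);
		if (ethertype != 0x8100)
			break;
		eth += 4;
	}

	switch (ethertype) {
	case 0x0800:		/* IPv4: IHL counts 32-bit words */
		if (len < eth + 20)
			return -1;
		ip = 4u * (pkt[eth] & 0x0F);
		break;
	case 0x86DD:		/* IPv6: fixed 40-byte base header */
		ip = 40;
		break;
	default:
		return -1;
	}
	if (len < eth + ip)
		return -1;

	if (tcp) {
		if (len < eth + ip + 20)
			return -1;
		/* TCP data offset: high nibble of byte 12, in 32-bit words */
		hl = eth + ip + 4u * (pkt[eth + ip + 12] >> 4);
	} else {
		hl = eth + ip + 8;	/* fixed UDP header */
	}
	if (len < hl)
		return -1;

	*ethhlen = eth;
	*iphlen = ip;
	*hdrlen = hl;
	return 0;
}
```

Like the kernel code, the sketch reports failure rather than guessing when the frame is truncated or carries an unsupported ethertype. Reading 16-bit fields byte by byte is a choice of this sketch that avoids assuming the buffer is aligned for a uint16_t load.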
@ -1,5 +1,6 @@
/*
* Copyright (C) 2014 Giuseppe Lettieri. All rights reserved.
* Copyright (C) 2014-2016 Giuseppe Lettieri
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -54,6 +55,9 @@
#warning OSX support is only partial
#include "osx_glue.h"

#elif defined(_WIN32)
#include "win_glue.h"

#else

#error Unsupported platform
@ -72,9 +76,11 @@

#define NM_PIPE_MAXSLOTS 4096

int netmap_default_pipes = 0; /* ignored, kept for compatibility */
static int netmap_default_pipes = 0; /* ignored, kept for compatibility */
SYSBEGIN(vars_pipes);
SYSCTL_DECL(_dev_netmap);
SYSCTL_INT(_dev_netmap, OID_AUTO, default_pipes, CTLFLAG_RW, &netmap_default_pipes, 0 , "");
SYSEND;

/* allocate the pipe array in the parent adapter */
static int
@ -91,7 +97,11 @@ nm_pipe_alloc(struct netmap_adapter *na, u_int npipes)
return EINVAL;

len = sizeof(struct netmap_pipe_adapter *) * npipes;
#ifndef _WIN32
npa = realloc(na->na_pipes, len, M_DEVBUF, M_NOWAIT | M_ZERO);
#else
npa = realloc(na->na_pipes, len, sizeof(struct netmap_pipe_adapter *)*na->na_max_pipes);
#endif
if (npa == NULL)
return ENOMEM;

@ -199,7 +209,7 @@ netmap_pipe_txsync(struct netmap_kring *txkring, int flags)
}

while (limit-- > 0) {
struct netmap_slot *rs = &rxkring->save_ring->slot[j];
struct netmap_slot *rs = &rxkring->ring->slot[j];
struct netmap_slot *ts = &txkring->ring->slot[k];
struct netmap_slot tmp;

@ -295,7 +305,7 @@ netmap_pipe_rxsync(struct netmap_kring *rxkring, int flags)
* usr1 --> e1 --> e2
*
* and we are e2. e1 is certainly registered and our
* krings already exist, but they may be hidden.
* krings already exist. Nothing to do.
*/
static int
netmap_pipe_krings_create(struct netmap_adapter *na)
@ -310,65 +320,28 @@ netmap_pipe_krings_create(struct netmap_adapter *na)
int i;

/* case 1) above */
ND("%p: case 1, create everything", na);
D("%p: case 1, create both ends", na);
error = netmap_krings_create(na, 0);
if (error)
goto err;

/* we also create all the rings, since we need to
* update the save_ring pointers.
* netmap_mem_rings_create (called by our caller)
* will not create the rings again
*/

error = netmap_mem_rings_create(na);
/* create the krings of the other end */
error = netmap_krings_create(ona, 0);
if (error)
goto del_krings1;

/* update our hidden ring pointers */
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++)
NMR(na, t)[i].save_ring = NMR(na, t)[i].ring;
}

/* now, create krings and rings of the other end */
error = netmap_krings_create(ona, 0);
if (error)
goto del_rings1;

error = netmap_mem_rings_create(ona);
if (error)
goto del_krings2;

for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(ona, t) + 1; i++)
NMR(ona, t)[i].save_ring = NMR(ona, t)[i].ring;
}

/* cross link the krings */
for_rx_tx(t) {
enum txrx r= nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */
enum txrx r = nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */
for (i = 0; i < nma_get_nrings(na, t); i++) {
NMR(na, t)[i].pipe = NMR(&pna->peer->up, r) + i;
NMR(&pna->peer->up, r)[i].pipe = NMR(na, t) + i;
}
}
} else {
int i;
/* case 2) above */
/* recover the hidden rings */
ND("%p: case 2, hidden rings", na);
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++)
NMR(na, t)[i].ring = NMR(na, t)[i].save_ring;
}

}
return 0;

del_krings2:
netmap_krings_delete(ona);
del_rings1:
netmap_mem_rings_delete(na);
del_krings1:
netmap_krings_delete(na);
err:
@ -383,7 +356,8 @@ netmap_pipe_krings_create(struct netmap_adapter *na)
*
* usr1 --> e1 --> e2
*
* and we are e1. Nothing special to do.
* and we are e1. Create the needed rings of the
* other end.
*
* 1.b) state is
*
@ -412,14 +386,65 @@ netmap_pipe_reg(struct netmap_adapter *na, int onoff)
{
struct netmap_pipe_adapter *pna =
(struct netmap_pipe_adapter *)na;
struct netmap_adapter *ona = &pna->peer->up;
int i, error = 0;
enum txrx t;

ND("%p: onoff %d", na, onoff);
if (onoff) {
na->na_flags |= NAF_NETMAP_ON;
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];

if (nm_kring_pending_on(kring)) {
/* mark the partner ring as needed */
kring->pipe->nr_kflags |= NKR_NEEDRING;
}
}
}

/* create all missing needed rings on the other end */
error = netmap_mem_rings_create(ona);
if (error)
return error;

/* In case of no error we put our rings in netmap mode */
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];

if (nm_kring_pending_on(kring)) {
kring->nr_mode = NKR_NETMAP_ON;
}
}
}
if (na->active_fds == 0)
na->na_flags |= NAF_NETMAP_ON;
} else {
na->na_flags &= ~NAF_NETMAP_ON;
if (na->active_fds == 0)
na->na_flags &= ~NAF_NETMAP_ON;
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];

if (nm_kring_pending_off(kring)) {
kring->nr_mode = NKR_NETMAP_OFF;
/* mark the peer ring as no longer needed by us
* (it may still be kept if somebody else is using it)
*/
kring->pipe->nr_kflags &= ~NKR_NEEDRING;
}
}
}
/* delete all the peer rings that are no longer needed */
netmap_mem_rings_delete(ona);
}

if (na->active_fds) {
D("active_fds %d", na->active_fds);
return 0;
}

if (pna->peer_ref) {
ND("%p: case 1.a or 2.a, nothing to do", na);
return 0;
@ -429,18 +454,11 @@ netmap_pipe_reg(struct netmap_adapter *na, int onoff)
pna->peer->peer_ref = 0;
netmap_adapter_put(na);
} else {
int i;
ND("%p: case 2.b, grab peer", na);
netmap_adapter_get(na);
pna->peer->peer_ref = 1;
/* hide our rings from netmap_mem_rings_delete */
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
NMR(na, t)[i].ring = NULL;
}
}
}
return 0;
return error;
}

/* netmap_pipe_krings_delete.
@ -470,8 +488,6 @@ netmap_pipe_krings_delete(struct netmap_adapter *na)
struct netmap_pipe_adapter *pna =
(struct netmap_pipe_adapter *)na;
struct netmap_adapter *ona; /* na of the other end */
int i;
enum txrx t;

if (!pna->peer_ref) {
ND("%p: case 2, kept alive by peer", na);
@ -480,18 +496,12 @@ netmap_pipe_krings_delete(struct netmap_adapter *na)
/* case 1) above */
ND("%p: case 1, deleting everything", na);
netmap_krings_delete(na); /* also zeroes tx_rings etc. */
/* restore the ring to be deleted on the peer */
ona = &pna->peer->up;
if (ona->tx_rings == NULL) {
/* already deleted, we must be on a
* cleanup-after-error path */
return;
}
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(ona, t) + 1; i++)
NMR(ona, t)[i].ring = NMR(ona, t)[i].save_ring;
}
netmap_mem_rings_delete(ona);
netmap_krings_delete(ona);
}

@ -519,6 +529,7 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
struct nmreq pnmr;
struct netmap_adapter *pna; /* parent adapter */
struct netmap_pipe_adapter *mna, *sna, *req;
struct ifnet *ifp = NULL;
u_int pipe_id;
int role = nmr->nr_flags & NR_REG_MASK;
int error;
@ -536,7 +547,7 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
memcpy(&pnmr.nr_name, nmr->nr_name, IFNAMSIZ);
/* pass to parent the requested number of pipes */
pnmr.nr_arg1 = nmr->nr_arg1;
error = netmap_get_na(&pnmr, &pna, create);
error = netmap_get_na(&pnmr, &pna, &ifp, create);
if (error) {
ND("parent lookup failed: %d", error);
return error;
@ -652,16 +663,15 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
*na = &req->up;
netmap_adapter_get(*na);

/* write the configuration back */
nmr->nr_tx_rings = req->up.num_tx_rings;
nmr->nr_rx_rings = req->up.num_rx_rings;
nmr->nr_tx_slots = req->up.num_tx_desc;
nmr->nr_rx_slots = req->up.num_rx_desc;

/* keep the reference to the parent.
* It will be released by the req destructor
*/

/* drop the ifp reference, if any */
if (ifp) {
if_rele(ifp);
}

return 0;

free_sna:
@@ -671,7 +681,7 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
free_mna:
free(mna, M_DEVBUF);
put_out:
netmap_adapter_put(pna);
netmap_unget_na(pna, ifp);
return error;
}
File diff suppressed because it is too large
@@ -3,11 +3,14 @@
# Compile netmap as a module, useful if you want a netmap bridge
# or loadable drivers.

.include <bsd.own.mk> # FreeBSD 10 and earlier
# .include "${SYSDIR}/conf/kern.opts.mk"

.PATH: ${.CURDIR}/../../dev/netmap
.PATH.h: ${.CURDIR}/../../net
CFLAGS += -I${.CURDIR}/../../
CFLAGS += -I${.CURDIR}/../../ -D INET
KMOD = netmap
SRCS = device_if.h bus_if.h opt_netmap.h
SRCS = device_if.h bus_if.h pci_if.h opt_netmap.h
SRCS += netmap.c netmap.h netmap_kern.h
SRCS += netmap_mem2.c netmap_mem2.h
SRCS += netmap_generic.c
@@ -17,5 +20,8 @@ SRCS += netmap_freebsd.c
SRCS += netmap_offloadings.c
SRCS += netmap_pipe.c
SRCS += netmap_monitor.c
SRCS += netmap_pt.c
SRCS += if_ptnet.c
SRCS += opt_inet.h opt_inet6.h

.include <bsd.kmod.mk>
109
sys/net/netmap.h
@@ -137,6 +137,26 @@
* netmap:foo-k the k-th NIC ring pair
* netmap:foo{k PIPE ring pair k, master side
* netmap:foo}k PIPE ring pair k, slave side
*
* Some notes about host rings:
*
* + The RX host ring is used to store those packets that the host network
* stack is trying to transmit through a NIC queue, but only if that queue
* is currently in netmap mode. Netmap will not intercept host stack mbufs
* destined for NIC queues that are not in netmap mode. As a consequence,
* registering a netmap port with netmap:foo^ is not enough to intercept
* mbufs in the RX host ring; the netmap port should be registered with
* netmap:foo*, or another registration should be done to open at least a
* NIC TX queue in netmap mode.
*
* + Netmap is not currently able to deal with intercepted transmit mbufs which
* require offloadings like TSO, UFO, checksum offloading, etc. It is the
* responsibility of the user to disable those offloadings (e.g. using
* ifconfig on FreeBSD or ethtool -K on Linux) for an interface that is being
* used in netmap mode. If the offloadings are not disabled, GSO and/or
* unchecksummed packets may be dropped immediately or end up in the host RX
* ring, and will be dropped as soon as they reach another netmap
* adapter.
*/

/*
@@ -277,7 +297,11 @@ struct netmap_ring {
struct timeval ts; /* (k) time of last *sync() */

/* opaque room for a mutex or similar object */
uint8_t sem[128] __attribute__((__aligned__(NM_CACHE_ALIGN)));
#if !defined(_WIN32) || defined(__CYGWIN__)
uint8_t __attribute__((__aligned__(NM_CACHE_ALIGN))) sem[128];
#else
uint8_t __declspec(align(NM_CACHE_ALIGN)) sem[128];
#endif

/* the slots follow. This struct has variable size */
struct netmap_slot slot[0]; /* array of slots. */
@@ -496,6 +520,11 @@ struct nmreq {
#define NETMAP_BDG_OFFSET NETMAP_BDG_VNET_HDR /* deprecated alias */
#define NETMAP_BDG_NEWIF 6 /* create a virtual port */
#define NETMAP_BDG_DELIF 7 /* destroy a virtual port */
#define NETMAP_PT_HOST_CREATE 8 /* create ptnetmap kthreads */
#define NETMAP_PT_HOST_DELETE 9 /* delete ptnetmap kthreads */
#define NETMAP_BDG_POLLING_ON 10 /* create polling kthread */
#define NETMAP_BDG_POLLING_OFF 11 /* delete polling kthread */
#define NETMAP_VNET_HDR_GET 12 /* get the port virtio-net-hdr length */
uint16_t nr_arg1; /* reserve extra rings in NIOCREGIF */
#define NETMAP_BDG_HOST 1 /* attach the host stack on ATTACH */

@@ -521,7 +550,61 @@ enum { NR_REG_DEFAULT = 0, /* backward compat, should not be used. */
#define NR_ZCOPY_MON 0x400
/* request exclusive access to the selected rings */
#define NR_EXCLUSIVE 0x800
/* request ptnetmap host support */
#define NR_PASSTHROUGH_HOST NR_PTNETMAP_HOST /* deprecated */
#define NR_PTNETMAP_HOST 0x1000
#define NR_RX_RINGS_ONLY 0x2000
#define NR_TX_RINGS_ONLY 0x4000
/* Applications set this flag if they are able to deal with virtio-net headers,
* that is, send/receive frames that start with a virtio-net header.
* If not set, NIOCREGIF will fail with netmap ports that require applications
* to use those headers. If the flag is set, the application can use the
* NETMAP_VNET_HDR_GET command to figure out the header length. */
#define NR_ACCEPT_VNET_HDR 0x8000
#define NM_BDG_NAME "vale" /* prefix for bridge port name */

/*
* Windows does not have _IOWR(). _IO(), _IOW() and _IOR() are defined
* in ws2def.h but not sure if they are in the form we need.
* XXX so we redefine them
* in a convenient way to use for DeviceIoControl signatures
*/
#ifdef _WIN32
#undef _IO // ws2def.h
#define _WIN_NM_IOCTL_TYPE 40000
#define _IO(_c, _n) CTL_CODE(_WIN_NM_IOCTL_TYPE, ((_n) + 0x800) , \
METHOD_BUFFERED, FILE_ANY_ACCESS )
#define _IO_direct(_c, _n) CTL_CODE(_WIN_NM_IOCTL_TYPE, ((_n) + 0x800) , \
METHOD_OUT_DIRECT, FILE_ANY_ACCESS )

#define _IOWR(_c, _n, _s) _IO(_c, _n)

/* We have some internal ioctls in addition to the externally visible ones */
#define NETMAP_MMAP _IO_direct('i', 160) // note METHOD_OUT_DIRECT
#define NETMAP_POLL _IO('i', 162)

/* and also two setsockopt for sysctl emulation */
#define NETMAP_SETSOCKOPT _IO('i', 140)
#define NETMAP_GETSOCKOPT _IO('i', 141)


//These linknames are for the Netmap Core Driver
#define NETMAP_NT_DEVICE_NAME L"\\Device\\NETMAP"
#define NETMAP_DOS_DEVICE_NAME L"\\DosDevices\\netmap"

//Definition of a structure used to pass a virtual address within an IOCTL
typedef struct _MEMORY_ENTRY {
PVOID pUsermodeVirtualAddress;
} MEMORY_ENTRY, *PMEMORY_ENTRY;

typedef struct _POLL_REQUEST_DATA {
int events;
int timeout;
int revents;
} POLL_REQUEST_DATA;
#endif /* _WIN32 */
/*
* FreeBSD uses the size value embedded in the _IOWR to determine
@@ -561,4 +644,28 @@ struct nm_ifreq {
char data[NM_IFRDATA_LEN];
};

/*
* netmap kernel thread configuration
*/
/* bhyve/vmm.ko MSIX parameters for IOCTL */
struct ptn_vmm_ioctl_msix {
uint64_t msg;
uint64_t addr;
};

/* IOCTL parameters */
struct nm_kth_ioctl {
u_long com;
/* TODO: use union */
union {
struct ptn_vmm_ioctl_msix msix;
} data;
};

/* Configuration of a ptnetmap ring */
struct ptnet_ring_cfg {
uint64_t ioeventfd; /* eventfd in linux, tsleep() parameter in FreeBSD */
uint64_t irqfd; /* eventfd in linux, ioctl fd in FreeBSD */
struct nm_kth_ioctl ioctl; /* ioctl parameter to send irq (only used in bhyve/FreeBSD) */
};
#endif /* _NET_NETMAP_H_ */
@@ -1,5 +1,6 @@
/*
* Copyright (C) 2011-2014 Universita` di Pisa. All rights reserved.
* Copyright (C) 2011-2016 Universita` di Pisa
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -65,9 +66,31 @@
#ifndef _NET_NETMAP_USER_H_
#define _NET_NETMAP_USER_H_

#define NETMAP_DEVICE_NAME "/dev/netmap"

#ifdef __CYGWIN__
/*
* we can compile userspace apps with either cygwin or msvc,
* and we use _WIN32 to identify windows specific code
*/
#ifndef _WIN32
#define _WIN32
#endif /* _WIN32 */

#endif /* __CYGWIN__ */

#ifdef _WIN32
#undef NETMAP_DEVICE_NAME
#define NETMAP_DEVICE_NAME "/proc/sys/DosDevices/Global/netmap"
#include <windows.h>
#include <WinDef.h>
#include <sys/cygwin.h>
#endif /* _WIN32 */

#include <stdint.h>
#include <sys/socket.h> /* apple needs sockaddr */
#include <net/if.h> /* IFNAMSIZ */
#include <ctype.h>

#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
@@ -172,17 +195,23 @@ nm_ring_space(struct netmap_ring *ring)
} while (0)
#endif

struct nm_pkthdr { /* same as pcap_pkthdr */
struct nm_pkthdr { /* first part is the same as pcap_pkthdr */
struct timeval ts;
uint32_t caplen;
uint32_t len;

uint64_t flags; /* NM_MORE_PKTS etc */
#define NM_MORE_PKTS 1
struct nm_desc *d;
struct netmap_slot *slot;
uint8_t *buf;
};

struct nm_stat { /* same as pcap_stat */
u_int ps_recv;
u_int ps_drop;
u_int ps_ifdrop;
#ifdef WIN32
#ifdef WIN32 /* XXX or _WIN32 ? */
u_int bs_capt;
#endif /* WIN32 */
};
@@ -284,12 +313,14 @@ typedef void (*nm_cb_t)(u_char *, const struct nm_pkthdr *, const u_char *d);
* -NN bind individual NIC ring pair
* {NN bind master side of pipe NN
* }NN bind slave side of pipe NN
* a suffix starting with + and the following flags,
* a suffix starting with / and the following flags,
* in any order:
* x exclusive access
* z zero copy monitor
* t monitor tx side
* r monitor rx side
* R bind only RX ring(s)
* T bind only TX ring(s)
*
* req provides the initial values of nmreq before parsing ifname.
* Remember that the ifname parsing will override the ring
@@ -328,6 +359,13 @@ enum {

static int nm_close(struct nm_desc *);

/*
* nm_mmap() does the mmap, or inherits the mapping from the parent if the
* nr_arg2 (memory block) matches.
*/

static int nm_mmap(struct nm_desc *, const struct nm_desc *);

/*
* nm_inject() is the same as pcap_inject()
* nm_dispatch() is the same as pcap_dispatch()
@@ -338,13 +376,247 @@ static int nm_inject(struct nm_desc *, const void *, size_t);
static int nm_dispatch(struct nm_desc *, int, nm_cb_t, u_char *);
static u_char *nm_nextpkt(struct nm_desc *, struct nm_pkthdr *);

#ifdef _WIN32

intptr_t _get_osfhandle(int); /* defined in io.h in windows */

/*
* On Windows we do not yet have native poll support, so we keep track
* of file descriptors associated with netmap ports to emulate poll on
* them and fall back on regular poll on other file descriptors.
*/
struct win_netmap_fd_list {
struct win_netmap_fd_list *next;
int win_netmap_fd;
HANDLE win_netmap_handle;
};

/*
* list head containing all the opened netmap fds and their
* Windows HANDLE counterparts
*/
static struct win_netmap_fd_list *win_netmap_fd_list_head;

static void
win_insert_fd_record(int fd)
{
struct win_netmap_fd_list *curr;

for (curr = win_netmap_fd_list_head; curr; curr = curr->next) {
if (fd == curr->win_netmap_fd) {
return;
}
}
curr = calloc(1, sizeof(*curr));
curr->next = win_netmap_fd_list_head;
curr->win_netmap_fd = fd;
curr->win_netmap_handle = IntToPtr(_get_osfhandle(fd));
win_netmap_fd_list_head = curr;
}

void
win_remove_fd_record(int fd)
{
struct win_netmap_fd_list *curr = win_netmap_fd_list_head;
struct win_netmap_fd_list *prev = NULL;
for (; curr ; prev = curr, curr = curr->next) {
if (fd != curr->win_netmap_fd)
continue;
/* found the entry */
if (prev == NULL) { /* we are freeing the first entry */
win_netmap_fd_list_head = curr->next;
} else {
prev->next = curr->next;
}
free(curr);
break;
}
}


HANDLE
win_get_netmap_handle(int fd)
{
struct win_netmap_fd_list *curr;

for (curr = win_netmap_fd_list_head; curr; curr = curr->next) {
if (fd == curr->win_netmap_fd) {
return curr->win_netmap_handle;
}
}
return NULL;
}
/*
* we need to wrap ioctl and mmap, at least for the netmap file descriptors
*/

/*
* use this function only from netmap_user.h internal functions
* same as ioctl, returns 0 on success and -1 on error
*/
static int
win_nm_ioctl_internal(HANDLE h, int32_t ctlCode, void *arg)
{
DWORD bReturn = 0, szIn, szOut;
BOOL ioctlReturnStatus;
void *inParam = arg, *outParam = arg;

switch (ctlCode) {
case NETMAP_POLL:
szIn = sizeof(POLL_REQUEST_DATA);
szOut = sizeof(POLL_REQUEST_DATA);
break;
case NETMAP_MMAP:
szIn = 0;
szOut = sizeof(void*);
inParam = NULL; /* nothing on input */
break;
case NIOCTXSYNC:
case NIOCRXSYNC:
szIn = 0;
szOut = 0;
break;
case NIOCREGIF:
szIn = sizeof(struct nmreq);
szOut = sizeof(struct nmreq);
break;
case NIOCCONFIG:
D("unsupported NIOCCONFIG!");
return -1;

default: /* a regular ioctl */
D("invalid ioctl %x on netmap fd", ctlCode);
return -1;
}

ioctlReturnStatus = DeviceIoControl(h,
ctlCode, inParam, szIn,
outParam, szOut,
&bReturn, NULL);
// XXX note: DeviceIoControl returns 0 on error or for an async call, nonzero on success
// we could call GetLastError() to figure out what happened
return ioctlReturnStatus ? 0 : -1;
}

/*
* this function is what must be called from user-space programs
* same as ioctl, returns 0 on success and -1 on error
*/
static int
win_nm_ioctl(int fd, int32_t ctlCode, void *arg)
{
HANDLE h = win_get_netmap_handle(fd);

if (h == NULL) {
return ioctl(fd, ctlCode, arg);
} else {
return win_nm_ioctl_internal(h, ctlCode, arg);
}
}

#define ioctl win_nm_ioctl /* from now on, within this file ... */

/*
* We cannot use the native mmap on Windows.
* The only parameter used is "fd"; the other ones are just declared to
* make this signature compatible with the FreeBSD/Linux one
*/
static void *
win32_mmap_emulated(void *addr, size_t length, int prot, int flags, int fd, int32_t offset)
{
HANDLE h = win_get_netmap_handle(fd);

if (h == NULL) {
return mmap(addr, length, prot, flags, fd, offset);
} else {
MEMORY_ENTRY ret;

return win_nm_ioctl_internal(h, NETMAP_MMAP, &ret) ?
NULL : ret.pUsermodeVirtualAddress;
}
}

#define mmap win32_mmap_emulated

#include <sys/poll.h> /* XXX needed to use the structure pollfd */

static int
win_nm_poll(struct pollfd *fds, int nfds, int timeout)
{
HANDLE h;

if (nfds != 1 || fds == NULL || (h = win_get_netmap_handle(fds->fd)) == NULL) {;
return poll(fds, nfds, timeout);
} else {
POLL_REQUEST_DATA prd;

prd.timeout = timeout;
prd.events = fds->events;

win_nm_ioctl_internal(h, NETMAP_POLL, &prd);
if ((prd.revents == POLLERR) || (prd.revents == STATUS_TIMEOUT)) {
return -1;
}
return 1;
}
}

#define poll win_nm_poll

static int
win_nm_open(char* pathname, int flags)
{

if (strcmp(pathname, NETMAP_DEVICE_NAME) == 0) {
int fd = open(NETMAP_DEVICE_NAME, O_RDWR);
if (fd < 0) {
return -1;
}

win_insert_fd_record(fd);
return fd;
} else {
return open(pathname, flags);
}
}

#define open win_nm_open

static int
win_nm_close(int fd)
{
if (fd != -1) {
close(fd);
if (win_get_netmap_handle(fd) != NULL) {
win_remove_fd_record(fd);
}
}
return 0;
}

#define close win_nm_close

#endif /* _WIN32 */

static int
nm_is_identifier(const char *s, const char *e)
{
for (; s != e; s++) {
if (!isalnum(*s) && *s != '_') {
return 0;
}
}

return 1;
}
/*
* Try to open, return descriptor if successful, NULL otherwise.
* An invalid netmap name will return errno = 0;
* You can pass a pointer to a pre-filled nm_desc to add special
* parameters. Flags is used as follows
* NM_OPEN_NO_MMAP use the memory from arg, only
* NM_OPEN_NO_MMAP use the memory from arg, only XXX avoid mmap
* if the nr_arg2 (memory block) matches.
* NM_OPEN_ARG1 use req.nr_arg1 from arg
* NM_OPEN_ARG2 use req.nr_arg2 from arg
@@ -359,20 +631,48 @@ nm_open(const char *ifname, const struct nmreq *req,
u_int namelen;
uint32_t nr_ringid = 0, nr_flags, nr_reg;
const char *port = NULL;
const char *vpname = NULL;
#define MAXERRMSG 80
char errmsg[MAXERRMSG] = "";
enum { P_START, P_RNGSFXOK, P_GETNUM, P_FLAGS, P_FLAGSOK } p_state;
int is_vale;
long num;

if (strncmp(ifname, "netmap:", 7) && strncmp(ifname, "vale", 4)) {
if (strncmp(ifname, "netmap:", 7) &&
strncmp(ifname, NM_BDG_NAME, strlen(NM_BDG_NAME))) {
errno = 0; /* name not recognised, not an error */
return NULL;
}
if (ifname[0] == 'n')

is_vale = (ifname[0] == 'v');
if (is_vale) {
port = index(ifname, ':');
if (port == NULL) {
snprintf(errmsg, MAXERRMSG,
"missing ':' in vale name");
goto fail;
}

if (!nm_is_identifier(ifname + 4, port)) {
snprintf(errmsg, MAXERRMSG, "invalid bridge name");
goto fail;
}

vpname = ++port;
} else {
ifname += 7;
port = ifname;
}

/* scan for a separator */
for (port = ifname; *port && !index("-*^{}/", *port); port++)
for (; *port && !index("-*^{}/", *port); port++)
;

if (is_vale && !nm_is_identifier(vpname, port)) {
snprintf(errmsg, MAXERRMSG, "invalid bridge port name");
goto fail;
}

namelen = port - ifname;
if (namelen >= sizeof(d->req.nr_name)) {
snprintf(errmsg, MAXERRMSG, "name too long");
@@ -449,6 +749,12 @@ nm_open(const char *ifname, const struct nmreq *req,
case 'r':
nr_flags |= NR_MONITOR_RX;
break;
case 'R':
nr_flags |= NR_RX_RINGS_ONLY;
break;
case 'T':
nr_flags |= NR_TX_RINGS_ONLY;
break;
default:
snprintf(errmsg, MAXERRMSG, "unrecognized flag: '%c'", *port);
goto fail;
@@ -462,6 +768,11 @@ nm_open(const char *ifname, const struct nmreq *req,
snprintf(errmsg, MAXERRMSG, "unexpected end of port name");
goto fail;
}
if ((nr_flags & NR_ZCOPY_MON) &&
!(nr_flags & (NR_MONITOR_TX|NR_MONITOR_RX))) {
snprintf(errmsg, MAXERRMSG, "'z' used but neither 'r', nor 't' found");
goto fail;
}
ND("flags: %s %s %s %s",
(nr_flags & NR_EXCLUSIVE) ? "EXCLUSIVE" : "",
(nr_flags & NR_ZCOPY_MON) ? "ZCOPY_MON" : "",
@@ -474,7 +785,7 @@ nm_open(const char *ifname, const struct nmreq *req,
return NULL;
}
d->self = d; /* set this early so nm_close() works */
d->fd = open("/dev/netmap", O_RDWR);
d->fd = open(NETMAP_DEVICE_NAME, O_RDWR);
if (d->fd < 0) {
snprintf(errmsg, MAXERRMSG, "cannot open /dev/netmap: %s", strerror(errno));
goto fail;
@@ -487,7 +798,7 @@ nm_open(const char *ifname, const struct nmreq *req,

/* these fields are overridden by ifname and flags processing */
d->req.nr_ringid |= nr_ringid;
d->req.nr_flags = nr_flags;
d->req.nr_flags |= nr_flags;
memcpy(d->req.nr_name, ifname, namelen);
d->req.nr_name[namelen] = '\0';
/* optionally import info from parent */
@@ -529,31 +840,10 @@ nm_open(const char *ifname, const struct nmreq *req,
goto fail;
}

if (IS_NETMAP_DESC(parent) && parent->mem &&
parent->req.nr_arg2 == d->req.nr_arg2) {
/* do not mmap, inherit from parent */
d->memsize = parent->memsize;
d->mem = parent->mem;
} else {
/* XXX TODO: check if memsize is too large (or there is overflow) */
d->memsize = d->req.nr_memsize;
d->mem = mmap(0, d->memsize, PROT_WRITE | PROT_READ, MAP_SHARED,
d->fd, 0);
if (d->mem == MAP_FAILED) {
snprintf(errmsg, MAXERRMSG, "mmap failed: %s", strerror(errno));
goto fail;
}
d->done_mmap = 1;
}
{
struct netmap_if *nifp = NETMAP_IF(d->mem, d->req.nr_offset);
struct netmap_ring *r = NETMAP_RXRING(nifp, );

*(struct netmap_if **)(uintptr_t)&(d->nifp) = nifp;
*(struct netmap_ring **)(uintptr_t)&d->some_ring = r;
*(void **)(uintptr_t)&d->buf_start = NETMAP_BUF(r, 0);
*(void **)(uintptr_t)&d->buf_end =
(char *)d->mem + d->memsize;
/* if parent is defined, do nm_mmap() even if NM_OPEN_NO_MMAP is set */
if ((!(new_flags & NM_OPEN_NO_MMAP) || parent) && nm_mmap(d, parent)) {
snprintf(errmsg, MAXERRMSG, "mmap failed: %s", strerror(errno));
goto fail;
}

nr_reg = d->req.nr_flags & NR_REG_MASK;
@@ -626,14 +916,54 @@ nm_close(struct nm_desc *d)
return EINVAL;
if (d->done_mmap && d->mem)
munmap(d->mem, d->memsize);
if (d->fd != -1)
if (d->fd != -1) {
close(d->fd);
}

bzero(d, sizeof(*d));
free(d);
return 0;
}


static int
nm_mmap(struct nm_desc *d, const struct nm_desc *parent)
{
//XXX TODO: check if mmap is already done

if (IS_NETMAP_DESC(parent) && parent->mem &&
parent->req.nr_arg2 == d->req.nr_arg2) {
/* do not mmap, inherit from parent */
D("do not mmap, inherit from parent");
d->memsize = parent->memsize;
d->mem = parent->mem;
} else {
/* XXX TODO: check if memsize is too large (or there is overflow) */
d->memsize = d->req.nr_memsize;
d->mem = mmap(0, d->memsize, PROT_WRITE | PROT_READ, MAP_SHARED,
d->fd, 0);
if (d->mem == MAP_FAILED) {
goto fail;
}
d->done_mmap = 1;
}
{
struct netmap_if *nifp = NETMAP_IF(d->mem, d->req.nr_offset);
struct netmap_ring *r = NETMAP_RXRING(nifp, );

*(struct netmap_if **)(uintptr_t)&(d->nifp) = nifp;
*(struct netmap_ring **)(uintptr_t)&d->some_ring = r;
*(void **)(uintptr_t)&d->buf_start = NETMAP_BUF(r, 0);
*(void **)(uintptr_t)&d->buf_end =
(char *)d->mem + d->memsize;
}

return 0;

fail:
return EINVAL;
}
/*
* Same prototype as pcap_inject(), only need to cast.
*/
@@ -674,6 +1004,9 @@ nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
{
int n = d->last_rx_ring - d->first_rx_ring + 1;
int c, got = 0, ri = d->cur_rx_ring;
d->hdr.buf = NULL;
d->hdr.flags = NM_MORE_PKTS;
d->hdr.d = d;

if (cnt == 0)
cnt = -1;
@@ -690,17 +1023,24 @@ nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
ri = d->first_rx_ring;
ring = NETMAP_RXRING(d->nifp, ri);
for ( ; !nm_ring_empty(ring) && cnt != got; got++) {
u_int i = ring->cur;
u_int idx = ring->slot[i].buf_idx;
u_char *buf = (u_char *)NETMAP_BUF(ring, idx);

u_int idx, i;
if (d->hdr.buf) { /* from previous round */
cb(arg, &d->hdr, d->hdr.buf);
}
i = ring->cur;
idx = ring->slot[i].buf_idx;
d->hdr.slot = &ring->slot[i];
d->hdr.buf = (u_char *)NETMAP_BUF(ring, idx);
// __builtin_prefetch(buf);
d->hdr.len = d->hdr.caplen = ring->slot[i].len;
d->hdr.ts = ring->ts;
cb(arg, &d->hdr, buf);
ring->head = ring->cur = nm_ring_next(ring, i);
}
}
if (d->hdr.buf) { /* from previous round */
d->hdr.flags = 0;
cb(arg, &d->hdr, d->hdr.buf);
}
d->cur_rx_ring = ri;
return got;
}
@@ -3,11 +3,12 @@
#
# For multiple programs using a single source file each,
# we can just define 'progs' and create custom targets.
PROGS = pkt-gen bridge vale-ctl
PROGS = pkt-gen nmreplay bridge vale-ctl

CLEANFILES = $(PROGS) *.o
MAN=
CFLAGS += -Werror -Wall # -nostdinc -I/usr/include -I../../../sys
CFLAGS += -Werror -Wall
CFLAGS += -nostdinc -I ../../../sys -I/usr/include
CFLAGS += -Wextra

LDFLAGS += -lpthread
@@ -16,6 +17,7 @@ CFLAGS += -DNO_PCAP
.else
LDFLAGS += -lpcap
.endif
LDFLAGS += -lm # used by nmreplay

.include <bsd.prog.mk>
.include <bsd.lib.mk>
@@ -28,5 +30,8 @@ pkt-gen: pkt-gen.o
bridge: bridge.o
$(CC) $(CFLAGS) -o bridge bridge.o

nmreplay: nmreplay.o
$(CC) $(CFLAGS) -o nmreplay nmreplay.o $(LDFLAGS)

vale-ctl: vale-ctl.o
$(CC) $(CFLAGS) -o vale-ctl vale-ctl.o
@@ -143,7 +143,7 @@ static void
usage(void)
{
fprintf(stderr,
"usage: bridge [-v] [-i ifa] [-i ifb] [-b burst] [-w wait_time] [iface]\n");
"usage: bridge [-v] [-i ifa] [-i ifb] [-b burst] [-w wait_time] [ifa [ifb [burst]]]\n");
exit(1);
}
@@ -201,12 +201,12 @@ main(int argc, char **argv)
argc -= optind;
argv += optind;

if (argc > 0)
ifa = argv[0];
if (argc > 1)
ifa = argv[1];
ifb = argv[1];
if (argc > 2)
ifb = argv[2];
if (argc > 3)
burst = atoi(argv[3]);
burst = atoi(argv[2]);
if (!ifb)
ifb = ifa;
if (!ifa) {
@@ -233,7 +233,7 @@ main(int argc, char **argv)
D("cannot open %s", ifa);
return (1);
}
// XXX use a single mmap ?
/* try to reuse the mmap() of the first interface, if possible */
pb = nm_open(ifb, NULL, NM_OPEN_NO_MMAP, pa);
if (pb == NULL) {
D("cannot open %s", ifb);
@@ -262,6 +262,23 @@ main(int argc, char **argv)
pollfd[0].revents = pollfd[1].revents = 0;
n0 = pkt_queued(pa, 0);
n1 = pkt_queued(pb, 0);
#if defined(_WIN32) || defined(BUSYWAIT)
if (n0){
ioctl(pollfd[1].fd, NIOCTXSYNC, NULL);
pollfd[1].revents = POLLOUT;
}
else {
ioctl(pollfd[0].fd, NIOCRXSYNC, NULL);
}
if (n1){
ioctl(pollfd[0].fd, NIOCTXSYNC, NULL);
pollfd[0].revents = POLLOUT;
}
else {
ioctl(pollfd[1].fd, NIOCRXSYNC, NULL);
}
ret = 1;
#else
if (n0)
pollfd[1].events |= POLLOUT;
else
@@ -271,6 +288,7 @@ main(int argc, char **argv)
else
pollfd[1].events |= POLLIN;
ret = poll(pollfd, 2, 2500);
#endif //defined(_WIN32) || defined(BUSYWAIT)
if (ret <= 0 || verbose)
D("poll %s [0] ev %x %x rx %d@%d tx %d,"
" [1] ev %x %x rx %d@%d tx %d",
108
tools/tools/netmap/ctrs.h
Normal file
@@ -0,0 +1,108 @@
#ifndef CTRS_H_
#define CTRS_H_

/* $FreeBSD$ */

#include <sys/time.h>

/* counters to accumulate statistics */
struct my_ctrs {
uint64_t pkts, bytes, events, drop;
uint64_t min_space;
struct timeval t;
};

/* very crude code to print a number in normalized form.
* Caller has to make sure that the buffer is large enough.
*/
static const char *
norm2(char *buf, double val, char *fmt)
{
char *units[] = { "", "K", "M", "G", "T" };
u_int i;

for (i = 0; val >=1000 && i < sizeof(units)/sizeof(char *) - 1; i++)
val /= 1000;
sprintf(buf, fmt, val, units[i]);
return buf;
}

static __inline const char *
norm(char *buf, double val)
{
return norm2(buf, val, "%.3f %s");
}

static __inline int
timespec_ge(const struct timespec *a, const struct timespec *b)
{

if (a->tv_sec > b->tv_sec)
return (1);
if (a->tv_sec < b->tv_sec)
return (0);
if (a->tv_nsec >= b->tv_nsec)
return (1);
return (0);
}

static __inline struct timespec
timeval2spec(const struct timeval *a)
{
struct timespec ts = {
.tv_sec = a->tv_sec,
.tv_nsec = a->tv_usec * 1000
};
return ts;
}

static __inline struct timeval
timespec2val(const struct timespec *a)
{
struct timeval tv = {
.tv_sec = a->tv_sec,
.tv_usec = a->tv_nsec / 1000
};
return tv;
}


static __inline struct timespec
timespec_add(struct timespec a, struct timespec b)
{
struct timespec ret = { a.tv_sec + b.tv_sec, a.tv_nsec + b.tv_nsec };
if (ret.tv_nsec >= 1000000000) {
ret.tv_sec++;
ret.tv_nsec -= 1000000000;
}
return ret;
}

static __inline struct timespec
timespec_sub(struct timespec a, struct timespec b)
{
struct timespec ret = { a.tv_sec - b.tv_sec, a.tv_nsec - b.tv_nsec };
if (ret.tv_nsec < 0) {
ret.tv_sec--;
ret.tv_nsec += 1000000000;
}
return ret;
}

static uint64_t
wait_for_next_report(struct timeval *prev, struct timeval *cur,
int report_interval)
{
struct timeval delta;

delta.tv_sec = report_interval/1000;
delta.tv_usec = (report_interval%1000)*1000;
if (select(0, NULL, NULL, NULL, &delta) < 0 && errno != EINTR) {
perror("select");
abort();
}
gettimeofday(cur, NULL);
timersub(cur, prev, &delta);
return delta.tv_sec* 1000000 + delta.tv_usec;
}
#endif /* CTRS_H_ */
129
tools/tools/netmap/nmreplay.8
Normal file
@@ -0,0 +1,129 @@
.\" Copyright (c) 2016 Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 16, 2016
.Dt NMREPLAY 8
.Os
.Sh NAME
.Nm nmreplay
.Nd play back a pcap file through a netmap interface
.Sh SYNOPSIS
.Bk -words
.Bl -tag -width "nmreplay"
.It Nm
.Op Fl f Ar pcap-file
.Op Fl i Ar netmap-interface
.Op Fl B Ar bandwidth
.Op Fl D Ar delay
.Op Fl L Ar loss
.Op Fl b Ar batch-size
.Op Fl w Ar wait-link
.Op Fl v
.Op Fl C Ar cpu-placement
.El
.Ek
.Sh DESCRIPTION
.Nm
works like
.Xr tcpreplay 1
to replay a pcap file through a netmap interface,
with programmable rates and, optionally, delays, losses
and packet alterations.
.Nm
is designed to run at high speed, so the transmit schedule
is computed ahead of time, and the thread in charge of transmission
only has to pump data through the interface.
.Nm
can connect to any type of netmap port.
.Pp
Command line options are as follows:
.Bl -tag -width Ds
.It Fl f Ar pcap-file
Name of the pcap file to replay.
.It Fl i Ar interface
Name of the netmap interface to use as output.
.It Fl v
Enable verbose mode.
.It Fl b Ar batch-size
Maximum batch size to use during transmissions.
.Nm
normally transmits packets one at a time, but it may use
larger batches, up to the value specified with this option,
when running at high rates.
.It Fl B Ar bps | Cm constant, Ns Ar bps | Cm ether, Ns Ar bps | Cm real Ns Op , Ns Ar speedup
Bandwidth to be used for transmission.
.Ar bps
is a floating point number optionally followed by a character
(k or K, m or M, g or G) that multiplies the value by 10^3, 10^6 or 10^9,
respectively.
.Cm constant
(the keyword can be omitted) means that the bandwidth is computed
with reference to the actual packet size (excluding CRC and framing).
.Cm ether
indicates that the Ethernet framing (160 bits) and CRC (32 bits)
are included in the computation of the packet size.
.Cm real
means that transmission follows the timestamps
recorded in the trace.
The optional
.Ar speedup
multiplier (default 1) indicates how much faster
or slower than real time the trace should be replayed.
.It Fl D Ar dt | Cm constant, Ns Ar dt | Cm uniform, Ns Ar dmin,dmax | Cm exp, Ns Ar dmin,davg
Adds delay to packet transmission, with a constant, uniform
or exponential distribution.
.Ar dt , dmin , dmax , davg
are times expressed as floating point numbers optionally followed
by a character (s, m, u, n) to indicate seconds, milliseconds,
microseconds or nanoseconds.
The delay is added to the transmit time and adjusted so that
packets are never reordered.
.It Fl L Ar x | Cm plr, Ns Ar x | Cm ber, Ns Ar x
Simulates packet or bit errors, causing offending packets to be dropped.
.Ar x
is a floating point number indicating the packet or bit error rate.
.It Fl w Ar wait-link
Number of seconds to wait before transmitting (default 2).
This is useful with physical ports, to let link negotiation
complete before transmission starts.
.El
.Sh OPERATION
.Nm
creates an in-memory schedule with all packets to be transmitted,
and then launches a separate thread to take care of transmissions
while the main thread reports statistics every second.
.Sh SEE ALSO
.Pa http://info.iet.unipi.it/~luigi/netmap/
.Pp
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp. 45-51, March 2012
.Pp
Luigi Rizzo, Giuseppe Lettieri,
VALE, a switched ethernet for virtual machines,
ACM CoNEXT'12, December 2012, Nice
.Sh AUTHORS
.An -nosplit
.Nm
was written by
.An Luigi Rizzo, Andrea Beconcini, Francesco Mola and Lorenzo Biagini
at the Universita` di Pisa, Italy.
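The -B description implies a simple per-packet cost: at bps bits per second, a packet of len bytes occupies the link for 8*len/bps seconds, plus 192 bit times (160 bits of framing and 32 of CRC) in ether mode. Since the nmreplay.c diff is suppressed in this view, the following is only a hypothetical sketch of how the multiplier suffixes and the transmit time could be computed; parse_bps() and tx_time_ns() are illustrative names, not functions taken from nmreplay.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for the -B "bps" syntax in nmreplay(8): a floating
 * point number, optionally followed by k/K, m/M or g/G (10^3, 10^6, 10^9).
 * Returns the rate in bits per second, or -1 on malformed input. */
static double
parse_bps(const char *s)
{
	char *end;
	double v = strtod(s, &end);

	if (end == s || !(v > 0))	/* no number, zero, negative or NaN */
		return -1;
	switch (*end) {
	case '\0':
		return v;
	case 'k': case 'K':
		v *= 1e3;
		break;
	case 'm': case 'M':
		v *= 1e6;
		break;
	case 'g': case 'G':
		v *= 1e9;
		break;
	default:
		return -1;
	}
	/* Nothing may follow the multiplier character. */
	return end[1] == '\0' ? v : -1;
}

/* Nanoseconds needed to send a len-byte packet at bps bits/s.
 * With ether != 0, 160 bits of framing and 32 bits of CRC are added. */
static uint64_t
tx_time_ns(unsigned len, double bps, int ether)
{
	double bits = 8.0 * len + (ether ? 160 + 32 : 0);

	assert(bps > 0);
	return (uint64_t)(bits * 1e9 / bps);
}
```

With this model a 1500-byte packet at 1G takes 12000 ns in constant mode and 12192 ns in ether mode; precomputing such values for every packet is consistent with the man page's statement that the schedule is built before transmission starts.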
1820
tools/tools/netmap/nmreplay.c
Normal file
File diff suppressed because it is too large
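nmreplay(8) states that added delay never causes packet reordering and that -L can express a bit error rate. Because the nmreplay.c source is not shown here, the sketch below only illustrates one way to meet those rules under stated assumptions: each delayed send time is clamped so it is never earlier than its predecessor's, and a packet of L bits is dropped with probability 1 - (1 - ber)^L, which assumes independent bit errors (whether nmreplay uses exactly this model is an assumption). apply_delay(), pow_u() and ber_drop_prob() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: nominal[] holds precomputed send times in ns
 * (nondecreasing) and delay[] the extra delay drawn for each packet.
 * out[i] is the delayed send time, clamped so that no packet leaves
 * before its predecessor, i.e. the trace order is preserved. */
static void
apply_delay(const uint64_t *nominal, const uint64_t *delay,
    uint64_t *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t t = nominal[i] + delay[i];

		assert(i == 0 || nominal[i] >= nominal[i - 1]);
		if (i > 0 && t < out[i - 1])
			t = out[i - 1];	/* hold back instead of overtaking */
		out[i] = t;
	}
}

/* base^e by repeated squaring, to avoid depending on libm. */
static double
pow_u(double base, unsigned e)
{
	double r = 1.0;

	while (e > 0) {
		if (e & 1)
			r *= base;
		base *= base;
		e >>= 1;
	}
	return r;
}

/* Drop probability of a len-byte packet for bit error rate ber,
 * assuming independent bit errors: 1 - (1 - ber)^(8 * len). */
static double
ber_drop_prob(unsigned len, double ber)
{
	return 1.0 - pow_u(1.0 - ber, 8u * len);
}
```

With nominal send times 0, 1000 and 2000 ns and drawn delays of 5000, 100 and 0 ns, all three packets leave at 5000 ns, still in trace order. Clamping trades a little extra delay on later packets for the no-reordering guarantee.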
@ -25,6 +25,10 @@

/* $FreeBSD$ */

#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
#include <net/netmap.h>

#include <errno.h>
#include <stdio.h>
#include <inttypes.h> /* PRI* macros */
@ -35,17 +39,9 @@
#include <sys/param.h>
#include <sys/socket.h> /* apple needs sockaddr */
#include <net/if.h> /* ifreq */
#include <net/netmap.h>
#include <net/netmap_user.h>
#include <libgen.h> /* basename */
#include <stdlib.h> /* atoi, free */

/* debug support */
#define ND(format, ...) do {} while(0)
#define D(format, ...) \
fprintf(stderr, "%s [%d] " format "\n", \
__FUNCTION__, __LINE__, ##__VA_ARGS__)

/* XXX cut and paste from pkt-gen.c because I'm not sure whether this
* program may include nm_util.h
*/
@ -117,8 +113,11 @@ bdg_ctl(const char *name, int nr_cmd, int nr_arg, char *nmr_config)
break;
case NETMAP_BDG_ATTACH:
case NETMAP_BDG_DETACH:
if (nr_arg && nr_arg != NETMAP_BDG_HOST)
nmr.nr_flags = NR_REG_ALL_NIC;
if (nr_arg && nr_arg != NETMAP_BDG_HOST) {
nmr.nr_flags = NR_REG_NIC_SW;
nr_arg = 0;
}
nmr.nr_arg1 = nr_arg;
error = ioctl(fd, NIOCREGIF, &nmr);
if (error == -1) {
@ -152,6 +151,36 @@ bdg_ctl(const char *name, int nr_cmd, int nr_arg, char *nmr_config)

break;

case NETMAP_BDG_POLLING_ON:
case NETMAP_BDG_POLLING_OFF:
/* We reuse nmreq fields as follows:
* nr_tx_slots: 0 and non-zero indicate REG_ALL_NIC
* REG_ONE_NIC, respectively.
* nr_rx_slots: CPU core index. This also indicates the
* first queue in the case of REG_ONE_NIC
* nr_tx_rings: (REG_ONE_NIC only) indicates the
* number of CPU cores or the last queue
*/
nmr.nr_flags |= nmr.nr_tx_slots ?
NR_REG_ONE_NIC : NR_REG_ALL_NIC;
nmr.nr_ringid = nmr.nr_rx_slots;
/* number of cores/rings */
if (nmr.nr_flags == NR_REG_ALL_NIC)
nmr.nr_arg1 = 1;
else
nmr.nr_arg1 = nmr.nr_tx_rings;

error = ioctl(fd, NIOCREGIF, &nmr);
if (!error)
D("polling on %s %s", nmr.nr_name,
nr_cmd == NETMAP_BDG_POLLING_ON ?
"started" : "stopped");
else
D("polling on %s %s (err %d)", nmr.nr_name,
nr_cmd == NETMAP_BDG_POLLING_ON ?
"couldn't start" : "couldn't stop", error);
break;

default: /* GINFO */
nmr.nr_cmd = nmr.nr_arg1 = nmr.nr_arg2 = 0;
error = ioctl(fd, NIOCGINFO, &nmr);
@ -173,7 +202,7 @@ main(int argc, char *argv[])
const char *command = basename(argv[0]);
char *name = NULL, *nmr_config = NULL;

if (argc > 3) {
if (argc > 5) {
usage:
fprintf(stderr,
"Usage:\n"
@ -186,12 +215,18 @@ main(int argc, char *argv[])
"\t-r interface interface name to be deleted\n"
"\t-l list all or specified bridge's interfaces (default)\n"
"\t-C string ring/slot setting of an interface creating by -n\n"
"\t-p interface start polling. Additional -C x,y,z configures\n"
"\t\t x: 0 (REG_ALL_NIC) or 1 (REG_ONE_NIC),\n"
"\t\t y: CPU core id for ALL_NIC and core/ring for ONE_NIC\n"
"\t\t z: (ONE_NIC only) num of total cores/rings\n"
"\t-P interface stop polling\n"
"", command);
return 0;
}

while ((ch = getopt(argc, argv, "d:a:h:g:l:n:r:C:")) != -1) {
name = optarg; /* default */
while ((ch = getopt(argc, argv, "d:a:h:g:l:n:r:C:p:P:")) != -1) {
if (ch != 'C')
name = optarg; /* default */
switch (ch) {
default:
fprintf(stderr, "bad option %c %s", ch, optarg);
@ -223,11 +258,17 @@ main(int argc, char *argv[])
case 'C':
nmr_config = strdup(optarg);
break;
case 'p':
nr_cmd = NETMAP_BDG_POLLING_ON;
break;
case 'P':
nr_cmd = NETMAP_BDG_POLLING_OFF;
break;
}
if (optind != argc) {
// fprintf(stderr, "optind %d argc %d\n", optind, argc);
goto usage;
}
}
if (optind != argc) {
// fprintf(stderr, "optind %d argc %d\n", optind, argc);
goto usage;
}
if (argc == 1)
nr_cmd = NETMAP_BDG_LIST;
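The polling hunk maps the three values of -C x,y,z onto nmreq fields: x selects REG_ALL_NIC (0) or REG_ONE_NIC (non-zero), y gives the CPU core (and, for ONE_NIC, the core/ring), and z gives the total number of cores/rings, with ALL_NIC forced to one. The sketch below restates that mapping in self-contained form; poll_mode, poll_cfg and parse_poll_config() are hypothetical stand-ins for the real struct nmreq fields and for whatever parsing vale-ctl applies to the -C string, and treating missing fields as 0 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for NR_REG_ALL_NIC and NR_REG_ONE_NIC. */
enum poll_mode { POLL_ALL_NIC, POLL_ONE_NIC };

/* Hypothetical decoded form of "-C x,y,z" used together with -p. */
struct poll_cfg {
	enum poll_mode mode;	/* x: 0 -> ALL_NIC, non-zero -> ONE_NIC */
	unsigned first;		/* y: CPU core id (ALL_NIC), core/ring (ONE_NIC) */
	unsigned count;		/* z: total cores/rings; always 1 for ALL_NIC */
};

/* Returns 0 on success, -1 if s does not start with a number.
 * A NULL string or missing trailing fields are treated as 0
 * (an assumption of this sketch). */
static int
parse_poll_config(const char *s, struct poll_cfg *cfg)
{
	unsigned x = 0, y = 0, z = 0;

	assert(cfg != NULL);
	if (s != NULL && sscanf(s, "%u,%u,%u", &x, &y, &z) < 1)
		return -1;
	cfg->mode = x ? POLL_ONE_NIC : POLL_ALL_NIC;
	cfg->first = y;
	/* Mirrors the hunk: nr_arg1 = 1 for ALL_NIC, else nr_tx_rings. */
	cfg->count = cfg->mode == POLL_ALL_NIC ? 1 : z;
	return 0;
}
```

Under this sketch, "1,2,4" selects ONE_NIC with first = 2 and count = 4, while "0,3" selects ALL_NIC on core 3 with count forced to 1, matching the nr_arg1 rule in the hunk.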