Import the current version of netmap, aligned with the one on github.

This commit, long overdue, contains contributions from the last two years
by Stefano Garzarella, Giuseppe Lettieri, and Vincenzo Maffione, including:
+ fixes on monitor ports
+ the 'ptnet' virtual device driver and ptnetmap backend, for
  high-speed virtual passthrough on VMs (bhyve fixes in an upcoming commit)
+ improved emulated netmap mode
+ more robust error handling
+ removal of stale code
+ various fixes to code and documentation (including mixups between RX and TX
  parameters, and between private and public variables)

We also include an additional tool, nmreplay, which is functionally
equivalent to tcpreplay but operates on netmap ports.
Luigi Rizzo 2016-10-16 14:13:32 +00:00
parent 63f6b1a75a
commit 37e3a6d349
27 changed files with 8011 additions and 1993 deletions


@ -33,10 +33,10 @@
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.Pp
.br
.Nm VALE
.Nd a fast VirtuAl Local Ethernet using the netmap API
.Pp
.br
.Nm netmap pipes
.Nd a shared memory packet transport channel
.Sh SYNOPSIS
@ -44,28 +44,49 @@
.Sh DESCRIPTION
.Nm
is a framework for extremely fast and efficient packet I/O
for both userspace and kernel clients.
for userspace and kernel clients, and for Virtual Machines.
It runs on
.Fx
and Linux, and includes
.Nm VALE ,
a very fast and modular in-kernel software switch/dataplane,
and
.Nm netmap pipes ,
a shared memory packet transport channel.
All these are accessed interchangeably with the same API.
Linux and some versions of Windows, and supports a variety of
.Nm netmap ports ,
including
.Bl -tag -width XXXX
.It Nm physical NIC ports
to access individual queues of network interfaces;
.It Nm host ports
to inject packets into the host stack;
.It Nm VALE ports
implementing a very fast and modular in-kernel software switch/dataplane;
.It Nm netmap pipes
a shared memory packet transport channel;
.It Nm netmap monitors
a mechanism similar to
.Xr bpf 4
to capture traffic
.El
.Pp
.Nm ,
.Nm VALE
and
.Nm netmap pipes
are at least one order of magnitude faster than
All these
.Nm netmap ports
are accessed interchangeably with the same API,
and are at least one order of magnitude faster than
standard OS mechanisms
(sockets, bpf, tun/tap interfaces, native switches, pipes),
reaching 14.88 million packets per second (Mpps)
with much less than one core on a 10 Gbit NIC,
about 20 Mpps per core for VALE ports,
and over 100 Mpps for netmap pipes.
(sockets, bpf, tun/tap interfaces, native switches, pipes).
With suitably fast hardware (NICs, PCIe buses, CPUs),
packet I/O using
.Nm
on supported NICs
reaches 14.88 million packets per second (Mpps)
with much less than one core on 10 Gbit/s NICs;
35-40 Mpps on 40 Gbit/s NICs (limited by the hardware);
about 20 Mpps per core for VALE ports;
and over 100 Mpps for
.Nm netmap pipes.
NICs without native
.Nm
support can still use the API in emulated mode,
which uses unmodified device drivers and is 3-5 times faster than
.Xr bpf 4
or raw sockets.
.Pp
Userspace clients can dynamically switch NICs into
.Nm
@ -73,8 +94,10 @@ mode and send and receive raw packets through
memory mapped buffers.
Similarly,
.Nm VALE
switch instances and ports, and
switch instances and ports,
.Nm netmap pipes
and
.Nm netmap monitors
can be created dynamically,
providing high speed packet I/O between processes,
virtual machines, NICs and the host stack.
@ -89,17 +112,17 @@ and standard OS mechanisms such as
.Xr epoll 2 ,
and
.Xr kqueue 2 .
.Nm VALE
and
.Nm netmap pipes
All types of
.Nm netmap ports
and the
.Nm VALE switch
are implemented by a single kernel module, which also emulates the
.Nm
API over standard drivers for devices without native
.Nm
support.
API over standard drivers.
For best performance,
.Nm
requires explicit support in device drivers.
requires native support in device drivers.
A list of such devices is at the end of this document.
.Pp
In the rest of this (long) manual page we document
various aspects of the
@ -116,7 +139,7 @@ which can be connected to a physical interface
to the host stack,
or to a
.Nm VALE
switch).
switch.
Ports use preallocated circular queues of buffers
.Em ( rings )
residing in an mmapped region.
@ -166,16 +189,18 @@ has multiple modes of operation controlled by the
.Vt struct nmreq
argument.
.Va arg.nr_name
specifies the port name, as follows:
specifies the netmap port name, as follows:
.Bl -tag -width XXXX
.It Dv OS network interface name (e.g. 'em0', 'eth1', ... )
the data path of the NIC is disconnected from the host stack,
and the file descriptor is bound to the NIC (one or all queues),
or to the host stack;
.It Dv valeXXX:YYY (arbitrary XXX and YYY)
the file descriptor is bound to port YYY of a VALE switch called XXX,
both dynamically created if necessary.
The string cannot exceed IFNAMSIZ characters, and YYY cannot
.It Dv valeSSS:PPP
the file descriptor is bound to port PPP of VALE switch SSS.
Switch instances and ports are dynamically created if necessary.
.br
Both SSS and PPP have the form [0-9a-zA-Z_]+ , the string
cannot exceed IFNAMSIZ characters, and PPP cannot
be the name of any existing OS network interface.
.El
.Pp
@ -312,9 +337,6 @@ one slot is always kept empty.
The ring size
.Va ( num_slots )
should not be assumed to be a power of two.
.br
(NOTE: older versions of netmap used head/count format to indicate
the content of a ring).
.Pp
.Va head
is the first slot available to userspace;
@ -585,6 +607,15 @@ it from the host stack.
Multiple file descriptors can be bound to the same port,
with proper synchronization left to the user.
.Pp
The recommended way to bind a file descriptor to a port is
to use function
.Va nm_open(..)
(see
.Xr LIBRARIES )
which parses names to access specific port types and
enable features.
In the following we document the main features.
.Pp
.Dv NIOCREGIF can also bind a file descriptor to one endpoint of a
.Em netmap pipe ,
consisting of two netmap ports with a crossover connection.
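As a sketch of the recommended nm_open() path mentioned above, a minimal receiver using the inline helpers from netmap_user.h might look as follows (the NIC name "em0" is an assumption; error handling is kept minimal, and a real client would poll() on d->fd instead of draining once):

```c
#include <stdio.h>

#define NETMAP_WITH_LIBS	/* pull in the inline helper functions */
#include <net/netmap_user.h>

int
main(void)
{
	struct nm_desc *d;
	struct nm_pkthdr h;
	const unsigned char *buf;

	/* bind to all hardware rings of em0 (hypothetical interface name) */
	d = nm_open("netmap:em0", NULL, 0, NULL);
	if (d == NULL) {
		perror("nm_open");
		return 1;
	}
	/* drain whatever is currently pending on the RX rings */
	while ((buf = nm_nextpkt(d, &h)) != NULL)
		printf("received %u bytes\n", h.len);
	nm_close(d);
	return 0;
}
```

This requires a netmap-enabled kernel and headers, so it is a usage sketch rather than something that builds everywhere.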
@ -734,7 +765,7 @@ similar to
binds a file descriptor to a port.
.Bl -tag -width XX
.It Va ifname
is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a
is a port name, in the form "netmap:PPP" for a NIC and "valeSSS:PPP" for a
.Nm VALE
port.
.It Va req
@ -774,28 +805,39 @@ similar to pcap_next(), fetches the next packet
natively supports the following devices:
.Pp
On FreeBSD:
.Xr cxgbe 4 ,
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr ixl 4 ,
.Xr lem 4 ,
.Xr re 4 .
.Pp
On Linux
.Xr e1000 4 ,
.Xr e1000e 4 ,
.Xr i40e 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr mlx4 4 ,
.Xr forcedeth 4 ,
.Xr r8169 4 .
.Pp
NICs without native support can still be used in
.Nm
mode through emulation.
Performance is inferior to native netmap
mode but still significantly higher than sockets, and approaching
that of in-kernel solutions such as Linux's
.Xr pktgen .
mode but still significantly higher than various raw socket types
(bpf, PF_PACKET, etc.).
Note that for slow devices (such as 1 Gbit/s and slower NICs,
or several 10 Gbit/s NICs whose hardware is unable
to sustain line rate), emulated and native mode will likely have
similar or same throughput.
When emulation is in use, packet sniffer programs such as tcpdump
could see received packets before they are diverted by netmap. This behaviour
is not intentional, being just an artifact of the implementation of emulation.
Note that in case the netmap application subsequently moves packets received
from the emulated adapter onto the host RX ring, the sniffer will intercept
those packets again, since the packets are injected to the host stack as they
were received by the network interface.
.Pp
Emulation is also available for devices with native netmap support,
which can be used for testing or performance comparison.
@ -812,8 +854,12 @@ and module parameters on Linux
.Bl -tag -width indent
.It Va dev.netmap.admode: 0
Controls the use of native or emulated adapter mode.
0 uses the best available option, 1 forces native and
fails if not available, 2 forces emulated hence never fails.
.br
0 uses the best available option;
.br
1 forces native mode and fails if not available;
.br
2 forces emulated hence never fails.
.It Va dev.netmap.generic_ringsize: 1024
Ring size used for emulated netmap mode
.It Va dev.netmap.generic_mit: 100000
@ -861,9 +907,9 @@ performance.
uses
.Xr select 2 ,
.Xr poll 2 ,
.Xr epoll
.Xr epoll 2
and
.Xr kqueue
.Xr kqueue 2
to wake up processes when significant events occur, and
.Xr mmap 2
to map memory.
@ -1015,8 +1061,8 @@ e.g. running the following in two different terminals:
.Dl pkt-gen -i vale1:b -f tx # sender
The same example can be used to test netmap pipes, by simply
changing port names, e.g.
.Dl pkt-gen -i vale:x{3 -f rx # receiver on the master side
.Dl pkt-gen -i vale:x}3 -f tx # sender on the slave side
.Dl pkt-gen -i vale2:x{3 -f rx # receiver on the master side
.Dl pkt-gen -i vale2:x}3 -f tx # sender on the slave side
.Pp
The following command attaches an interface and the host stack
to a switch:


@ -2187,6 +2187,7 @@ dev/nand/nfc_if.m optional nand
dev/ncr/ncr.c optional ncr pci
dev/ncv/ncr53c500.c optional ncv
dev/ncv/ncr53c500_pccard.c optional ncv pccard
dev/netmap/if_ptnet.c optional netmap
dev/netmap/netmap.c optional netmap
dev/netmap/netmap_freebsd.c optional netmap
dev/netmap/netmap_generic.c optional netmap
@ -2195,6 +2196,7 @@ dev/netmap/netmap_mem2.c optional netmap
dev/netmap/netmap_monitor.c optional netmap
dev/netmap/netmap_offloadings.c optional netmap
dev/netmap/netmap_pipe.c optional netmap
dev/netmap/netmap_pt.c optional netmap
dev/netmap/netmap_vale.c optional netmap
# compile-with "${NORMAL_C} -Wconversion -Wextra"
dev/nfsmb/nfsmb.c optional nfsmb pci


@ -59,7 +59,7 @@ extern int ixl_rx_miss, ixl_rx_miss_bufs, ixl_crcstrip;
/*
* device-specific sysctl variables:
*
* ixl_crcstrip: 0: keep CRC in rx frames (default), 1: strip it.
* ixl_crcstrip: 0: NIC keeps CRC in rx frames, 1: NIC strips it (default).
* During regular operations the CRC is stripped, but on some
* hardware reception of frames not multiple of 64 is slower,
* so using crcstrip=0 helps in benchmarks.
@ -73,7 +73,7 @@ SYSCTL_DECL(_dev_netmap);
*/
#if 0
SYSCTL_INT(_dev_netmap, OID_AUTO, ixl_crcstrip,
CTLFLAG_RW, &ixl_crcstrip, 1, "strip CRC on rx frames");
CTLFLAG_RW, &ixl_crcstrip, 1, "NIC strips CRC on rx frames");
#endif
SYSCTL_INT(_dev_netmap, OID_AUTO, ixl_rx_miss,
CTLFLAG_RW, &ixl_rx_miss, 0, "potentially missed rx intr");


@ -81,6 +81,22 @@ lem_netmap_reg(struct netmap_adapter *na, int onoff)
}
static void
lem_netmap_intr(struct netmap_adapter *na, int onoff)
{
struct ifnet *ifp = na->ifp;
struct adapter *adapter = ifp->if_softc;
EM_CORE_LOCK(adapter);
if (onoff) {
lem_enable_intr(adapter);
} else {
lem_disable_intr(adapter);
}
EM_CORE_UNLOCK(adapter);
}
/*
* Reconcile kernel and user view of the transmit ring.
*/
@ -99,10 +115,6 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)
/* device-specific */
struct adapter *adapter = ifp->if_softc;
#ifdef NIC_PARAVIRT
struct paravirt_csb *csb = adapter->csb;
uint64_t *csbd = (uint64_t *)(csb + 1);
#endif /* NIC_PARAVIRT */
bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
@ -113,19 +125,6 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)
nm_i = kring->nr_hwcur;
if (nm_i != head) { /* we have new packets to send */
#ifdef NIC_PARAVIRT
int do_kick = 0;
uint64_t t = 0; // timestamp
int n = head - nm_i;
if (n < 0)
n += lim + 1;
if (csb) {
t = rdtsc(); /* last timestamp */
csbd[16] += t - csbd[0]; /* total Wg */
csbd[17] += n; /* Wg count */
csbd[0] = t;
}
#endif /* NIC_PARAVIRT */
nic_i = netmap_idx_k2n(kring, nm_i);
while (nm_i != head) {
struct netmap_slot *slot = &ring->slot[nm_i];
@ -166,38 +165,8 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)
bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
#ifdef NIC_PARAVIRT
/* set unconditionally, then also kick if needed */
if (csb) {
t = rdtsc();
if (csb->host_need_txkick == 2) {
/* can compute an update of delta */
int64_t delta = t - csbd[3];
if (delta < 0)
delta = -delta;
if (csbd[8] == 0 || delta < csbd[8]) {
csbd[8] = delta;
csbd[9]++;
}
csbd[10]++;
}
csb->guest_tdt = nic_i;
csbd[18] += t - csbd[0]; // total wp
csbd[19] += n;
}
if (!csb || !csb->guest_csb_on || (csb->host_need_txkick & 1))
do_kick = 1;
if (do_kick)
#endif /* NIC_PARAVIRT */
/* (re)start the tx unit up to slot nic_i (excluded) */
E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), nic_i);
#ifdef NIC_PARAVIRT
if (do_kick) {
uint64_t t1 = rdtsc();
csbd[20] += t1 - t; // total Np
csbd[21]++;
}
#endif /* NIC_PARAVIRT */
}
/*
@ -206,93 +175,6 @@ lem_netmap_txsync(struct netmap_kring *kring, int flags)
if (ticks != kring->last_reclaim || flags & NAF_FORCE_RECLAIM || nm_kr_txempty(kring)) {
kring->last_reclaim = ticks;
/* record completed transmissions using TDH */
#ifdef NIC_PARAVIRT
/* host updates tdh unconditionally, and we have
* no side effects on reads, so we can read from there
* instead of exiting.
*/
if (csb) {
static int drain = 0, nodrain=0, good = 0, bad = 0, fail = 0;
u_int x = adapter->next_tx_to_clean;
csbd[19]++; // XXX count reclaims
nic_i = csb->host_tdh;
if (csb->guest_csb_on) {
if (nic_i == x) {
bad++;
csbd[24]++; // failed reclaims
/* no progress, request kick and retry */
csb->guest_need_txkick = 1;
mb(); // XXX barrier
nic_i = csb->host_tdh;
} else {
good++;
}
if (nic_i != x) {
csb->guest_need_txkick = 2;
if (nic_i == csb->guest_tdt)
drain++;
else
nodrain++;
#if 1
if (netmap_adaptive_io) {
/* new mechanism: last half ring (or so)
* released one slot at a time.
* This effectively makes the system spin.
*
* Take next_to_clean + 1 as a reference.
* tdh must be ahead or equal
* On entry, the logical order is
* x < tdh = nic_i
* We first push tdh up to avoid wraps.
* The limit is tdh-ll (half ring).
* if tdh-256 < x we report x;
* else we report tdh-256
*/
u_int tdh = nic_i;
u_int ll = csbd[15];
u_int delta = lim/8;
if (netmap_adaptive_io == 2 || ll > delta)
csbd[15] = ll = delta;
else if (netmap_adaptive_io == 1 && ll > 1) {
csbd[15]--;
}
if (nic_i >= kring->nkr_num_slots) {
RD(5, "bad nic_i %d on input", nic_i);
}
x = nm_next(x, lim);
if (tdh < x)
tdh += lim + 1;
if (tdh <= x + ll) {
nic_i = x;
csbd[25]++; //report n + 1;
} else {
tdh = nic_i;
if (tdh < ll)
tdh += lim + 1;
nic_i = tdh - ll;
csbd[26]++; // report tdh - ll
}
}
#endif
} else {
/* we stop, count whether we are idle or not */
int bh_active = csb->host_need_txkick & 2 ? 4 : 0;
csbd[27+ csb->host_need_txkick]++;
if (netmap_adaptive_io == 1) {
if (bh_active && csbd[15] > 1)
csbd[15]--;
else if (!bh_active && csbd[15] < lim/2)
csbd[15]++;
}
bad--;
fail++;
}
}
RD(1, "drain %d nodrain %d good %d retry %d fail %d",
drain, nodrain, good, bad, fail);
} else
#endif /* !NIC_PARAVIRT */
nic_i = E1000_READ_REG(&adapter->hw, E1000_TDH(0));
if (nic_i >= kring->nkr_num_slots) { /* XXX can it happen ? */
D("TDH wrap %d", nic_i);
@ -324,21 +206,10 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
/* device-specific */
struct adapter *adapter = ifp->if_softc;
#ifdef NIC_PARAVIRT
struct paravirt_csb *csb = adapter->csb;
uint32_t csb_mode = csb && csb->guest_csb_on;
uint32_t do_host_rxkick = 0;
#endif /* NIC_PARAVIRT */
if (head > lim)
return netmap_ring_reinit(kring);
#ifdef NIC_PARAVIRT
if (csb_mode) {
force_update = 1;
csb->guest_need_rxkick = 0;
}
#endif /* NIC_PARAVIRT */
/* XXX check sync modes */
bus_dmamap_sync(adapter->rxdma.dma_tag, adapter->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
@ -357,23 +228,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
uint32_t staterr = le32toh(curr->status);
int len;
#ifdef NIC_PARAVIRT
if (csb_mode) {
if ((staterr & E1000_RXD_STAT_DD) == 0) {
/* don't bother to retry if more than 1 pkt */
if (n > 1)
break;
csb->guest_need_rxkick = 1;
wmb();
staterr = le32toh(curr->status);
if ((staterr & E1000_RXD_STAT_DD) == 0) {
break;
} else { /* we are good */
csb->guest_need_rxkick = 0;
}
}
} else
#endif /* NIC_PARAVIRT */
if ((staterr & E1000_RXD_STAT_DD) == 0)
break;
len = le16toh(curr->length) - 4; // CRC
@ -390,18 +244,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
nic_i = nm_next(nic_i, lim);
}
if (n) { /* update the state variables */
#ifdef NIC_PARAVIRT
if (csb_mode) {
if (n > 1) {
/* leave one spare buffer so we avoid rxkicks */
nm_i = nm_prev(nm_i, lim);
nic_i = nm_prev(nic_i, lim);
n--;
} else {
csb->guest_need_rxkick = 1;
}
}
#endif /* NIC_PARAVIRT */
ND("%d new packets at nic %d nm %d tail %d",
n,
adapter->next_rx_desc_to_check,
@ -440,10 +282,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
curr->status = 0;
bus_dmamap_sync(adapter->rxtag, rxbuf->map,
BUS_DMASYNC_PREREAD);
#ifdef NIC_PARAVIRT
if (csb_mode && csb->host_rxkick_at == nic_i)
do_host_rxkick = 1;
#endif /* NIC_PARAVIRT */
nm_i = nm_next(nm_i, lim);
nic_i = nm_next(nic_i, lim);
}
@ -455,12 +293,6 @@ lem_netmap_rxsync(struct netmap_kring *kring, int flags)
* so move nic_i back by one unit
*/
nic_i = nm_prev(nic_i, lim);
#ifdef NIC_PARAVIRT
/* set unconditionally, then also kick if needed */
if (csb)
csb->guest_rdt = nic_i;
if (!csb_mode || do_host_rxkick)
#endif /* NIC_PARAVIRT */
E1000_WRITE_REG(&adapter->hw, E1000_RDT(0), nic_i);
}
@ -486,6 +318,7 @@ lem_netmap_attach(struct adapter *adapter)
na.nm_rxsync = lem_netmap_rxsync;
na.nm_register = lem_netmap_reg;
na.num_tx_rings = na.num_rx_rings = 1;
na.nm_intr = lem_netmap_intr;
netmap_attach(&na);
}


@ -53,7 +53,7 @@ void ixgbe_netmap_attach(struct adapter *adapter);
/*
* device-specific sysctl variables:
*
* ix_crcstrip: 0: keep CRC in rx frames (default), 1: strip it.
* ix_crcstrip: 0: NIC keeps CRC in rx frames (default), 1: NIC strips it.
* During regular operations the CRC is stripped, but on some
* hardware reception of frames not multiple of 64 is slower,
* so using crcstrip=0 helps in benchmarks.
@ -65,7 +65,7 @@ SYSCTL_DECL(_dev_netmap);
static int ix_rx_miss, ix_rx_miss_bufs;
int ix_crcstrip;
SYSCTL_INT(_dev_netmap, OID_AUTO, ix_crcstrip,
CTLFLAG_RW, &ix_crcstrip, 0, "strip CRC on rx frames");
CTLFLAG_RW, &ix_crcstrip, 0, "NIC strips CRC on rx frames");
SYSCTL_INT(_dev_netmap, OID_AUTO, ix_rx_miss,
CTLFLAG_RW, &ix_rx_miss, 0, "potentially missed rx intr");
SYSCTL_INT(_dev_netmap, OID_AUTO, ix_rx_miss_bufs,
@ -109,6 +109,20 @@ set_crcstrip(struct ixgbe_hw *hw, int onoff)
IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rxc);
}
static void
ixgbe_netmap_intr(struct netmap_adapter *na, int onoff)
{
struct ifnet *ifp = na->ifp;
struct adapter *adapter = ifp->if_softc;
IXGBE_CORE_LOCK(adapter);
if (onoff) {
ixgbe_enable_intr(adapter); // XXX maybe ixgbe_stop ?
} else {
ixgbe_disable_intr(adapter); // XXX maybe ixgbe_stop ?
}
IXGBE_CORE_UNLOCK(adapter);
}
/*
* Register/unregister. We are already under netmap lock.
@ -311,7 +325,7 @@ ixgbe_netmap_txsync(struct netmap_kring *kring, int flags)
* good way.
*/
nic_i = IXGBE_READ_REG(&adapter->hw, IXGBE_IS_VF(adapter) ?
IXGBE_VFTDH(kring->ring_id) : IXGBE_TDH(kring->ring_id));
IXGBE_VFTDH(kring->ring_id) : IXGBE_TDH(kring->ring_id));
if (nic_i >= kring->nkr_num_slots) { /* XXX can it happen ? */
D("TDH wrap %d", nic_i);
nic_i -= kring->nkr_num_slots;
@ -486,6 +500,7 @@ ixgbe_netmap_attach(struct adapter *adapter)
na.nm_rxsync = ixgbe_netmap_rxsync;
na.nm_register = ixgbe_netmap_reg;
na.num_tx_rings = na.num_rx_rings = adapter->num_queues;
na.nm_intr = ixgbe_netmap_intr;
netmap_attach(&na);
}

File diff suppressed because it is too large.

File diff suppressed because it is too large.

File diff suppressed because it is too large.

File diff suppressed because it is too large.


@ -1,5 +1,6 @@
/*
* Copyright (C) 2013-2014 Vincenzo Maffione. All rights reserved.
* Copyright (C) 2013-2014 Vincenzo Maffione
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -30,6 +31,8 @@
#ifdef linux
#include "bsd_glue.h"
#elif defined (_WIN32)
#include "win_glue.h"
#else /* __FreeBSD__ */
#include <sys/param.h>
#include <sys/lock.h>
@ -152,12 +155,12 @@ void mbq_safe_purge(struct mbq *q)
}
void mbq_safe_destroy(struct mbq *q)
void mbq_safe_fini(struct mbq *q)
{
mtx_destroy(&q->lock);
}
void mbq_destroy(struct mbq *q)
void mbq_fini(struct mbq *q)
{
}


@ -1,5 +1,6 @@
/*
* Copyright (C) 2013-2014 Vincenzo Maffione. All rights reserved.
* Copyright (C) 2013-2014 Vincenzo Maffione
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -40,6 +41,8 @@
/* XXX probably rely on a previous definition of SPINLOCK_T */
#ifdef linux
#define SPINLOCK_T safe_spinlock_t
#elif defined (_WIN32)
#define SPINLOCK_T win_spinlock_t
#else
#define SPINLOCK_T struct mtx
#endif
@ -52,16 +55,21 @@ struct mbq {
SPINLOCK_T lock;
};
/* XXX "destroy" does not match "init" as a name.
* We should also clarify whether init can be used while
/* We should clarify whether init can be used while
* holding a lock, and whether mbq_safe_destroy() is a NOP.
*/
void mbq_init(struct mbq *q);
void mbq_destroy(struct mbq *q);
void mbq_fini(struct mbq *q);
void mbq_enqueue(struct mbq *q, struct mbuf *m);
struct mbuf *mbq_dequeue(struct mbq *q);
void mbq_purge(struct mbq *q);
static inline struct mbuf *
mbq_peek(struct mbq *q)
{
return q->head ? q->head : NULL;
}
static inline void
mbq_lock(struct mbq *q)
{
@ -76,7 +84,7 @@ mbq_unlock(struct mbq *q)
void mbq_safe_init(struct mbq *q);
void mbq_safe_destroy(struct mbq *q);
void mbq_safe_fini(struct mbq *q);
void mbq_safe_enqueue(struct mbq *q, struct mbuf *m);
struct mbuf *mbq_safe_dequeue(struct mbq *q);
void mbq_safe_purge(struct mbq *q);

File diff suppressed because it is too large.


@ -1,5 +1,8 @@
/*
* Copyright (C) 2012-2014 Matteo Landi, Luigi Rizzo, Giuseppe Lettieri. All rights reserved.
* Copyright (C) 2012-2014 Matteo Landi
* Copyright (C) 2012-2016 Luigi Rizzo
* Copyright (C) 2012-2016 Giuseppe Lettieri
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -117,8 +120,11 @@
extern struct netmap_mem_d nm_mem;
void netmap_mem_get_lut(struct netmap_mem_d *, struct netmap_lut *);
int netmap_mem_get_lut(struct netmap_mem_d *, struct netmap_lut *);
vm_paddr_t netmap_mem_ofstophys(struct netmap_mem_d *, vm_ooffset_t);
#ifdef _WIN32
PMDL win32_build_user_vm_map(struct netmap_mem_d* nmd);
#endif
int netmap_mem_finalize(struct netmap_mem_d *, struct netmap_adapter *);
int netmap_mem_init(void);
void netmap_mem_fini(void);
@ -127,6 +133,7 @@ void netmap_mem_if_delete(struct netmap_adapter *, struct netmap_if *);
int netmap_mem_rings_create(struct netmap_adapter *);
void netmap_mem_rings_delete(struct netmap_adapter *);
void netmap_mem_deref(struct netmap_mem_d *, struct netmap_adapter *);
int netmap_mem2_get_pool_info(struct netmap_mem_d *, u_int, u_int *, u_int *);
int netmap_mem_get_info(struct netmap_mem_d *, u_int *size, u_int *memflags, uint16_t *id);
ssize_t netmap_mem_if_offset(struct netmap_mem_d *, const void *vaddr);
struct netmap_mem_d* netmap_mem_private_new(const char *name,
@ -157,6 +164,15 @@ void netmap_mem_put(struct netmap_mem_d *);
#endif /* !NM_DEBUG_PUTGET */
#ifdef WITH_PTNETMAP_GUEST
struct netmap_mem_d* netmap_mem_pt_guest_new(struct ifnet *,
unsigned int nifp_offset,
nm_pt_guest_ptctl_t);
struct ptnetmap_memdev;
struct netmap_mem_d* netmap_mem_pt_guest_attach(struct ptnetmap_memdev *, uint16_t);
int netmap_mem_pt_guest_ifp_del(struct netmap_mem_d *, struct ifnet *);
#endif /* WITH_PTNETMAP_GUEST */
#define NETMAP_MEM_PRIVATE 0x2 /* allocator uses private address space */
#define NETMAP_MEM_IO 0x4 /* the underlying memory is mmapped I/O */


@ -1,5 +1,6 @@
/*
* Copyright (C) 2014 Giuseppe Lettieri. All rights reserved.
* Copyright (C) 2014-2016 Giuseppe Lettieri
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -101,6 +102,8 @@
#warning OSX support is only partial
#include "osx_glue.h"
#elif defined(_WIN32)
#include "win_glue.h"
#else
#error Unsupported platform
@ -151,13 +154,17 @@ netmap_monitor_rxsync(struct netmap_kring *kring, int flags)
}
/* nm_krings_create callbacks for monitors.
* We could use the default netmap_hw_krings_zmon, but
* we don't need the mbq.
*/
static int
netmap_monitor_krings_create(struct netmap_adapter *na)
{
return netmap_krings_create(na, 0);
int error = netmap_krings_create(na, 0);
if (error)
return error;
/* override the host rings callbacks */
na->tx_rings[na->num_tx_rings].nm_sync = netmap_monitor_txsync;
na->rx_rings[na->num_rx_rings].nm_sync = netmap_monitor_rxsync;
return 0;
}
/* nm_krings_delete callback for monitors */
@ -186,7 +193,11 @@ nm_monitor_alloc(struct netmap_kring *kring, u_int n)
return 0;
len = sizeof(struct netmap_kring *) * n;
#ifndef _WIN32
nm = realloc(kring->monitors, len, M_DEVBUF, M_NOWAIT | M_ZERO);
#else
nm = realloc(kring->monitors, len, sizeof(struct netmap_kring *)*kring->max_monitors);
#endif
if (nm == NULL)
return ENOMEM;
@ -229,10 +240,10 @@ static int netmap_monitor_parent_notify(struct netmap_kring *, int);
static int
netmap_monitor_add(struct netmap_kring *mkring, struct netmap_kring *kring, int zcopy)
{
int error = 0;
int error = NM_IRQ_COMPLETED;
/* synchronize with concurrently running nm_sync()s */
nm_kr_get(kring);
nm_kr_stop(kring, NM_KR_LOCKED);
/* make sure the monitor array exists and is big enough */
error = nm_monitor_alloc(kring, kring->n_monitors + 1);
if (error)
@ -242,7 +253,7 @@ netmap_monitor_add(struct netmap_kring *mkring, struct netmap_kring *kring, int
kring->n_monitors++;
if (kring->n_monitors == 1) {
/* this is the first monitor, intercept callbacks */
D("%s: intercept callbacks on %s", mkring->name, kring->name);
ND("%s: intercept callbacks on %s", mkring->name, kring->name);
kring->mon_sync = kring->nm_sync;
/* zcopy monitors do not override nm_notify(), but
* we save the original one regardless, so that
@ -265,7 +276,7 @@ netmap_monitor_add(struct netmap_kring *mkring, struct netmap_kring *kring, int
}
out:
nm_kr_put(kring);
nm_kr_start(kring);
return error;
}
@ -277,7 +288,7 @@ static void
netmap_monitor_del(struct netmap_kring *mkring, struct netmap_kring *kring)
{
/* synchronize with concurrently running nm_sync()s */
nm_kr_get(kring);
nm_kr_stop(kring, NM_KR_LOCKED);
kring->n_monitors--;
if (mkring->mon_pos != kring->n_monitors) {
kring->monitors[mkring->mon_pos] = kring->monitors[kring->n_monitors];
@ -286,18 +297,18 @@ netmap_monitor_del(struct netmap_kring *mkring, struct netmap_kring *kring)
kring->monitors[kring->n_monitors] = NULL;
if (kring->n_monitors == 0) {
/* this was the last monitor, restore callbacks and delete monitor array */
D("%s: restoring sync on %s: %p", mkring->name, kring->name, kring->mon_sync);
ND("%s: restoring sync on %s: %p", mkring->name, kring->name, kring->mon_sync);
kring->nm_sync = kring->mon_sync;
kring->mon_sync = NULL;
if (kring->tx == NR_RX) {
D("%s: restoring notify on %s: %p",
ND("%s: restoring notify on %s: %p",
mkring->name, kring->name, kring->mon_notify);
kring->nm_notify = kring->mon_notify;
kring->mon_notify = NULL;
}
nm_monitor_dealloc(kring);
}
nm_kr_put(kring);
nm_kr_start(kring);
}
@ -316,7 +327,7 @@ netmap_monitor_stop(struct netmap_adapter *na)
for_rx_tx(t) {
u_int i;
for (i = 0; i < nma_get_nrings(na, t); i++) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];
u_int j;
@ -360,23 +371,32 @@ netmap_monitor_reg_common(struct netmap_adapter *na, int onoff, int zmon)
for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
kring = &NMR(pna, t)[i];
mkring = &na->rx_rings[i];
netmap_monitor_add(mkring, kring, zmon);
if (nm_kring_pending_on(mkring)) {
netmap_monitor_add(mkring, kring, zmon);
mkring->nr_mode = NKR_NETMAP_ON;
}
}
}
}
na->na_flags |= NAF_NETMAP_ON;
} else {
if (pna == NULL) {
D("%s: parent left netmap mode, nothing to restore", na->name);
return 0;
}
na->na_flags &= ~NAF_NETMAP_ON;
if (na->active_fds == 0)
na->na_flags &= ~NAF_NETMAP_ON;
for_rx_tx(t) {
if (mna->flags & nm_txrx2flag(t)) {
for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
kring = &NMR(pna, t)[i];
mkring = &na->rx_rings[i];
netmap_monitor_del(mkring, kring);
if (nm_kring_pending_off(mkring)) {
mkring->nr_mode = NKR_NETMAP_OFF;
/* we cannot access the parent krings if the parent
* has left netmap mode. This is signaled by a NULL
* pna pointer
*/
if (pna) {
kring = &NMR(pna, t)[i];
netmap_monitor_del(mkring, kring);
}
}
}
}
}
@ -652,17 +672,27 @@ netmap_monitor_parent_rxsync(struct netmap_kring *kring, int flags)
static int
netmap_monitor_parent_notify(struct netmap_kring *kring, int flags)
{
int (*notify)(struct netmap_kring*, int);
ND(5, "%s %x", kring->name, flags);
/* ?xsync callbacks have tryget called by their callers
* (NIOCREGIF and poll()), but here we have to call it
* by ourselves
*/
if (nm_kr_tryget(kring))
goto out;
netmap_monitor_parent_rxsync(kring, NAF_FORCE_READ);
if (nm_kr_tryget(kring, 0, NULL)) {
/* in all cases, just skip the sync */
return NM_IRQ_COMPLETED;
}
if (kring->n_monitors > 0) {
netmap_monitor_parent_rxsync(kring, NAF_FORCE_READ);
notify = kring->mon_notify;
} else {
/* we are no longer monitoring this ring, so both
* mon_sync and mon_notify are NULL
*/
notify = kring->nm_notify;
}
nm_kr_put(kring);
out:
return kring->mon_notify(kring, flags);
return notify(kring, flags);
}
@ -691,18 +721,25 @@ netmap_get_monitor_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
struct nmreq pnmr;
struct netmap_adapter *pna; /* parent adapter */
struct netmap_monitor_adapter *mna;
struct ifnet *ifp = NULL;
int i, error;
enum txrx t;
int zcopy = (nmr->nr_flags & NR_ZCOPY_MON);
char monsuff[10] = "";
if ((nmr->nr_flags & (NR_MONITOR_TX | NR_MONITOR_RX)) == 0) {
if (nmr->nr_flags & NR_ZCOPY_MON) {
/* the flag makes no sense unless you are
* creating a monitor
*/
return EINVAL;
}
ND("not a monitor");
return 0;
}
/* this is a request for a monitor adapter */
D("flags %x", nmr->nr_flags);
ND("flags %x", nmr->nr_flags);
mna = malloc(sizeof(*mna), M_DEVBUF, M_NOWAIT | M_ZERO);
if (mna == NULL) {
@ -716,13 +753,14 @@ netmap_get_monitor_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
* except other monitors.
*/
memcpy(&pnmr, nmr, sizeof(pnmr));
pnmr.nr_flags &= ~(NR_MONITOR_TX | NR_MONITOR_RX);
error = netmap_get_na(&pnmr, &pna, create);
pnmr.nr_flags &= ~(NR_MONITOR_TX | NR_MONITOR_RX | NR_ZCOPY_MON);
error = netmap_get_na(&pnmr, &pna, &ifp, create);
if (error) {
D("parent lookup failed: %d", error);
free(mna, M_DEVBUF);
return error;
}
D("found parent: %s", pna->name);
ND("found parent: %s", pna->name);
if (!nm_netmap_on(pna)) {
/* parent not in netmap mode */
@ -829,19 +867,17 @@ netmap_get_monitor_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
*na = &mna->up;
netmap_adapter_get(*na);
/* write the configuration back */
nmr->nr_tx_rings = mna->up.num_tx_rings;
nmr->nr_rx_rings = mna->up.num_rx_rings;
nmr->nr_tx_slots = mna->up.num_tx_desc;
nmr->nr_rx_slots = mna->up.num_rx_desc;
/* keep the reference to the parent */
D("monitor ok");
ND("monitor ok");
/* drop the reference to the ifp, if any */
if (ifp)
if_rele(ifp);
return 0;
put_out:
netmap_adapter_put(pna);
netmap_unget_na(pna, ifp);
free(mna, M_DEVBUF);
return error;
}


@ -1,5 +1,6 @@
/*
* Copyright (C) 2014 Vincenzo Maffione. All rights reserved.
* Copyright (C) 2014-2015 Vincenzo Maffione
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -31,9 +32,9 @@
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/param.h> /* defines used in kernel.h */
#include <sys/malloc.h> /* types used in module initialization */
#include <sys/kernel.h> /* types used in module initialization */
#include <sys/sockio.h>
#include <sys/malloc.h>
#include <sys/socketvar.h> /* struct socket */
#include <sys/socket.h> /* sockaddrs */
#include <net/if.h>
@@ -64,21 +65,21 @@
/* This routine is called by bdg_mismatch_datapath() when it finishes
* accumulating bytes for a segment, in order to fix some fields in the
* segment headers (which still contain the same content as the header
* of the original GSO packet). 'buf' points to the beginning (e.g.
* the ethernet header) of the segment, and 'len' is its length.
* of the original GSO packet). 'pkt' points to the beginning of the IP
* header of the segment, while 'len' is the length of the IP packet.
*/
static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,
u_int segmented_bytes, u_int last_segment,
u_int tcp, u_int iphlen)
static void
gso_fix_segment(uint8_t *pkt, size_t len, u_int ipv4, u_int iphlen, u_int tcp,
u_int idx, u_int segmented_bytes, u_int last_segment)
{
struct nm_iphdr *iph = (struct nm_iphdr *)(buf + 14);
struct nm_ipv6hdr *ip6h = (struct nm_ipv6hdr *)(buf + 14);
struct nm_iphdr *iph = (struct nm_iphdr *)(pkt);
struct nm_ipv6hdr *ip6h = (struct nm_ipv6hdr *)(pkt);
uint16_t *check = NULL;
uint8_t *check_data = NULL;
if (iphlen == 20) {
if (ipv4) {
/* Set the IPv4 "Total Length" field. */
iph->tot_len = htobe16(len-14);
iph->tot_len = htobe16(len);
ND("ip total length %u", be16toh(iph->tot_len));
/* Set the IPv4 "Identification" field. */
@@ -87,15 +88,15 @@ static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,
/* Compute and insert the IPv4 header checksum. */
iph->check = 0;
iph->check = nm_csum_ipv4(iph);
iph->check = nm_os_csum_ipv4(iph);
ND("IP csum %x", be16toh(iph->check));
} else {/* if (iphlen == 40) */
} else {
/* Set the IPv6 "Payload Len" field. */
ip6h->payload_len = htobe16(len-14-iphlen);
ip6h->payload_len = htobe16(len-iphlen);
}
if (tcp) {
struct nm_tcphdr *tcph = (struct nm_tcphdr *)(buf + 14 + iphlen);
struct nm_tcphdr *tcph = (struct nm_tcphdr *)(pkt + iphlen);
/* Set the TCP sequence number. */
tcph->seq = htobe32(be32toh(tcph->seq) + segmented_bytes);
@@ -110,10 +111,10 @@ static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,
check = &tcph->check;
check_data = (uint8_t *)tcph;
} else { /* UDP */
struct nm_udphdr *udph = (struct nm_udphdr *)(buf + 14 + iphlen);
struct nm_udphdr *udph = (struct nm_udphdr *)(pkt + iphlen);
/* Set the UDP 'Length' field. */
udph->len = htobe16(len-14-iphlen);
udph->len = htobe16(len-iphlen);
check = &udph->check;
check_data = (uint8_t *)udph;
@@ -121,48 +122,80 @@ static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx,
/* Compute and insert TCP/UDP checksum. */
*check = 0;
if (iphlen == 20)
nm_csum_tcpudp_ipv4(iph, check_data, len-14-iphlen, check);
if (ipv4)
nm_os_csum_tcpudp_ipv4(iph, check_data, len-iphlen, check);
else
nm_csum_tcpudp_ipv6(ip6h, check_data, len-14-iphlen, check);
nm_os_csum_tcpudp_ipv6(ip6h, check_data, len-iphlen, check);
ND("TCP/UDP csum %x", be16toh(*check));
}
static int
vnet_hdr_is_bad(struct nm_vnet_hdr *vh)
{
uint8_t gso_type = vh->gso_type & ~VIRTIO_NET_HDR_GSO_ECN;
return (
(gso_type != VIRTIO_NET_HDR_GSO_NONE &&
gso_type != VIRTIO_NET_HDR_GSO_TCPV4 &&
gso_type != VIRTIO_NET_HDR_GSO_UDP &&
gso_type != VIRTIO_NET_HDR_GSO_TCPV6)
||
(vh->flags & ~(VIRTIO_NET_HDR_F_NEEDS_CSUM
| VIRTIO_NET_HDR_F_DATA_VALID))
);
}
/* The VALE mismatch datapath implementation. */
void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
struct netmap_vp_adapter *dst_na,
struct nm_bdg_fwd *ft_p, struct netmap_ring *ring,
u_int *j, u_int lim, u_int *howmany)
void
bdg_mismatch_datapath(struct netmap_vp_adapter *na,
struct netmap_vp_adapter *dst_na,
const struct nm_bdg_fwd *ft_p,
struct netmap_ring *dst_ring,
u_int *j, u_int lim, u_int *howmany)
{
struct netmap_slot *slot = NULL;
struct netmap_slot *dst_slot = NULL;
struct nm_vnet_hdr *vh = NULL;
/* Number of source slots to process. */
u_int frags = ft_p->ft_frags;
struct nm_bdg_fwd *ft_end = ft_p + frags;
const struct nm_bdg_fwd *ft_end = ft_p + ft_p->ft_frags;
/* Source and destination pointers. */
uint8_t *dst, *src;
size_t src_len, dst_len;
/* Indices and counters for the destination ring. */
u_int j_start = *j;
u_int j_cur = j_start;
u_int dst_slots = 0;
/* If the source port uses the offloadings, while destination doesn't,
* we grab the source virtio-net header and do the offloadings here.
*/
if (na->virt_hdr_len && !dst_na->virt_hdr_len) {
vh = (struct nm_vnet_hdr *)ft_p->ft_buf;
if (unlikely(ft_p == ft_end)) {
RD(3, "No source slots to process");
return;
}
/* Init source and dest pointers. */
src = ft_p->ft_buf;
src_len = ft_p->ft_len;
slot = &ring->slot[*j];
dst = NMB(&dst_na->up, slot);
dst_slot = &dst_ring->slot[j_cur];
dst = NMB(&dst_na->up, dst_slot);
dst_len = src_len;
/* If the source port uses the offloadings, while destination doesn't,
* we grab the source virtio-net header and do the offloadings here.
*/
if (na->up.virt_hdr_len && !dst_na->up.virt_hdr_len) {
vh = (struct nm_vnet_hdr *)src;
/* Initial sanity check on the source virtio-net header. If
* something seems wrong, just drop the packet. */
if (src_len < na->up.virt_hdr_len) {
RD(3, "Short src vnet header, dropping");
return;
}
if (vnet_hdr_is_bad(vh)) {
RD(3, "Bad src vnet header, dropping");
return;
}
}
/* We are processing the first input slot and there is a mismatch
* between source and destination virt_hdr_len (SHL and DHL).
* When a client is using virtio-net headers, the header length
@@ -185,14 +218,14 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
* 12 | 0 | doesn't exist
* 12 | 10 | copied from the first 10 bytes of source header
*/
bzero(dst, dst_na->virt_hdr_len);
if (na->virt_hdr_len && dst_na->virt_hdr_len)
bzero(dst, dst_na->up.virt_hdr_len);
if (na->up.virt_hdr_len && dst_na->up.virt_hdr_len)
memcpy(dst, src, sizeof(struct nm_vnet_hdr));
/* Skip the virtio-net headers. */
src += na->virt_hdr_len;
src_len -= na->virt_hdr_len;
dst += dst_na->virt_hdr_len;
dst_len = dst_na->virt_hdr_len + src_len;
src += na->up.virt_hdr_len;
src_len -= na->up.virt_hdr_len;
dst += dst_na->up.virt_hdr_len;
dst_len = dst_na->up.virt_hdr_len + src_len;
/* Here it could be dst_len == 0 (which implies src_len == 0),
* so we avoid passing a zero length fragment.
@@ -214,16 +247,27 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
u_int gso_idx = 0;
/* Payload data bytes segmented so far (e.g. TCP data bytes). */
u_int segmented_bytes = 0;
/* Is this an IPv4 or IPv6 GSO packet? */
u_int ipv4 = 0;
/* Length of the IP header (20 if IPv4, 40 if IPv6). */
u_int iphlen = 0;
/* Length of the Ethernet header (18 if 802.1q, otherwise 14). */
u_int ethhlen = 14;
/* Is this a TCP or an UDP GSO packet? */
u_int tcp = ((vh->gso_type & ~VIRTIO_NET_HDR_GSO_ECN)
== VIRTIO_NET_HDR_GSO_UDP) ? 0 : 1;
/* Segment the GSO packet contained into the input slots (frags). */
while (ft_p != ft_end) {
for (;;) {
size_t copy;
if (dst_slots >= *howmany) {
/* We still have work to do, but we've run out of
* dst slots, so we have to drop the packet. */
RD(3, "Not enough slots, dropping GSO packet");
return;
}
/* Grab the GSO header if we don't have it. */
if (!gso_hdr) {
uint16_t ethertype;
@@ -231,28 +275,75 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
gso_hdr = src;
/* Look at the 'Ethertype' field to see if this packet
* is IPv4 or IPv6.
*/
ethertype = be16toh(*((uint16_t *)(gso_hdr + 12)));
if (ethertype == 0x0800)
iphlen = 20;
else /* if (ethertype == 0x86DD) */
iphlen = 40;
* is IPv4 or IPv6, taking into account VLAN
* encapsulation. */
for (;;) {
if (src_len < ethhlen) {
RD(3, "Short GSO fragment [eth], dropping");
return;
}
ethertype = be16toh(*((uint16_t *)
(gso_hdr + ethhlen - 2)));
if (ethertype != 0x8100) /* not 802.1q */
break;
ethhlen += 4;
}
switch (ethertype) {
case 0x0800: /* IPv4 */
{
struct nm_iphdr *iph = (struct nm_iphdr *)
(gso_hdr + ethhlen);
if (src_len < ethhlen + 20) {
RD(3, "Short GSO fragment "
"[IPv4], dropping");
return;
}
ipv4 = 1;
iphlen = 4 * (iph->version_ihl & 0x0F);
break;
}
case 0x86DD: /* IPv6 */
ipv4 = 0;
iphlen = 40;
break;
default:
RD(3, "Unsupported ethertype, "
"dropping GSO packet");
return;
}
ND(3, "type=%04x", ethertype);
if (src_len < ethhlen + iphlen) {
RD(3, "Short GSO fragment [IP], dropping");
return;
}
/* Compute gso_hdr_len. For TCP we need to read the
* content of the 'Data Offset' field.
*/
if (tcp) {
struct nm_tcphdr *tcph =
(struct nm_tcphdr *)&gso_hdr[14+iphlen];
struct nm_tcphdr *tcph = (struct nm_tcphdr *)
(gso_hdr + ethhlen + iphlen);
gso_hdr_len = 14 + iphlen + 4*(tcph->doff >> 4);
} else
gso_hdr_len = 14 + iphlen + 8; /* UDP */
if (src_len < ethhlen + iphlen + 20) {
RD(3, "Short GSO fragment "
"[TCP], dropping");
return;
}
gso_hdr_len = ethhlen + iphlen +
4 * (tcph->doff >> 4);
} else {
gso_hdr_len = ethhlen + iphlen + 8; /* UDP */
}
if (src_len < gso_hdr_len) {
RD(3, "Short GSO fragment [TCP/UDP], dropping");
return;
}
ND(3, "gso_hdr_len %u gso_mtu %d", gso_hdr_len,
dst_na->mfs);
dst_na->mfs);
/* Advance source pointers. */
src += gso_hdr_len;
@@ -263,7 +354,6 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
break;
src = ft_p->ft_buf;
src_len = ft_p->ft_len;
continue;
}
}
@@ -289,25 +379,24 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
/* After raw segmentation, we must fix some header
* fields and compute checksums, in a protocol dependent
* way. */
gso_fix_segment(dst, gso_bytes, gso_idx,
segmented_bytes,
src_len == 0 && ft_p + 1 == ft_end,
tcp, iphlen);
gso_fix_segment(dst + ethhlen, gso_bytes - ethhlen,
ipv4, iphlen, tcp,
gso_idx, segmented_bytes,
src_len == 0 && ft_p + 1 == ft_end);
ND("frame %u completed with %d bytes", gso_idx, (int)gso_bytes);
slot->len = gso_bytes;
slot->flags = 0;
segmented_bytes += gso_bytes - gso_hdr_len;
dst_slot->len = gso_bytes;
dst_slot->flags = 0;
dst_slots++;
/* Next destination slot. */
*j = nm_next(*j, lim);
slot = &ring->slot[*j];
dst = NMB(&dst_na->up, slot);
segmented_bytes += gso_bytes - gso_hdr_len;
gso_bytes = 0;
gso_idx++;
/* Next destination slot. */
j_cur = nm_next(j_cur, lim);
dst_slot = &dst_ring->slot[j_cur];
dst = NMB(&dst_na->up, dst_slot);
}
/* Next input slot. */
@@ -342,10 +431,10 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
/* Init/update the packet checksum if needed. */
if (vh && (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
if (!dst_slots)
csum = nm_csum_raw(src + vh->csum_start,
csum = nm_os_csum_raw(src + vh->csum_start,
src_len - vh->csum_start, 0);
else
csum = nm_csum_raw(src, src_len, csum);
csum = nm_os_csum_raw(src, src_len, csum);
}
/* Round to a multiple of 64 */
@@ -359,44 +448,43 @@ void bdg_mismatch_datapath(struct netmap_vp_adapter *na,
} else {
memcpy(dst, src, (int)src_len);
}
slot->len = dst_len;
dst_slot->len = dst_len;
dst_slots++;
/* Next destination slot. */
*j = nm_next(*j, lim);
slot = &ring->slot[*j];
dst = NMB(&dst_na->up, slot);
j_cur = nm_next(j_cur, lim);
dst_slot = &dst_ring->slot[j_cur];
dst = NMB(&dst_na->up, dst_slot);
/* Next source slot. */
ft_p++;
src = ft_p->ft_buf;
dst_len = src_len = ft_p->ft_len;
}
/* Finalize (fold) the checksum if needed. */
if (check && vh && (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
*check = nm_csum_fold(csum);
*check = nm_os_csum_fold(csum);
}
ND(3, "using %u dst_slots", dst_slots);
/* A second pass on the desitations slots to set the slot flags,
/* A second pass on the destination slots to set the slot flags,
* using the right number of destination slots.
*/
while (j_start != *j) {
slot = &ring->slot[j_start];
slot->flags = (dst_slots << 8)| NS_MOREFRAG;
while (j_start != j_cur) {
dst_slot = &dst_ring->slot[j_start];
dst_slot->flags = (dst_slots << 8)| NS_MOREFRAG;
j_start = nm_next(j_start, lim);
}
/* Clear NS_MOREFRAG flag on last entry. */
slot->flags = (dst_slots << 8);
dst_slot->flags = (dst_slots << 8);
}
/* Update howmany. */
/* Update howmany and j. This is to commit the use of
* those slots in the destination ring. */
if (unlikely(dst_slots > *howmany)) {
dst_slots = *howmany;
D("Slot allocation error: Should never happen");
D("Slot allocation error: This is a bug");
}
*j = j_cur;
*howmany -= dst_slots;
}


@@ -1,5 +1,6 @@
/*
* Copyright (C) 2014 Giuseppe Lettieri. All rights reserved.
* Copyright (C) 2014-2016 Giuseppe Lettieri
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -54,6 +55,9 @@
#warning OSX support is only partial
#include "osx_glue.h"
#elif defined(_WIN32)
#include "win_glue.h"
#else
#error Unsupported platform
@@ -72,9 +76,11 @@
#define NM_PIPE_MAXSLOTS 4096
int netmap_default_pipes = 0; /* ignored, kept for compatibility */
static int netmap_default_pipes = 0; /* ignored, kept for compatibility */
SYSBEGIN(vars_pipes);
SYSCTL_DECL(_dev_netmap);
SYSCTL_INT(_dev_netmap, OID_AUTO, default_pipes, CTLFLAG_RW, &netmap_default_pipes, 0 , "");
SYSEND;
/* allocate the pipe array in the parent adapter */
static int
@@ -91,7 +97,11 @@ nm_pipe_alloc(struct netmap_adapter *na, u_int npipes)
return EINVAL;
len = sizeof(struct netmap_pipe_adapter *) * npipes;
#ifndef _WIN32
npa = realloc(na->na_pipes, len, M_DEVBUF, M_NOWAIT | M_ZERO);
#else
npa = realloc(na->na_pipes, len, sizeof(struct netmap_pipe_adapter *)*na->na_max_pipes);
#endif
if (npa == NULL)
return ENOMEM;
@@ -199,7 +209,7 @@ netmap_pipe_txsync(struct netmap_kring *txkring, int flags)
}
while (limit-- > 0) {
struct netmap_slot *rs = &rxkring->save_ring->slot[j];
struct netmap_slot *rs = &rxkring->ring->slot[j];
struct netmap_slot *ts = &txkring->ring->slot[k];
struct netmap_slot tmp;
@@ -295,7 +305,7 @@ netmap_pipe_rxsync(struct netmap_kring *rxkring, int flags)
* usr1 --> e1 --> e2
*
* and we are e2. e1 is certainly registered and our
* krings already exist, but they may be hidden.
* krings already exist. Nothing to do.
*/
static int
netmap_pipe_krings_create(struct netmap_adapter *na)
@@ -310,65 +320,28 @@ netmap_pipe_krings_create(struct netmap_adapter *na)
int i;
/* case 1) above */
ND("%p: case 1, create everything", na);
D("%p: case 1, create both ends", na);
error = netmap_krings_create(na, 0);
if (error)
goto err;
/* we also create all the rings, since we need to
* update the save_ring pointers.
* netmap_mem_rings_create (called by our caller)
* will not create the rings again
*/
error = netmap_mem_rings_create(na);
/* create the krings of the other end */
error = netmap_krings_create(ona, 0);
if (error)
goto del_krings1;
/* update our hidden ring pointers */
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++)
NMR(na, t)[i].save_ring = NMR(na, t)[i].ring;
}
/* now, create krings and rings of the other end */
error = netmap_krings_create(ona, 0);
if (error)
goto del_rings1;
error = netmap_mem_rings_create(ona);
if (error)
goto del_krings2;
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(ona, t) + 1; i++)
NMR(ona, t)[i].save_ring = NMR(ona, t)[i].ring;
}
/* cross link the krings */
for_rx_tx(t) {
enum txrx r= nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */
enum txrx r = nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */
for (i = 0; i < nma_get_nrings(na, t); i++) {
NMR(na, t)[i].pipe = NMR(&pna->peer->up, r) + i;
NMR(&pna->peer->up, r)[i].pipe = NMR(na, t) + i;
}
}
} else {
int i;
/* case 2) above */
/* recover the hidden rings */
ND("%p: case 2, hidden rings", na);
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++)
NMR(na, t)[i].ring = NMR(na, t)[i].save_ring;
}
}
return 0;
del_krings2:
netmap_krings_delete(ona);
del_rings1:
netmap_mem_rings_delete(na);
del_krings1:
netmap_krings_delete(na);
err:
@@ -383,7 +356,8 @@ netmap_pipe_krings_create(struct netmap_adapter *na)
*
* usr1 --> e1 --> e2
*
* and we are e1. Nothing special to do.
* and we are e1. Create the needed rings of the
* other end.
*
* 1.b) state is
*
@@ -412,14 +386,65 @@ netmap_pipe_reg(struct netmap_adapter *na, int onoff)
{
struct netmap_pipe_adapter *pna =
(struct netmap_pipe_adapter *)na;
struct netmap_adapter *ona = &pna->peer->up;
int i, error = 0;
enum txrx t;
ND("%p: onoff %d", na, onoff);
if (onoff) {
na->na_flags |= NAF_NETMAP_ON;
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];
if (nm_kring_pending_on(kring)) {
/* mark the partner ring as needed */
kring->pipe->nr_kflags |= NKR_NEEDRING;
}
}
}
/* create all missing needed rings on the other end */
error = netmap_mem_rings_create(ona);
if (error)
return error;
/* In case of no error we put our rings in netmap mode */
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];
if (nm_kring_pending_on(kring)) {
kring->nr_mode = NKR_NETMAP_ON;
}
}
}
if (na->active_fds == 0)
na->na_flags |= NAF_NETMAP_ON;
} else {
na->na_flags &= ~NAF_NETMAP_ON;
if (na->active_fds == 0)
na->na_flags &= ~NAF_NETMAP_ON;
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
struct netmap_kring *kring = &NMR(na, t)[i];
if (nm_kring_pending_off(kring)) {
kring->nr_mode = NKR_NETMAP_OFF;
/* mark the peer ring as no longer needed by us
* (it may still be kept if somebody else is using it)
*/
kring->pipe->nr_kflags &= ~NKR_NEEDRING;
}
}
}
/* delete all the peer rings that are no longer needed */
netmap_mem_rings_delete(ona);
}
if (na->active_fds) {
D("active_fds %d", na->active_fds);
return 0;
}
if (pna->peer_ref) {
ND("%p: case 1.a or 2.a, nothing to do", na);
return 0;
@@ -429,18 +454,11 @@ netmap_pipe_reg(struct netmap_adapter *na, int onoff)
pna->peer->peer_ref = 0;
netmap_adapter_put(na);
} else {
int i;
ND("%p: case 2.b, grab peer", na);
netmap_adapter_get(na);
pna->peer->peer_ref = 1;
/* hide our rings from netmap_mem_rings_delete */
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(na, t) + 1; i++) {
NMR(na, t)[i].ring = NULL;
}
}
}
return 0;
return error;
}
/* netmap_pipe_krings_delete.
@@ -470,8 +488,6 @@ netmap_pipe_krings_delete(struct netmap_adapter *na)
struct netmap_pipe_adapter *pna =
(struct netmap_pipe_adapter *)na;
struct netmap_adapter *ona; /* na of the other end */
int i;
enum txrx t;
if (!pna->peer_ref) {
ND("%p: case 2, kept alive by peer", na);
@@ -480,18 +496,12 @@ netmap_pipe_krings_delete(struct netmap_adapter *na)
/* case 1) above */
ND("%p: case 1, deleting everything", na);
netmap_krings_delete(na); /* also zeroes tx_rings etc. */
/* restore the ring to be deleted on the peer */
ona = &pna->peer->up;
if (ona->tx_rings == NULL) {
/* already deleted, we must be on an
* cleanup-after-error path */
return;
}
for_rx_tx(t) {
for (i = 0; i < nma_get_nrings(ona, t) + 1; i++)
NMR(ona, t)[i].ring = NMR(ona, t)[i].save_ring;
}
netmap_mem_rings_delete(ona);
netmap_krings_delete(ona);
}
@@ -519,6 +529,7 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
struct nmreq pnmr;
struct netmap_adapter *pna; /* parent adapter */
struct netmap_pipe_adapter *mna, *sna, *req;
struct ifnet *ifp = NULL;
u_int pipe_id;
int role = nmr->nr_flags & NR_REG_MASK;
int error;
@@ -536,7 +547,7 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
memcpy(&pnmr.nr_name, nmr->nr_name, IFNAMSIZ);
/* pass to parent the requested number of pipes */
pnmr.nr_arg1 = nmr->nr_arg1;
error = netmap_get_na(&pnmr, &pna, create);
error = netmap_get_na(&pnmr, &pna, &ifp, create);
if (error) {
ND("parent lookup failed: %d", error);
return error;
@@ -652,16 +663,15 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
*na = &req->up;
netmap_adapter_get(*na);
/* write the configuration back */
nmr->nr_tx_rings = req->up.num_tx_rings;
nmr->nr_rx_rings = req->up.num_rx_rings;
nmr->nr_tx_slots = req->up.num_tx_desc;
nmr->nr_rx_slots = req->up.num_rx_desc;
/* keep the reference to the parent.
* It will be released by the req destructor
*/
/* drop the ifp reference, if any */
if (ifp) {
if_rele(ifp);
}
return 0;
free_sna:
@@ -671,7 +681,7 @@ netmap_get_pipe_na(struct nmreq *nmr, struct netmap_adapter **na, int create)
free_mna:
free(mna, M_DEVBUF);
put_out:
netmap_adapter_put(pna);
netmap_unget_na(pna, ifp);
return error;
}

File diff suppressed because it is too large


@@ -3,11 +3,14 @@
# Compile netmap as a module, useful if you want a netmap bridge
# or loadable drivers.
.include <bsd.own.mk> # FreeBSD 10 and earlier
# .include "${SYSDIR}/conf/kern.opts.mk"
.PATH: ${.CURDIR}/../../dev/netmap
.PATH.h: ${.CURDIR}/../../net
CFLAGS += -I${.CURDIR}/../../
CFLAGS += -I${.CURDIR}/../../ -D INET
KMOD = netmap
SRCS = device_if.h bus_if.h opt_netmap.h
SRCS = device_if.h bus_if.h pci_if.h opt_netmap.h
SRCS += netmap.c netmap.h netmap_kern.h
SRCS += netmap_mem2.c netmap_mem2.h
SRCS += netmap_generic.c
@@ -17,5 +20,8 @@ SRCS += netmap_freebsd.c
SRCS += netmap_offloadings.c
SRCS += netmap_pipe.c
SRCS += netmap_monitor.c
SRCS += netmap_pt.c
SRCS += if_ptnet.c
SRCS += opt_inet.h opt_inet6.h
.include <bsd.kmod.mk>


@@ -137,6 +137,26 @@
* netmap:foo-k the k-th NIC ring pair
* netmap:foo{k PIPE ring pair k, master side
* netmap:foo}k PIPE ring pair k, slave side
*
* Some notes about host rings:
*
* + The RX host ring is used to store those packets that the host network
* stack is trying to transmit through a NIC queue, but only if that queue
* is currently in netmap mode. Netmap will not intercept host stack mbufs
* designated to NIC queues that are not in netmap mode. As a consequence,
* registering a netmap port with netmap:foo^ is not enough to intercept
* mbufs in the RX host ring; the netmap port should be registered with
* netmap:foo*, or another registration should be done to open at least a
* NIC TX queue in netmap mode.
*
* + Netmap is not currently able to deal with intercepted transmit mbufs which
* require offloadings like TSO, UFO, checksumming offloadings, etc. It is
* responsibility of the user to disable those offloadings (e.g. using
* ifconfig on FreeBSD or ethtool -K on Linux) for an interface that is being
* used in netmap mode. If the offloadings are not disabled, GSO and/or
* unchecksummed packets may be dropped immediately or end up in the host RX
* ring, and will be dropped as soon as the packet reaches another netmap
* adapter.
*/
/*
@@ -277,7 +297,11 @@ struct netmap_ring {
struct timeval ts; /* (k) time of last *sync() */
/* opaque room for a mutex or similar object */
uint8_t sem[128] __attribute__((__aligned__(NM_CACHE_ALIGN)));
#if !defined(_WIN32) || defined(__CYGWIN__)
uint8_t __attribute__((__aligned__(NM_CACHE_ALIGN))) sem[128];
#else
uint8_t __declspec(align(NM_CACHE_ALIGN)) sem[128];
#endif
/* the slots follow. This struct has variable size */
struct netmap_slot slot[0]; /* array of slots. */
@@ -496,6 +520,11 @@ struct nmreq {
#define NETMAP_BDG_OFFSET NETMAP_BDG_VNET_HDR /* deprecated alias */
#define NETMAP_BDG_NEWIF 6 /* create a virtual port */
#define NETMAP_BDG_DELIF 7 /* destroy a virtual port */
#define NETMAP_PT_HOST_CREATE 8 /* create ptnetmap kthreads */
#define NETMAP_PT_HOST_DELETE 9 /* delete ptnetmap kthreads */
#define NETMAP_BDG_POLLING_ON 10 /* start polling kthread */
#define NETMAP_BDG_POLLING_OFF 11 /* delete polling kthread */
#define NETMAP_VNET_HDR_GET 12 /* get the port virtio-net-hdr length */
uint16_t nr_arg1; /* reserve extra rings in NIOCREGIF */
#define NETMAP_BDG_HOST 1 /* attach the host stack on ATTACH */
@@ -521,7 +550,61 @@ enum { NR_REG_DEFAULT = 0, /* backward compat, should not be used. */
#define NR_ZCOPY_MON 0x400
/* request exclusive access to the selected rings */
#define NR_EXCLUSIVE 0x800
/* request ptnetmap host support */
#define NR_PASSTHROUGH_HOST NR_PTNETMAP_HOST /* deprecated */
#define NR_PTNETMAP_HOST 0x1000
#define NR_RX_RINGS_ONLY 0x2000
#define NR_TX_RINGS_ONLY 0x4000
/* Applications set this flag if they are able to deal with virtio-net headers,
* that is send/receive frames that start with a virtio-net header.
* If not set, NIOCREGIF will fail with netmap ports that require applications
* to use those headers. If the flag is set, the application can use the
* NETMAP_VNET_HDR_GET command to figure out the header length. */
#define NR_ACCEPT_VNET_HDR 0x8000
#define NM_BDG_NAME "vale" /* prefix for bridge port name */
/*
* Windows does not have _IOWR(). _IO(), _IOW() and _IOR() are defined
* in ws2def.h but not sure if they are in the form we need.
* XXX so we redefine them
* in a convenient way to use for DeviceIoControl signatures
*/
#ifdef _WIN32
#undef _IO // ws2def.h
#define _WIN_NM_IOCTL_TYPE 40000
#define _IO(_c, _n) CTL_CODE(_WIN_NM_IOCTL_TYPE, ((_n) + 0x800) , \
METHOD_BUFFERED, FILE_ANY_ACCESS )
#define _IO_direct(_c, _n) CTL_CODE(_WIN_NM_IOCTL_TYPE, ((_n) + 0x800) , \
METHOD_OUT_DIRECT, FILE_ANY_ACCESS )
#define _IOWR(_c, _n, _s) _IO(_c, _n)
/* We have some internal sysctls in addition to the externally visible ones */
#define NETMAP_MMAP _IO_direct('i', 160) // note METHOD_OUT_DIRECT
#define NETMAP_POLL _IO('i', 162)
/* and also two setsockopt for sysctl emulation */
#define NETMAP_SETSOCKOPT _IO('i', 140)
#define NETMAP_GETSOCKOPT _IO('i', 141)
//These linknames are for the Netmap Core Driver
#define NETMAP_NT_DEVICE_NAME L"\\Device\\NETMAP"
#define NETMAP_DOS_DEVICE_NAME L"\\DosDevices\\netmap"
//Definition of a structure used to pass a virtual address within an IOCTL
typedef struct _MEMORY_ENTRY {
PVOID pUsermodeVirtualAddress;
} MEMORY_ENTRY, *PMEMORY_ENTRY;
typedef struct _POLL_REQUEST_DATA {
int events;
int timeout;
int revents;
} POLL_REQUEST_DATA;
#endif /* _WIN32 */
/*
* FreeBSD uses the size value embedded in the _IOWR to determine
@@ -561,4 +644,28 @@ struct nm_ifreq {
char data[NM_IFRDATA_LEN];
};
/*
* netmap kernel thread configuration
*/
/* bhyve/vmm.ko MSIX parameters for IOCTL */
struct ptn_vmm_ioctl_msix {
uint64_t msg;
uint64_t addr;
};
/* IOCTL parameters */
struct nm_kth_ioctl {
u_long com;
/* TODO: use union */
union {
struct ptn_vmm_ioctl_msix msix;
} data;
};
/* Configuration of a ptnetmap ring */
struct ptnet_ring_cfg {
uint64_t ioeventfd; /* eventfd in linux, tsleep() parameter in FreeBSD */
uint64_t irqfd; /* eventfd in linux, ioctl fd in FreeBSD */
struct nm_kth_ioctl ioctl; /* ioctl parameter to send irq (only used in bhyve/FreeBSD) */
};
#endif /* _NET_NETMAP_H_ */


@@ -1,5 +1,6 @@
/*
* Copyright (C) 2011-2014 Universita` di Pisa. All rights reserved.
* Copyright (C) 2011-2016 Universita` di Pisa
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -65,9 +66,31 @@
#ifndef _NET_NETMAP_USER_H_
#define _NET_NETMAP_USER_H_
#define NETMAP_DEVICE_NAME "/dev/netmap"
#ifdef __CYGWIN__
/*
* we can compile userspace apps with either cygwin or msvc,
* and we use _WIN32 to identify windows specific code
*/
#ifndef _WIN32
#define _WIN32
#endif /* _WIN32 */
#endif /* __CYGWIN__ */
#ifdef _WIN32
#undef NETMAP_DEVICE_NAME
#define NETMAP_DEVICE_NAME "/proc/sys/DosDevices/Global/netmap"
#include <windows.h>
#include <WinDef.h>
#include <sys/cygwin.h>
#endif /* _WIN32 */
#include <stdint.h>
#include <sys/socket.h> /* apple needs sockaddr */
#include <net/if.h> /* IFNAMSIZ */
#include <ctype.h>
#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
@@ -172,17 +195,23 @@ nm_ring_space(struct netmap_ring *ring)
} while (0)
#endif
struct nm_pkthdr { /* same as pcap_pkthdr */
struct nm_pkthdr { /* first part is the same as pcap_pkthdr */
struct timeval ts;
uint32_t caplen;
uint32_t len;
uint64_t flags; /* NM_MORE_PKTS etc */
#define NM_MORE_PKTS 1
struct nm_desc *d;
struct netmap_slot *slot;
uint8_t *buf;
};
struct nm_stat { /* same as pcap_stat */
u_int ps_recv;
u_int ps_drop;
u_int ps_ifdrop;
#ifdef WIN32
#ifdef WIN32 /* XXX or _WIN32 ? */
u_int bs_capt;
#endif /* WIN32 */
};
@@ -284,12 +313,14 @@ typedef void (*nm_cb_t)(u_char *, const struct nm_pkthdr *, const u_char *d);
* -NN bind individual NIC ring pair
* {NN bind master side of pipe NN
* }NN bind slave side of pipe NN
* a suffix starting with + and the following flags,
* a suffix starting with / and the following flags,
* in any order:
* x exclusive access
* z zero copy monitor
* t monitor tx side
* r monitor rx side
* R bind only RX ring(s)
* T bind only TX ring(s)
*
* req provides the initial values of nmreq before parsing ifname.
* Remember that the ifname parsing will override the ring
@@ -328,6 +359,13 @@ enum {
static int nm_close(struct nm_desc *);
/*
* nm_mmap() does mmap or inherit from parent if the nr_arg2
* (memory block) matches.
*/
static int nm_mmap(struct nm_desc *, const struct nm_desc *);
/*
* nm_inject() is the same as pcap_inject()
* nm_dispatch() is the same as pcap_dispatch()
@@ -338,13 +376,247 @@ static int nm_inject(struct nm_desc *, const void *, size_t);
static int nm_dispatch(struct nm_desc *, int, nm_cb_t, u_char *);
static u_char *nm_nextpkt(struct nm_desc *, struct nm_pkthdr *);
#ifdef _WIN32
intptr_t _get_osfhandle(int); /* defined in io.h in windows */
/*
* In windows we do not have yet native poll support, so we keep track
* of file descriptors associated to netmap ports to emulate poll on
* them and fall back on regular poll on other file descriptors.
*/
struct win_netmap_fd_list {
struct win_netmap_fd_list *next;
int win_netmap_fd;
HANDLE win_netmap_handle;
};
/*
* list head containing all the netmap opened fd and their
* windows HANDLE counterparts
*/
static struct win_netmap_fd_list *win_netmap_fd_list_head;
static void
win_insert_fd_record(int fd)
{
struct win_netmap_fd_list *curr;
for (curr = win_netmap_fd_list_head; curr; curr = curr->next) {
if (fd == curr->win_netmap_fd) {
return;
}
}
curr = calloc(1, sizeof(*curr));
curr->next = win_netmap_fd_list_head;
curr->win_netmap_fd = fd;
curr->win_netmap_handle = IntToPtr(_get_osfhandle(fd));
win_netmap_fd_list_head = curr;
}
void
win_remove_fd_record(int fd)
{
struct win_netmap_fd_list *curr = win_netmap_fd_list_head;
struct win_netmap_fd_list *prev = NULL;
for (; curr ; prev = curr, curr = curr->next) {
if (fd != curr->win_netmap_fd)
continue;
/* found the entry */
if (prev == NULL) { /* we are freeing the first entry */
win_netmap_fd_list_head = curr->next;
} else {
prev->next = curr->next;
}
free(curr);
break;
}
}
HANDLE
win_get_netmap_handle(int fd)
{
struct win_netmap_fd_list *curr;
for (curr = win_netmap_fd_list_head; curr; curr = curr->next) {
if (fd == curr->win_netmap_fd) {
return curr->win_netmap_handle;
}
}
return NULL;
}
/*
* we need to wrap ioctl and mmap, at least for the netmap file descriptors
*/
/*
* use this function only from netmap_user.h internal functions
* same as ioctl, returns 0 on success and -1 on error
*/
static int
win_nm_ioctl_internal(HANDLE h, int32_t ctlCode, void *arg)
{
DWORD bReturn = 0, szIn, szOut;
BOOL ioctlReturnStatus;
void *inParam = arg, *outParam = arg;
switch (ctlCode) {
case NETMAP_POLL:
szIn = sizeof(POLL_REQUEST_DATA);
szOut = sizeof(POLL_REQUEST_DATA);
break;
case NETMAP_MMAP:
szIn = 0;
szOut = sizeof(void*);
inParam = NULL; /* nothing on input */
break;
case NIOCTXSYNC:
case NIOCRXSYNC:
szIn = 0;
szOut = 0;
break;
case NIOCREGIF:
szIn = sizeof(struct nmreq);
szOut = sizeof(struct nmreq);
break;
case NIOCCONFIG:
D("unsupported NIOCCONFIG!");
return -1;
default: /* a regular ioctl */
D("invalid ioctl %x on netmap fd", ctlCode);
return -1;
}
ioctlReturnStatus = DeviceIoControl(h,
ctlCode, inParam, szIn,
outParam, szOut,
&bReturn, NULL);
// XXX note windows returns 0 on error or async call, 1 on success
// we could call GetLastError() to figure out what happened
return ioctlReturnStatus ? 0 : -1;
}
/*
* this function is what must be called from user-space programs
* same as ioctl, returns 0 on success and -1 on error
*/
static int
win_nm_ioctl(int fd, int32_t ctlCode, void *arg)
{
HANDLE h = win_get_netmap_handle(fd);
if (h == NULL) {
return ioctl(fd, ctlCode, arg);
} else {
return win_nm_ioctl_internal(h, ctlCode, arg);
}
}
#define ioctl win_nm_ioctl /* from now on, within this file ... */
/*
* We cannot use the native mmap on windows
* The only parameter used is "fd", the other ones are just declared to
* make this signature comparable to the FreeBSD/Linux one
*/
static void *
win32_mmap_emulated(void *addr, size_t length, int prot, int flags, int fd, int32_t offset)
{
HANDLE h = win_get_netmap_handle(fd);
if (h == NULL) {
return mmap(addr, length, prot, flags, fd, offset);
} else {
MEMORY_ENTRY ret;
return win_nm_ioctl_internal(h, NETMAP_MMAP, &ret) ?
NULL : ret.pUsermodeVirtualAddress;
}
}
#define mmap win32_mmap_emulated
#include <sys/poll.h> /* XXX needed to use the structure pollfd */
static int
win_nm_poll(struct pollfd *fds, int nfds, int timeout)
{
HANDLE h;
if (nfds != 1 || fds == NULL || (h = win_get_netmap_handle(fds->fd)) == NULL) {
return poll(fds, nfds, timeout);
} else {
POLL_REQUEST_DATA prd;
prd.timeout = timeout;
prd.events = fds->events;
win_nm_ioctl_internal(h, NETMAP_POLL, &prd);
if ((prd.revents == POLLERR) || (prd.revents == STATUS_TIMEOUT)) {
return -1;
}
return 1;
}
}
#define poll win_nm_poll
static int
win_nm_open(char* pathname, int flags)
{
if (strcmp(pathname, NETMAP_DEVICE_NAME) == 0) {
int fd = open(NETMAP_DEVICE_NAME, O_RDWR);
if (fd < 0) {
return -1;
}
win_insert_fd_record(fd);
return fd;
} else {
return open(pathname, flags);
}
}
#define open win_nm_open
static int
win_nm_close(int fd)
{
if (fd != -1) {
close(fd);
if (win_get_netmap_handle(fd) != NULL) {
win_remove_fd_record(fd);
}
}
return 0;
}
#define close win_nm_close
#endif /* _WIN32 */
static int
nm_is_identifier(const char *s, const char *e)
{
for (; s != e; s++) {
if (!isalnum(*s) && *s != '_') {
return 0;
}
}
return 1;
}
/*
* Try to open, return descriptor if successful, NULL otherwise.
An invalid netmap name will return NULL with errno = 0.
* You can pass a pointer to a pre-filled nm_desc to add special
* parameters. Flags is used as follows
NM_OPEN_NO_MMAP use the memory from arg, only XXX avoid mmap
* if the nr_arg2 (memory block) matches.
* NM_OPEN_ARG1 use req.nr_arg1 from arg
* NM_OPEN_ARG2 use req.nr_arg2 from arg
@@ -359,20 +631,48 @@ nm_open(const char *ifname, const struct nmreq *req,
u_int namelen;
uint32_t nr_ringid = 0, nr_flags, nr_reg;
const char *port = NULL;
const char *vpname = NULL;
#define MAXERRMSG 80
char errmsg[MAXERRMSG] = "";
enum { P_START, P_RNGSFXOK, P_GETNUM, P_FLAGS, P_FLAGSOK } p_state;
int is_vale;
long num;
if (strncmp(ifname, "netmap:", 7) &&
strncmp(ifname, NM_BDG_NAME, strlen(NM_BDG_NAME))) {
errno = 0; /* name not recognised, not an error */
return NULL;
}
is_vale = (ifname[0] == 'v');
if (is_vale) {
port = index(ifname, ':');
if (port == NULL) {
snprintf(errmsg, MAXERRMSG,
"missing ':' in vale name");
goto fail;
}
if (!nm_is_identifier(ifname + 4, port)) {
snprintf(errmsg, MAXERRMSG, "invalid bridge name");
goto fail;
}
vpname = ++port;
} else {
ifname += 7;
port = ifname;
}
/* scan for a separator */
for (; *port && !index("-*^{}/", *port); port++)
;
if (is_vale && !nm_is_identifier(vpname, port)) {
snprintf(errmsg, MAXERRMSG, "invalid bridge port name");
goto fail;
}
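The parsing above splits a VALE name into bridge and port components and validates each with nm_is_identifier(). The same checks can be condensed into a standalone sketch; the helper names here are illustrative, not netmap's:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* is [s, e) made only of alphanumerics and '_'? (mirrors nm_is_identifier) */
static int
is_ident(const char *s, const char *e)
{
	for (; s != e; s++)
		if (!isalnum((unsigned char)*s) && *s != '_')
			return 0;
	return 1;
}

/* Validate a "vale<bridge>:<port>" name up to the first option
 * separator; returns 1 if both components are identifiers. */
static int
check_vale_name(const char *name)
{
	const char *colon = strchr(name, ':');
	const char *p;

	if (strncmp(name, "vale", 4) != 0 || colon == NULL)
		return 0;
	if (!is_ident(name + 4, colon))		/* bridge name */
		return 0;
	/* stop at the first option separator, as nm_open() does */
	for (p = colon + 1; *p && !strchr("-*^{}/", *p); p++)
		;
	return is_ident(colon + 1, p);		/* port name */
}
```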
namelen = port - ifname;
if (namelen >= sizeof(d->req.nr_name)) {
snprintf(errmsg, MAXERRMSG, "name too long");
@@ -449,6 +749,12 @@ nm_open(const char *ifname, const struct nmreq *req,
case 'r':
nr_flags |= NR_MONITOR_RX;
break;
case 'R':
nr_flags |= NR_RX_RINGS_ONLY;
break;
case 'T':
nr_flags |= NR_TX_RINGS_ONLY;
break;
default:
snprintf(errmsg, MAXERRMSG, "unrecognized flag: '%c'", *port);
goto fail;
@@ -462,6 +768,11 @@ nm_open(const char *ifname, const struct nmreq *req,
snprintf(errmsg, MAXERRMSG, "unexpected end of port name");
goto fail;
}
if ((nr_flags & NR_ZCOPY_MON) &&
!(nr_flags & (NR_MONITOR_TX|NR_MONITOR_RX))) {
snprintf(errmsg, MAXERRMSG, "'z' used but neither 'r', nor 't' found");
goto fail;
}
ND("flags: %s %s %s %s",
(nr_flags & NR_EXCLUSIVE) ? "EXCLUSIVE" : "",
(nr_flags & NR_ZCOPY_MON) ? "ZCOPY_MON" : "",
@@ -474,7 +785,7 @@ nm_open(const char *ifname, const struct nmreq *req,
return NULL;
}
d->self = d; /* set this early so nm_close() works */
d->fd = open(NETMAP_DEVICE_NAME, O_RDWR);
if (d->fd < 0) {
snprintf(errmsg, MAXERRMSG, "cannot open /dev/netmap: %s", strerror(errno));
goto fail;
@@ -487,7 +798,7 @@ nm_open(const char *ifname, const struct nmreq *req,
/* these fields are overridden by ifname and flags processing */
d->req.nr_ringid |= nr_ringid;
d->req.nr_flags |= nr_flags;
memcpy(d->req.nr_name, ifname, namelen);
d->req.nr_name[namelen] = '\0';
/* optionally import info from parent */
@@ -529,31 +840,10 @@ nm_open(const char *ifname, const struct nmreq *req,
goto fail;
}
/* if parent is defined, do nm_mmap() even if NM_OPEN_NO_MMAP is set */
if ((!(new_flags & NM_OPEN_NO_MMAP) || parent) && nm_mmap(d, parent)) {
snprintf(errmsg, MAXERRMSG, "mmap failed: %s", strerror(errno));
goto fail;
}
nr_reg = d->req.nr_flags & NR_REG_MASK;
@@ -626,14 +916,54 @@ nm_close(struct nm_desc *d)
return EINVAL;
if (d->done_mmap && d->mem)
munmap(d->mem, d->memsize);
if (d->fd != -1) {
close(d->fd);
}
bzero(d, sizeof(*d));
free(d);
return 0;
}
static int
nm_mmap(struct nm_desc *d, const struct nm_desc *parent)
{
//XXX TODO: check if mmap is already done
if (IS_NETMAP_DESC(parent) && parent->mem &&
parent->req.nr_arg2 == d->req.nr_arg2) {
/* do not mmap, inherit from parent */
D("do not mmap, inherit from parent");
d->memsize = parent->memsize;
d->mem = parent->mem;
} else {
/* XXX TODO: check if memsize is too large (or there is overflow) */
d->memsize = d->req.nr_memsize;
d->mem = mmap(0, d->memsize, PROT_WRITE | PROT_READ, MAP_SHARED,
d->fd, 0);
if (d->mem == MAP_FAILED) {
goto fail;
}
d->done_mmap = 1;
}
{
struct netmap_if *nifp = NETMAP_IF(d->mem, d->req.nr_offset);
struct netmap_ring *r = NETMAP_RXRING(nifp, );
*(struct netmap_if **)(uintptr_t)&(d->nifp) = nifp;
*(struct netmap_ring **)(uintptr_t)&d->some_ring = r;
*(void **)(uintptr_t)&d->buf_start = NETMAP_BUF(r, 0);
*(void **)(uintptr_t)&d->buf_end =
(char *)d->mem + d->memsize;
}
return 0;
fail:
return EINVAL;
}
/*
* Same prototype as pcap_inject(), only need to cast.
*/
@@ -674,6 +1004,9 @@ nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
{
int n = d->last_rx_ring - d->first_rx_ring + 1;
int c, got = 0, ri = d->cur_rx_ring;
d->hdr.buf = NULL;
d->hdr.flags = NM_MORE_PKTS;
d->hdr.d = d;
if (cnt == 0)
cnt = -1;
@@ -690,17 +1023,24 @@ nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
ri = d->first_rx_ring;
ring = NETMAP_RXRING(d->nifp, ri);
for ( ; !nm_ring_empty(ring) && cnt != got; got++) {
u_int idx, i;
if (d->hdr.buf) { /* from previous round */
cb(arg, &d->hdr, d->hdr.buf);
}
i = ring->cur;
idx = ring->slot[i].buf_idx;
d->hdr.slot = &ring->slot[i];
d->hdr.buf = (u_char *)NETMAP_BUF(ring, idx);
// __builtin_prefetch(buf);
d->hdr.len = d->hdr.caplen = ring->slot[i].len;
d->hdr.ts = ring->ts;
ring->head = ring->cur = nm_ring_next(ring, i);
}
}
if (d->hdr.buf) { /* from previous round */
d->hdr.flags = 0;
cb(arg, &d->hdr, d->hdr.buf);
}
d->cur_rx_ring = ri;
return got;
}
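The loop above defers each callback by one packet: a buffer is delivered only once the next packet is found, so the final callback can be issued with the NM_MORE_PKTS flag cleared. The pattern, reduced to its essentials (illustrative names, not the netmap API):

```c
#include <assert.h>

#define MORE_ITEMS 1	/* stands in for NM_MORE_PKTS */

/* Deliver items[0..n-1] to cb; every call but the last carries
 * MORE_ITEMS, mirroring how nm_dispatch() holds each buffer back
 * until it knows whether another packet follows. */
static void
dispatch(const int *items, int n, void (*cb)(int item, int flags))
{
	int pending = 0, have_pending = 0, i;

	for (i = 0; i < n; i++) {
		if (have_pending)
			cb(pending, MORE_ITEMS);	/* from previous round */
		pending = items[i];
		have_pending = 1;
	}
	if (have_pending)
		cb(pending, 0);		/* last item: MORE_ITEMS cleared */
}

/* record the flags of each call, for verification */
static int seen_flags[16], ncalls;
static void
record(int item, int flags)
{
	(void)item;
	seen_flags[ncalls++] = flags;
}
```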


@@ -3,11 +3,12 @@
#
# For multiple programs using a single source file each,
# we can just define 'progs' and create custom targets.
PROGS = pkt-gen nmreplay bridge vale-ctl
CLEANFILES = $(PROGS) *.o
MAN=
CFLAGS += -Werror -Wall
CFLAGS += -nostdinc -I ../../../sys -I/usr/include
CFLAGS += -Wextra
LDFLAGS += -lpthread
@@ -16,6 +17,7 @@ CFLAGS += -DNO_PCAP
.else
LDFLAGS += -lpcap
.endif
LDFLAGS += -lm # used by nmreplay
.include <bsd.prog.mk>
.include <bsd.lib.mk>
@@ -28,5 +30,8 @@ pkt-gen: pkt-gen.o
bridge: bridge.o
$(CC) $(CFLAGS) -o bridge bridge.o
nmreplay: nmreplay.o
$(CC) $(CFLAGS) -o nmreplay nmreplay.o $(LDFLAGS)
vale-ctl: vale-ctl.o
$(CC) $(CFLAGS) -o vale-ctl vale-ctl.o


@@ -143,7 +143,7 @@ static void
usage(void)
{
fprintf(stderr,
"usage: bridge [-v] [-i ifa] [-i ifb] [-b burst] [-w wait_time] [ifa [ifb [burst]]]\n");
exit(1);
}
@@ -201,12 +201,12 @@ main(int argc, char **argv)
argc -= optind;
argv += optind;
if (argc > 0)
ifa = argv[0];
if (argc > 1)
ifb = argv[1];
if (argc > 2)
burst = atoi(argv[2]);
if (!ifb)
ifb = ifa;
if (!ifa) {
@@ -233,7 +233,7 @@ main(int argc, char **argv)
D("cannot open %s", ifa);
return (1);
}
/* try to reuse the mmap() of the first interface, if possible */
pb = nm_open(ifb, NULL, NM_OPEN_NO_MMAP, pa);
if (pb == NULL) {
D("cannot open %s", ifb);
@@ -262,6 +262,23 @@ main(int argc, char **argv)
pollfd[0].revents = pollfd[1].revents = 0;
n0 = pkt_queued(pa, 0);
n1 = pkt_queued(pb, 0);
#if defined(_WIN32) || defined(BUSYWAIT)
if (n0){
ioctl(pollfd[1].fd, NIOCTXSYNC, NULL);
pollfd[1].revents = POLLOUT;
}
else {
ioctl(pollfd[0].fd, NIOCRXSYNC, NULL);
}
if (n1){
ioctl(pollfd[0].fd, NIOCTXSYNC, NULL);
pollfd[0].revents = POLLOUT;
}
else {
ioctl(pollfd[1].fd, NIOCRXSYNC, NULL);
}
ret = 1;
#else
if (n0)
pollfd[1].events |= POLLOUT;
else
@@ -271,6 +288,7 @@ main(int argc, char **argv)
else
pollfd[1].events |= POLLIN;
ret = poll(pollfd, 2, 2500);
#endif //defined(_WIN32) || defined(BUSYWAIT)
if (ret <= 0 || verbose)
D("poll %s [0] ev %x %x rx %d@%d tx %d,"
" [1] ev %x %x rx %d@%d tx %d",

tools/tools/netmap/ctrs.h (new file, 108 lines)

@@ -0,0 +1,108 @@
#ifndef CTRS_H_
#define CTRS_H_
/* $FreeBSD$ */
#include <sys/time.h>
/* counters to accumulate statistics */
struct my_ctrs {
uint64_t pkts, bytes, events, drop;
uint64_t min_space;
struct timeval t;
};
/* very crude code to print a number in normalized form.
* Caller has to make sure that the buffer is large enough.
*/
static const char *
norm2(char *buf, double val, char *fmt)
{
char *units[] = { "", "K", "M", "G", "T" };
u_int i;
for (i = 0; val >=1000 && i < sizeof(units)/sizeof(char *) - 1; i++)
val /= 1000;
sprintf(buf, fmt, val, units[i]);
return buf;
}
static __inline const char *
norm(char *buf, double val)
{
return norm2(buf, val, "%.3f %s");
}
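As a quick sanity check of the scaling rule above, here is a self-contained copy of the norm2() logic (the function name is ours): the value is divided by 1000 until it drops below 1000, and the matching SI-style unit is appended.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* same normalization as norm2()/norm() above */
static const char *
norm_sketch(char *buf, double val)
{
	const char *units[] = { "", "K", "M", "G", "T" };
	unsigned int i;

	/* scale down by 1000 until the value fits, capping at "T" */
	for (i = 0; val >= 1000 && i < sizeof(units)/sizeof(units[0]) - 1; i++)
		val /= 1000;
	sprintf(buf, "%.3f %s", val, units[i]);
	return buf;
}
```

Note that with no suffix the format string leaves a trailing space, exactly as the original does.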
static __inline int
timespec_ge(const struct timespec *a, const struct timespec *b)
{
if (a->tv_sec > b->tv_sec)
return (1);
if (a->tv_sec < b->tv_sec)
return (0);
if (a->tv_nsec >= b->tv_nsec)
return (1);
return (0);
}
static __inline struct timespec
timeval2spec(const struct timeval *a)
{
struct timespec ts = {
.tv_sec = a->tv_sec,
.tv_nsec = a->tv_usec * 1000
};
return ts;
}
static __inline struct timeval
timespec2val(const struct timespec *a)
{
struct timeval tv = {
.tv_sec = a->tv_sec,
.tv_usec = a->tv_nsec / 1000
};
return tv;
}
static __inline struct timespec
timespec_add(struct timespec a, struct timespec b)
{
struct timespec ret = { a.tv_sec + b.tv_sec, a.tv_nsec + b.tv_nsec };
if (ret.tv_nsec >= 1000000000) {
ret.tv_sec++;
ret.tv_nsec -= 1000000000;
}
return ret;
}
static __inline struct timespec
timespec_sub(struct timespec a, struct timespec b)
{
struct timespec ret = { a.tv_sec - b.tv_sec, a.tv_nsec - b.tv_nsec };
if (ret.tv_nsec < 0) {
ret.tv_sec--;
ret.tv_nsec += 1000000000;
}
return ret;
}
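The nanosecond carry/borrow normalization in timespec_add() and timespec_sub() is easy to get wrong; a standalone copy can be exercised like this (ts_add/ts_sub are local names for the same logic):

```c
#include <assert.h>
#include <time.h>

/* same carry rule as timespec_add() above */
static struct timespec
ts_add(struct timespec a, struct timespec b)
{
	struct timespec ret = { a.tv_sec + b.tv_sec, a.tv_nsec + b.tv_nsec };
	if (ret.tv_nsec >= 1000000000) {	/* carry into seconds */
		ret.tv_sec++;
		ret.tv_nsec -= 1000000000;
	}
	return ret;
}

/* same borrow rule as timespec_sub() above */
static struct timespec
ts_sub(struct timespec a, struct timespec b)
{
	struct timespec ret = { a.tv_sec - b.tv_sec, a.tv_nsec - b.tv_nsec };
	if (ret.tv_nsec < 0) {			/* borrow from seconds */
		ret.tv_sec--;
		ret.tv_nsec += 1000000000;
	}
	return ret;
}
```

A negative result comes out normalized too: -1.7 s is represented as tv_sec = -2, tv_nsec = 300000000.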
static uint64_t
wait_for_next_report(struct timeval *prev, struct timeval *cur,
int report_interval)
{
struct timeval delta;
delta.tv_sec = report_interval/1000;
delta.tv_usec = (report_interval%1000)*1000;
if (select(0, NULL, NULL, NULL, &delta) < 0 && errno != EINTR) {
perror("select");
abort();
}
gettimeofday(cur, NULL);
timersub(cur, prev, &delta);
return delta.tv_sec* 1000000 + delta.tv_usec;
}
#endif /* CTRS_H_ */


@@ -0,0 +1,129 @@
.\" Copyright (c) 2016 Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 16, 2016
.Dt NMREPLAY 1
.Os
.Sh NAME
.Nm nmreplay
.Nd playback a pcap file through a netmap interface
.Sh SYNOPSIS
.Bk -words
.Bl -tag -width "nmreplay"
.It Nm
.Op Fl f Ar pcap-file
.Op Fl i Ar netmap-interface
.Op Fl B Ar bandwidth
.Op Fl D Ar delay
.Op Fl L Ar loss
.Op Fl b Ar batch-size
.Op Fl w Ar wait-link
.Op Fl v
.Op Fl C Ar cpu-placement
.El
.Ek
.Sh DESCRIPTION
.Nm
works like
.Nm tcpreplay
to replay a pcap file through a netmap interface,
with programmable rates and possibly delays, losses
and packet alterations.
.Nm
is designed to run at high speed, so the transmit schedule
is computed ahead of time, and the thread in charge of transmission
only has to pump data through the interface.
.Nm
can connect to any type of netmap port.
.Pp
Command line options are as follows
.Bl -tag -width Ds
.It Fl f Ar pcap-file
Name of the pcap file to replay.
.It Fl i Ar interface
Name of the netmap interface to use as output.
.It Fl v
Enable verbose mode.
.It Fl b Ar batch-size
Maximum batch size to use during transmissions.
.Nm
normally transmits packets one at a time, but it may use
larger batches, up to the value specified with this option,
when running at high rates.
.It Fl B Ar bps | Cm constant, Ns Ar bps | Cm ether, Ns Ar bps | Cm real Ns Op , Ns Ar speedup
Bandwidth to be used for transmission.
.Ar bps
is a floating point number optionally followed by a character
(k, K, m, M, g, G) that multiplies the value by 10^3, 10^6 and 10^9
respectively.
.Cm constant
(can be omitted) means that the bandwidth will be computed
with reference to the actual packet size (excluding CRC and framing).
.Cm ether
indicates that the ethernet framing (160 bits) and CRC (32 bits)
will be included in the computation of the packet size.
.Cm real
means transmission will occur according to the timestamps
recorded in the trace. The optional
.Ar speedup
multiplier (defaults to 1) indicates how much faster
or slower than real time the trace should be replayed.
.It Fl D Ar dt | Cm constant, Ns Ar dt | Cm uniform, Ns Ar dmin,dmax | Cm exp, Ar dmin,davg
Adds additional delay to the packet transmission, whose distribution
can be constant, uniform or exponential.
.Ar dt, dmin, dmax, davg
are times expressed as floating point numbers optionally followed
by a character (s, m, u, n) to indicate seconds, milliseconds,
microseconds, nanoseconds.
The delay is added to the transmit time and adjusted so that there is
never packet reordering.
.It Fl L Ar x | Cm plr, Ns Ar x | Cm ber, Ns Ar x
Simulates packet or bit errors, causing offending packets to be dropped.
.Ar x
is a floating point number indicating the packet or bit error rate.
.It Fl w Ar wait-link
indicates the number of seconds to wait before transmitting.
It defaults to 2, and may be useful when talking to physical
ports to let link negotiation complete before starting transmission.
.El
.Sh OPERATION
.Nm
creates an in-memory schedule with all packets to be transmitted,
and then launches a separate thread to take care of transmissions
while the main thread reports statistics every second.
.Sh SEE ALSO
.Pa http://info.iet.unipi.it/~luigi/netmap/
.Pp
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp.45-51, March 2012
.Pp
Luigi Rizzo, Giuseppe Lettieri,
VALE, a switched ethernet for virtual machines,
ACM CoNEXT'12, December 2012, Nice
.Sh AUTHORS
.An -nosplit
.Nm
has been written by
.An Luigi Rizzo, Andrea Beconcini, Francesco Mola and Lorenzo Biagini
at the Universita` di Pisa, Italy.
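The bandwidth argument accepted by -B, described above as a floating point number with an optional k/K, m/M or g/G suffix multiplying by 10^3, 10^6 or 10^9, can be parsed along these lines. This is an illustrative sketch, not the parser nmreplay actually uses:

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "100M", "2.5G", "512", ... into bits per second.
 * Unknown or absent suffixes leave the value unscaled. */
static double
parse_bw(const char *s)
{
	char *end;
	double v = strtod(s, &end);

	switch (*end) {
	case 'k': case 'K': return v * 1e3;
	case 'm': case 'M': return v * 1e6;
	case 'g': case 'G': return v * 1e9;
	default:            return v;	/* no suffix */
	}
}
```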

[Two file diffs suppressed because they are too large.]


@@ -25,6 +25,10 @@
/* $FreeBSD$ */
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
#include <net/netmap.h>
#include <errno.h>
#include <stdio.h>
#include <inttypes.h> /* PRI* macros */
@@ -35,17 +39,9 @@
#include <sys/param.h>
#include <sys/socket.h> /* apple needs sockaddr */
#include <net/if.h> /* ifreq */
#include <libgen.h> /* basename */
#include <stdlib.h> /* atoi, free */
/* XXX cut and paste from pkt-gen.c because I'm not sure whether this
* program may include nm_util.h
*/
@@ -117,8 +113,11 @@ bdg_ctl(const char *name, int nr_cmd, int nr_arg, char *nmr_config)
break;
case NETMAP_BDG_ATTACH:
case NETMAP_BDG_DETACH:
if (nr_arg && nr_arg != NETMAP_BDG_HOST) {
nmr.nr_flags = NR_REG_NIC_SW;
nr_arg = 0;
}
nmr.nr_arg1 = nr_arg;
error = ioctl(fd, NIOCREGIF, &nmr);
if (error == -1) {
@@ -152,6 +151,36 @@ bdg_ctl(const char *name, int nr_cmd, int nr_arg, char *nmr_config)
break;
case NETMAP_BDG_POLLING_ON:
case NETMAP_BDG_POLLING_OFF:
/* We reuse nmreq fields as follows:
* nr_tx_slots: 0 and non-zero indicate REG_ALL_NIC and
* REG_ONE_NIC, respectively.
* nr_rx_slots: CPU core index. This also indicates the
* first queue in the case of REG_ONE_NIC
* nr_tx_rings: (REG_ONE_NIC only) indicates the
* number of CPU cores or the last queue
*/
nmr.nr_flags |= nmr.nr_tx_slots ?
NR_REG_ONE_NIC : NR_REG_ALL_NIC;
nmr.nr_ringid = nmr.nr_rx_slots;
/* number of cores/rings */
if (nmr.nr_flags == NR_REG_ALL_NIC)
nmr.nr_arg1 = 1;
else
nmr.nr_arg1 = nmr.nr_tx_rings;
error = ioctl(fd, NIOCREGIF, &nmr);
if (!error)
D("polling on %s %s", nmr.nr_name,
nr_cmd == NETMAP_BDG_POLLING_ON ?
"started" : "stopped");
else
D("polling on %s %s (err %d)", nmr.nr_name,
nr_cmd == NETMAP_BDG_POLLING_ON ?
"couldn't start" : "couldn't stop", error);
break;
default: /* GINFO */
nmr.nr_cmd = nmr.nr_arg1 = nmr.nr_arg2 = 0;
error = ioctl(fd, NIOCGINFO, &nmr);
@@ -173,7 +202,7 @@ main(int argc, char *argv[])
const char *command = basename(argv[0]);
char *name = NULL, *nmr_config = NULL;
if (argc > 5) {
usage:
fprintf(stderr,
"Usage:\n"
@@ -186,12 +215,18 @@ main(int argc, char *argv[])
"\t-r interface interface name to be deleted\n"
"\t-l list all or specified bridge's interfaces (default)\n"
"\t-C string ring/slot setting of an interface created by -n\n"
"\t-p interface start polling. Additional -C x,y,z configures\n"
"\t\t x: 0 (REG_ALL_NIC) or 1 (REG_ONE_NIC),\n"
"\t\t y: CPU core id for ALL_NIC and core/ring for ONE_NIC\n"
"\t\t z: (ONE_NIC only) num of total cores/rings\n"
"\t-P interface stop polling\n"
"", command);
return 0;
}
while ((ch = getopt(argc, argv, "d:a:h:g:l:n:r:C:p:P:")) != -1) {
if (ch != 'C')
name = optarg; /* default */
switch (ch) {
default:
fprintf(stderr, "bad option %c %s", ch, optarg);
@@ -223,11 +258,17 @@ main(int argc, char *argv[])
case 'C':
nmr_config = strdup(optarg);
break;
case 'p':
nr_cmd = NETMAP_BDG_POLLING_ON;
break;
case 'P':
nr_cmd = NETMAP_BDG_POLLING_OFF;
break;
}
}
if (optind != argc) {
// fprintf(stderr, "optind %d argc %d\n", optind, argc);
goto usage;
}
if (argc == 1)
nr_cmd = NETMAP_BDG_LIST;