2012-04-13 16:32:33 +00:00
/*
It is 2014 and we have a new version of netmap.
Most relevant features:
- netmap emulation on any NIC, even those without native netmap support.
On the ixgbe we have measured about 4Mpps/core/queue in this mode,
which is still a lot more than with sockets/bpf.
- seamless interconnection of VALE switch, NICs and host stack.
If you disable accelerations on your NIC (say em0)
ifconfig em0 -txcsum -rxcsum
you can use the VALE switch to connect the NIC and the host stack:
vale-ctl -h valeXX:em0
which lets other netmap clients share the NIC.
- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
instead of pointers/count as before). This was unavoidable to support,
in the future, multiple threads operating on the same rings.
Netmap clients require very small source code changes to compile again.
On the plus side, the new API should be easier to understand
and the internals are a lot simpler.
The manual page has been updated extensively to reflect the current
features and give some examples.
This is the result of work by several people, including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by the EU projects CHANGE and OPENLAB, the NetApp University
Research Fund, NEC, and of course the Università di Pisa.
2014-01-06 12:53:15 +00:00
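The head/cur/tail convention mentioned in the commit message above can be illustrated with a small user-space sketch. The `demo_ring` struct and the `demo_` helpers are hypothetical stand-ins, not the real `struct netmap_ring` or the helpers in `netmap_user.h`. The sketch assumes that slots from `cur` up to (but excluding) `tail` are available to the application, and that the application releases slots by advancing `head`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified ring: only the three indices and the size. */
struct demo_ring {
	uint32_t num_slots;	/* total slots in the circular ring */
	uint32_t head;		/* first slot still owned by the application */
	uint32_t cur;		/* next slot the application will look at */
	uint32_t tail;		/* first slot not available to the application */
};

/* Advance an index by one slot, wrapping at the end of the ring. */
static uint32_t
demo_ring_next(const struct demo_ring *r, uint32_t i)
{
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Number of slots between cur and tail, accounting for wrap-around. */
static uint32_t
demo_ring_space(const struct demo_ring *r)
{
	int32_t n = (int32_t)r->tail - (int32_t)r->cur;

	if (n < 0)
		n += (int32_t)r->num_slots;
	return (uint32_t)n;
}

/* Consume up to 'want' slots, then release them by moving head to cur. */
static uint32_t
demo_ring_consume(struct demo_ring *r, uint32_t want)
{
	uint32_t done = 0;

	while (done < want && r->cur != r->tail) {
		r->cur = demo_ring_next(r, r->cur);
		done++;
	}
	r->head = r->cur;
	return done;
}
```

With an 8-slot ring where `cur` is 6 and `tail` is 2, `demo_ring_space()` reports 4 slots (6, 7, 0, 1), and consuming them leaves `head == cur == tail`.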
 * Copyright (C) 2012-2014 Matteo Landi, Luigi Rizzo, Giuseppe Lettieri. All rights reserved.
2012-04-13 16:32:33 +00:00
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *   1. Redistributions of source code must retain the above copyright
 *      notice, this list of conditions and the following disclaimer.
 *   2. Redistributions in binary form must reproduce the above copyright
 *      notice, this list of conditions and the following disclaimer in the
2013-12-15 08:37:24 +00:00
 *      documentation and/or other materials provided with the distribution.
2012-04-13 16:32:33 +00:00
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

2013-11-01 21:21:14 +00:00
#ifdef linux
#include "bsd_glue.h"
#endif /* linux */
2012-04-13 16:32:33 +00:00
2013-11-01 21:21:14 +00:00
#ifdef __APPLE__
#include "osx_glue.h"
#endif /* __APPLE__ */

#ifdef __FreeBSD__
#include <sys/cdefs.h> /* prerequisite */
__FBSDID("$FreeBSD$");

#include <sys/types.h>
#include <sys/malloc.h>
#include <sys/proc.h>
#include <vm/vm.h>	/* vtophys */
#include <vm/pmap.h>	/* vtophys */
#include <sys/socket.h> /* sockaddrs */
#include <sys/selinfo.h>
#include <sys/sysctl.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/vnet.h>
#include <machine/bus.h>	/* bus_dmamap_* */

#endif /* __FreeBSD__ */

#include <net/netmap.h>
#include <dev/netmap/netmap_kern.h>
#include "netmap_mem2.h"
2012-04-13 16:32:33 +00:00
Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial.
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured guest-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
2014-08-16 15:00:01 +00:00
#define NETMAP_BUF_MAX_NUM	20*4096*2	/* large machine */

#define NETMAP_POOL_MAX_NAMSZ	32


enum {
	NETMAP_IF_POOL = 0,
	NETMAP_RING_POOL,
	NETMAP_BUF_POOL,
	NETMAP_POOLS_NR
};


struct netmap_obj_params {
	u_int size;
	u_int num;
};
struct netmap_obj_pool {
	char name[NETMAP_POOL_MAX_NAMSZ];	/* name of the allocator */

	/* ---------------------------------------------------*/
	/* these are only meaningful if the pool is finalized */
	/* (see 'finalized' field in netmap_mem_d)            */
	u_int objtotal;		/* actual total number of objects. */
	u_int memtotal;		/* actual total memory space */
	u_int numclusters;	/* actual number of clusters */

	u_int objfree;		/* number of free objects. */

	struct lut_entry *lut;	/* virt,phys addresses, objtotal entries */
	uint32_t *bitmap;	/* one bit per buffer, 1 means free */
	uint32_t bitmap_slots;	/* number of uint32 entries in bitmap */
	/* ---------------------------------------------------*/

	/* limits */
	u_int objminsize;	/* minimum object size */
	u_int objmaxsize;	/* maximum object size */
	u_int nummin;		/* minimum number of objects */
	u_int nummax;		/* maximum number of objects */

	/* these are changed only by config */
	u_int _objtotal;	/* total number of objects */
	u_int _objsize;		/* object size */
	u_int _clustsize;	/* cluster size */
	u_int _clustentries;	/* objects per cluster */
	u_int _numclusters;	/* number of clusters */

	/* requested values */
	u_int r_objtotal;
	u_int r_objsize;
};
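The `bitmap` and `bitmap_slots` fields above describe a free map with one bit per object, where a set bit means "free". A minimal user-space sketch of that idea follows. The `demo_pool` type, its fixed size, and the `demo_pool_*` helpers are hypothetical and are not the allocator routines of this file; they only show how such a bitmap can be scanned and updated.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OBJS	70				/* hypothetical pool size */
#define DEMO_SLOTS	((DEMO_OBJS + 31) / 32)		/* uint32 words needed */

struct demo_pool {
	uint32_t bitmap[DEMO_SLOTS];	/* one bit per object, 1 means free */
	unsigned objtotal;		/* total number of objects */
	unsigned objfree;		/* number of free objects */
};

/* Mark every object as free. */
static void
demo_pool_init(struct demo_pool *p)
{
	unsigned i;

	for (i = 0; i < DEMO_SLOTS; i++)
		p->bitmap[i] = 0;
	p->objtotal = DEMO_OBJS;
	p->objfree = 0;
	for (i = 0; i < DEMO_OBJS; i++) {
		p->bitmap[i / 32] |= (uint32_t)1 << (i % 32);
		p->objfree++;
	}
}

/* Return the index of the first free object, or -1 if the pool is empty. */
static int
demo_pool_alloc(struct demo_pool *p)
{
	unsigned i, j;

	for (i = 0; i < DEMO_SLOTS; i++) {
		if (p->bitmap[i] == 0)
			continue;	/* whole word in use, skip 32 objects */
		for (j = 0; j < 32; j++) {
			uint32_t mask = (uint32_t)1 << j;

			if (p->bitmap[i] & mask) {
				p->bitmap[i] &= ~mask;	/* now in use */
				p->objfree--;
				return (int)(i * 32 + j);
			}
		}
	}
	return -1;
}

/* Give an object back; returns -1 on a bad index or a double free. */
static int
demo_pool_free(struct demo_pool *p, unsigned idx)
{
	uint32_t mask;

	if (idx >= p->objtotal)
		return -1;
	mask = (uint32_t)1 << (idx % 32);
	if (p->bitmap[idx / 32] & mask)
		return -1;	/* already free */
	p->bitmap[idx / 32] |= mask;
	p->objfree++;
	return 0;
}
```

Testing a whole 32-bit word against zero lets the scan skip 32 in-use objects at a time, which is one reason a word-based bitmap is a convenient representation here.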

#ifdef linux
// XXX a mtx would suffice here 20130415 lr
#define NMA_LOCK_T		struct semaphore
#else /* !linux */
#define NMA_LOCK_T		struct mtx
#endif /* linux */

typedef int (*netmap_mem_config_t)(struct netmap_mem_d*);
typedef int (*netmap_mem_finalize_t)(struct netmap_mem_d*);
typedef void (*netmap_mem_deref_t)(struct netmap_mem_d*);

typedef uint16_t nm_memid_t;

struct netmap_mem_d {
	NMA_LOCK_T nm_mtx;	/* protect the allocator */
	u_int nm_totalsize;	/* shorthand */

	u_int flags;
#define NETMAP_MEM_FINALIZED	0x1	/* preallocation done */
	int lasterr;		/* last error for curr config */
	int refcount;		/* existing priv structures */
	/* the three allocators */
	struct netmap_obj_pool pools[NETMAP_POOLS_NR];

2015-05-15 15:36:57 +00:00
	netmap_mem_config_t   config;	/* called with NMA_LOCK held */
	netmap_mem_finalize_t finalize;	/* called with NMA_LOCK held */
	netmap_mem_deref_t    deref;	/* called with NMA_LOCK held */
2014-08-16 15:00:01 +00:00

	nm_memid_t nm_id;	/* allocator identifier */
	int nm_grp;		/* iommu group id */

	/* list of all existing allocators, sorted by nm_id */
	struct netmap_mem_d *prev, *next;
};

/* accessor functions */
struct lut_entry*
netmap_mem_get_lut(struct netmap_mem_d *nmd)
{
	return nmd->pools[NETMAP_BUF_POOL].lut;
}

u_int
netmap_mem_get_buftotal(struct netmap_mem_d *nmd)
{
	return nmd->pools[NETMAP_BUF_POOL].objtotal;
}

size_t
netmap_mem_get_bufsize(struct netmap_mem_d *nmd)
{
	return nmd->pools[NETMAP_BUF_POOL]._objsize;
}

2012-10-19 04:13:12 +00:00
#ifdef linux
2013-11-01 21:21:14 +00:00
#define NMA_LOCK_INIT(n)	sema_init(&(n)->nm_mtx, 1)
#define NMA_LOCK_DESTROY(n)
#define NMA_LOCK(n)		down(&(n)->nm_mtx)
#define NMA_UNLOCK(n)		up(&(n)->nm_mtx)
2012-10-19 04:13:12 +00:00
#else /* !linux */
2013-11-01 21:21:14 +00:00
#define NMA_LOCK_INIT(n)	mtx_init(&(n)->nm_mtx, "netmap memory allocator lock", NULL, MTX_DEF)
#define NMA_LOCK_DESTROY(n)	mtx_destroy(&(n)->nm_mtx)
#define NMA_LOCK(n)		mtx_lock(&(n)->nm_mtx)
#define NMA_UNLOCK(n)		mtx_unlock(&(n)->nm_mtx)
2012-10-19 04:13:12 +00:00
#endif /* linux */


struct netmap_obj_params netmap_params[NETMAP_POOLS_NR] = {
	[NETMAP_IF_POOL] = {
		.size = 1024,
		.num  = 100,
	},
	[NETMAP_RING_POOL] = {
		.size = 9*PAGE_SIZE,
		.num  = 200,
	},
	[NETMAP_BUF_POOL] = {
		.size = 2048,
		.num  = NETMAP_BUF_MAX_NUM,
	},
};
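To get a feel for the defaults in `netmap_params`, the requested memory can be added up as size times number for each pool. The sketch below is only back-of-the-envelope arithmetic: `demo_requested_bytes()` and `DEMO_BUF_MAX_NUM` are hypothetical names, the page size is passed in as a parameter (4096 is an assumption, not something this file states), and any rounding to clusters or cache lines done by the real allocator is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BUF_MAX_NUM	(20*4096*2)	/* same value as NETMAP_BUF_MAX_NUM */

struct demo_obj_params {
	unsigned size;	/* bytes per object */
	unsigned num;	/* number of objects */
};

/* Sum size * num over the three default pools (if, ring, buf). */
static uint64_t
demo_requested_bytes(unsigned page_size)
{
	const struct demo_obj_params p[3] = {
		{ 1024, 100 },			/* netmap_if pool */
		{ 9 * page_size, 200 },		/* netmap_ring pool */
		{ 2048, DEMO_BUF_MAX_NUM },	/* netmap_buf pool */
	};
	uint64_t total = 0;
	int i;

	for (i = 0; i < 3; i++)
		total += (uint64_t)p[i].size * p[i].num;
	return total;
}
```

With 4096-byte pages this comes to 102,400 + 7,372,800 + 335,544,320 = 343,019,520 bytes, so the buffer pool (320 MiB) dominates the request.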
This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving
100+ Mpps between processes using shared memory channels
(no mistake: over one hundred million. But mind you, I said
*moving* not *processing*);
- kqueue support (BHyVe needs it);
- improved user library. Just the interface name lets you select a NIC,
host port, VALE switch port, netmap pipe, and individual queues.
The upcoming netmap-enabled libpcap will use this feature.
- optional extra buffers associated to netmap ports, for applications
that need to buffer data yet don't want to make copies.
- segmentation offloading for the VALE switch, useful between VMs.
and a number of bug fixes and performance improvements.
My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.
There are some external repositories that can be of interest:
https://code.google.com/p/netmap
our public repository for netmap/VALE code, including
linux versions and other stuff that does not belong here,
such as python bindings.
https://code.google.com/p/netmap-libpcap
a clone of the libpcap repository with netmap support.
With this any libpcap client has access to most netmap
features with no recompilation. E.g. tcpdump can filter
packets at 10-15 Mpps.
https://code.google.com/p/netmap-ipfw
a userspace version of ipfw+dummynet which uses netmap
to send/receive packets. Speed is up in the 7-10 Mpps
range per core for simple rulesets.
Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but until that happens it is useful to have access to them.
And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.
MFC after: 3 days
2014-02-15 04:53:04 +00:00
struct netmap_obj_params netmap_min_priv_params[NETMAP_POOLS_NR] = {
	[NETMAP_IF_POOL] = {
		.size = 1024,
		.num  = 1,
	},
	[NETMAP_RING_POOL] = {
		.size = 5*PAGE_SIZE,
		.num  = 4,
	},
	[NETMAP_BUF_POOL] = {
		.size = 2048,
		.num  = 4098,
	},
};

2012-04-13 16:32:33 +00:00
2013-04-19 21:08:21 +00:00
/*
 * nm_mem is the memory allocator used for all physical interfaces
 * running in netmap mode.
 * Each virtual (VALE) port will have its own allocator.
 */
2013-11-01 21:21:14 +00:00
static int netmap_mem_global_config(struct netmap_mem_d *nmd);
static int netmap_mem_global_finalize(struct netmap_mem_d *nmd);
static void netmap_mem_global_deref(struct netmap_mem_d *nmd);
struct netmap_mem_d nm_mem = {	/* Our memory allocator. */
2012-10-19 04:13:12 +00:00
	.pools = {
		[NETMAP_IF_POOL] = {
			.name 	= "netmap_if",
			.objminsize = sizeof(struct netmap_if),
			.objmaxsize = 4096,
			.nummin     = 10,	/* don't be stingy */
			.nummax	    = 10000,	/* XXX very large */
		},
		[NETMAP_RING_POOL] = {
			.name 	= "netmap_ring",
			.objminsize = sizeof(struct netmap_ring),
			.objmaxsize = 32*PAGE_SIZE,
			.nummin     = 2,
			.nummax	    = 1024,
		},
		[NETMAP_BUF_POOL] = {
			.name	= "netmap_buf",
			.objminsize = 64,
			.objmaxsize = 65536,
			.nummin     = 4,
			.nummax	    = 1000000, /* one million! */
		},
	},
2013-11-01 21:21:14 +00:00
	.config   = netmap_mem_global_config,
	.finalize = netmap_mem_global_finalize,
	.deref    = netmap_mem_global_deref,
2014-02-15 04:53:04 +00:00

	.nm_id = 1,
2014-08-16 15:00:01 +00:00
	.nm_grp = -1,
2014-02-15 04:53:04 +00:00

	.prev = &nm_mem,
	.next = &nm_mem,
2012-04-13 16:32:33 +00:00
};
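Each pool in `nm_mem` carries limits (`objminsize`, `objmaxsize`, `nummin`, `nummax`) against which a requested size and count can be checked. The config routine that does this is not shown in this part of the file, so the following is only one plausible way to apply the limits. The `demo_` names are hypothetical, and treating both bounds as inclusive is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical copies of the per-pool limits and of a request. */
struct demo_limits {
	unsigned objminsize, objmaxsize;	/* allowed object size range */
	unsigned nummin, nummax;		/* allowed object count range */
};

struct demo_request {
	unsigned size;	/* requested object size */
	unsigned num;	/* requested number of objects */
};

/* Return 0 if the request fits the limits, EINVAL otherwise. */
static int
demo_check_request(const struct demo_limits *l, const struct demo_request *r)
{
	if (r->size < l->objminsize || r->size > l->objmaxsize)
		return EINVAL;	/* object size out of range */
	if (r->num < l->nummin || r->num > l->nummax)
		return EINVAL;	/* object count out of range */
	return 0;
}
```

For example, with the buffer pool limits above (64 to 65536 bytes, 4 to 1000000 objects), the default request of 163840 buffers of 2048 bytes passes, while a request for 32-byte buffers would be rejected.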
2013-11-01 21:21:14 +00:00
2014-02-15 04:53:04 +00:00
struct netmap_mem_d *netmap_last_mem_d = &nm_mem;

2013-11-01 21:21:14 +00:00
/* blueprint for the private memory allocators */
static int netmap_mem_private_config(struct netmap_mem_d *nmd);
static int netmap_mem_private_finalize(struct netmap_mem_d *nmd);
static void netmap_mem_private_deref(struct netmap_mem_d *nmd);
const struct netmap_mem_d nm_blueprint = {
	.pools = {
		[NETMAP_IF_POOL] = {
			.name 	= "%s_if",
			.objminsize = sizeof(struct netmap_if),
			.objmaxsize = 4096,
			.nummin     = 1,
2014-02-15 04:53:04 +00:00
			.nummax	    = 100,
2013-11-01 21:21:14 +00:00
		},
		[NETMAP_RING_POOL] = {
			.name 	= "%s_ring",
			.objminsize = sizeof(struct netmap_ring),
			.objmaxsize = 32*PAGE_SIZE,
			.nummin     = 2,
			.nummax	    = 1024,
		},
		[NETMAP_BUF_POOL] = {
			.name	= "%s_buf",
			.objminsize = 64,
			.objmaxsize = 65536,
			.nummin     = 4,
			.nummax	    = 1000000, /* one million! */
		},
	},
	.config   = netmap_mem_private_config,
	.finalize = netmap_mem_private_finalize,
	.deref    = netmap_mem_private_deref,

	.flags = NETMAP_MEM_PRIVATE,
};

2012-10-19 04:13:12 +00:00
/* memory allocator related sysctls */

#define STRINGIFY(x) #x

2013-11-01 21:21:14 +00:00
2012-10-19 04:13:12 +00:00
#define DECLARE_SYSCTLS(id, name) \
	SYSCTL_INT(_dev_netmap, OID_AUTO, name##_size, \
	    CTLFLAG_RW, &netmap_params[id].size, 0, "Requested size of netmap " STRINGIFY(name) "s"); \
2013-12-15 08:37:24 +00:00
	SYSCTL_INT(_dev_netmap, OID_AUTO, name##_curr_size, \
	    CTLFLAG_RD, &nm_mem.pools[id]._objsize, 0, "Current size of netmap " STRINGIFY(name) "s"); \
	SYSCTL_INT(_dev_netmap, OID_AUTO, name##_num, \
	    CTLFLAG_RW, &netmap_params[id].num, 0, "Requested number of netmap " STRINGIFY(name) "s"); \
	SYSCTL_INT(_dev_netmap, OID_AUTO, name##_curr_num, \
2014-02-15 04:53:04 +00:00
	    CTLFLAG_RD, &nm_mem.pools[id].objtotal, 0, "Current number of netmap " STRINGIFY(name) "s"); \
	SYSCTL_INT(_dev_netmap, OID_AUTO, priv_##name##_size, \
	    CTLFLAG_RW, &netmap_min_priv_params[id].size, 0, \
	    "Default size of private netmap " STRINGIFY(name) "s"); \
	SYSCTL_INT(_dev_netmap, OID_AUTO, priv_##name##_num, \
	    CTLFLAG_RW, &netmap_min_priv_params[id].num, 0, \
	    "Default number of private netmap " STRINGIFY(name) "s")
2012-10-19 04:13:12 +00:00
2013-11-01 21:21:14 +00:00
SYSCTL_DECL(_dev_netmap);
2012-10-19 04:13:12 +00:00
DECLARE_SYSCTLS(NETMAP_IF_POOL, if);
DECLARE_SYSCTLS(NETMAP_RING_POOL, ring);
DECLARE_SYSCTLS(NETMAP_BUF_POOL, buf);
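`DECLARE_SYSCTLS` relies on two preprocessor operators: `##` pastes the pool name into each OID name, and `#` (via `STRINGIFY`) turns it into part of the description string. The user-space sketch below reproduces only that string and token handling, so the resulting names can be inspected without a kernel. `DEMO_OID`, `DEMO_PRIV_OID` and `DEMO_DESCR` are hypothetical helper macros, not part of the sysctl API.

```c
#include <assert.h>
#include <string.h>

#define STRINGIFY(x) #x

/* Build the dotted sysctl name that name##_suffix ends up with. */
#define DEMO_OID(name, suffix)		"dev.netmap." #name "_" #suffix
#define DEMO_PRIV_OID(name, suffix)	"dev.netmap.priv_" #name "_" #suffix

/* Same description string as the first SYSCTL_INT in DECLARE_SYSCTLS. */
#define DEMO_DESCR(name)	"Requested size of netmap " STRINGIFY(name) "s"

/* The six OIDs that DECLARE_SYSCTLS(NETMAP_BUF_POOL, buf) creates. */
static const char *demo_buf_oids[] = {
	DEMO_OID(buf, size),
	DEMO_OID(buf, curr_size),
	DEMO_OID(buf, num),
	DEMO_OID(buf, curr_num),
	DEMO_PRIV_OID(buf, size),
	DEMO_PRIV_OID(buf, num),
};
```

On a system with this code loaded, the requested buffer count would therefore be tuned through `dev.netmap.buf_num`, while `dev.netmap.buf_curr_num` is read-only and reflects the value in use by `nm_mem`.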
2012-04-13 16:32:33 +00:00
2014-02-15 04:53:04 +00:00
static int
nm_mem_assign_id(struct netmap_mem_d *nmd)
{
	nm_memid_t id;
	struct netmap_mem_d *scan = netmap_last_mem_d;
	int error = ENOMEM;

	NMA_LOCK(&nm_mem);

	do {
		/* we rely on unsigned wrap around */
		id = scan->nm_id + 1;
		if (id == 0) /* reserve 0 as error value */
			id = 1;
		scan = scan->next;
		if (id != scan->nm_id) {
			nmd->nm_id = id;
			nmd->prev = scan->prev;
			nmd->next = scan;
			scan->prev->next = nmd;
			scan->prev = nmd;
			netmap_last_mem_d = nmd;
			error = 0;
			break;
		}
	} while (scan != netmap_last_mem_d);

	NMA_UNLOCK(&nm_mem);
	return error;
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
nm_mem_release_id(struct netmap_mem_d *nmd)
|
|
|
|
{
|
|
|
|
NMA_LOCK(&nm_mem);
|
|
|
|
|
|
|
|
nmd->prev->next = nmd->next;
|
|
|
|
nmd->next->prev = nmd->prev;
|
|
|
|
|
|
|
|
if (netmap_last_mem_d == nmd)
|
|
|
|
netmap_last_mem_d = nmd->prev;
|
|
|
|
|
|
|
|
nmd->prev = nmd->next = NULL;
|
|
|
|
|
|
|
|
NMA_UNLOCK(&nm_mem);
|
|
|
|
}
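The two functions above keep the allocators on a circular doubly linked list and hand out the first unused id after the most recently inserted element. The sketch below reproduces that logic in userspace, without locking. The names `mem_node`, `mem_list_init`, `mem_assign_id` and `mem_release_id` are hypothetical stand-ins, and the 16-bit id type is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct netmap_mem_d: list fields only. */
struct mem_node {
	uint16_t id;			/* assumed 16-bit id type */
	struct mem_node *prev, *next;
};

/* Turn 'head' into a one-element circular list. */
static void
mem_list_init(struct mem_node *head, uint16_t id)
{
	head->id = id;
	head->prev = head->next = head;
}

/*
 * Give 'n' the first id not used by the element that follows the
 * scan position, then link 'n' in front of that element.
 * Returns 0 on success, -1 if no id is available.
 */
static int
mem_assign_id(struct mem_node **last, struct mem_node *n)
{
	struct mem_node *scan;

	assert(last != NULL && *last != NULL && n != NULL);
	scan = *last;
	do {
		uint16_t id = (uint16_t)(scan->id + 1);	/* wraps to 0 */

		if (id == 0)		/* 0 is reserved as error value */
			id = 1;
		scan = scan->next;
		if (id != scan->id) {
			n->id = id;
			n->prev = scan->prev;
			n->next = scan;
			scan->prev->next = n;
			scan->prev = n;
			*last = n;
			return 0;
		}
	} while (scan != *last);
	return -1;
}

/* Unlink 'n' and move the scan hint off it if needed. */
static void
mem_release_id(struct mem_node **last, struct mem_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	if (*last == n)
		*last = n->prev;
	n->prev = n->next = NULL;
}
```

Starting the scan from the last inserted element means a released id is not reused right away, but only once the scan wraps around to the gap it left.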
|
|
|
|
|
Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial.
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured guest-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
2014-08-16 15:00:01 +00:00
|
|
|
static int
|
|
|
|
nm_mem_assign_group(struct netmap_mem_d *nmd, struct device *dev)
|
|
|
|
{
|
|
|
|
int err = 0, id;
|
|
|
|
id = nm_iommu_group_id(dev);
|
|
|
|
if (netmap_verbose)
|
|
|
|
D("iommu_group %d", id);
|
|
|
|
|
|
|
|
NMA_LOCK(nmd);
|
|
|
|
|
|
|
|
if (nmd->nm_grp < 0)
|
|
|
|
nmd->nm_grp = id;
|
|
|
|
|
|
|
|
if (nmd->nm_grp != id)
|
|
|
|
nmd->lasterr = err = ENOMEM;
|
|
|
|
|
|
|
|
NMA_UNLOCK(nmd);
|
|
|
|
return err;
|
|
|
|
}
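The rule implemented above is "the first device binds the allocator to its IOMMU group, and later devices must belong to the same group". A minimal sketch of just that rule follows. The struct `mem_grp` and the function `mem_bind_group` are hypothetical names, and locking and the group lookup are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reduction of the allocator state used by the check. */
struct mem_grp {
	int grp;	/* < 0 means "not bound to any group yet" */
	int lasterr;	/* last error recorded on this allocator */
};

/*
 * Bind the allocator to group 'id' if it is still unbound,
 * otherwise verify that 'id' matches the group already bound.
 * Returns 0 on success, ENOMEM on a group mismatch.
 */
static int
mem_bind_group(struct mem_grp *m, int id)
{
	assert(m != NULL);
	if (m->grp < 0)
		m->grp = id;
	if (m->grp != id) {
		m->lasterr = ENOMEM;
		return ENOMEM;
	}
	return 0;
}
```

A mismatch leaves the existing binding untouched, so devices that are already attached keep working.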
|
2014-02-15 04:53:04 +00:00
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
/*
|
2013-04-19 21:08:21 +00:00
|
|
|
* First, find the allocator that contains the requested offset,
|
|
|
|
* then locate the cluster through a lookup table.
|
2012-04-13 16:32:33 +00:00
|
|
|
*/
|
2013-11-01 21:21:14 +00:00
|
|
|
vm_paddr_t
|
|
|
|
netmap_mem_ofstophys(struct netmap_mem_d* nmd, vm_ooffset_t offset)
|
2012-04-13 16:32:33 +00:00
|
|
|
{
|
|
|
|
int i;
|
2013-11-01 21:21:14 +00:00
|
|
|
vm_ooffset_t o = offset;
|
|
|
|
vm_paddr_t pa;
|
|
|
|
struct netmap_obj_pool *p;
|
|
|
|
|
|
|
|
NMA_LOCK(nmd);
|
|
|
|
p = nmd->pools;
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; offset -= p[i].memtotal, i++) {
|
|
|
|
if (offset >= p[i].memtotal)
|
2012-04-13 16:32:33 +00:00
|
|
|
continue;
|
2013-04-19 21:08:21 +00:00
|
|
|
// now lookup the cluster's address
|
2014-08-16 15:00:01 +00:00
|
|
|
pa = vtophys(p[i].lut[offset / p[i]._objsize].vaddr) +
|
2012-10-19 04:13:12 +00:00
|
|
|
offset % p[i]._objsize;
|
2013-11-01 21:21:14 +00:00
|
|
|
NMA_UNLOCK(nmd);
|
|
|
|
return pa;
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
2012-10-19 04:13:12 +00:00
|
|
|
/* this is only in case of errors */
|
2012-04-14 16:44:18 +00:00
|
|
|
D("invalid ofs 0x%x out of 0x%x 0x%x 0x%x", (u_int)o,
|
2013-11-01 21:21:14 +00:00
|
|
|
p[NETMAP_IF_POOL].memtotal,
|
|
|
|
p[NETMAP_IF_POOL].memtotal
|
|
|
|
+ p[NETMAP_RING_POOL].memtotal,
|
|
|
|
p[NETMAP_IF_POOL].memtotal
|
|
|
|
+ p[NETMAP_RING_POOL].memtotal
|
|
|
|
+ p[NETMAP_BUF_POOL].memtotal);
|
|
|
|
NMA_UNLOCK(nmd);
|
2012-04-13 16:32:33 +00:00
|
|
|
return 0; // XXX bad address
|
|
|
|
}
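The lookup in netmap_mem_ofstophys() treats the pools as one contiguous region: it subtracts each pool's total size until the offset falls inside a pool, then splits the remainder into an object index and a byte position within that object. The sketch below shows only this arithmetic. `struct pool`, `POOLS_NR` and `ofs_resolve` are hypothetical names, and the final translation to a physical address is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define POOLS_NR 3	/* stands in for NETMAP_POOLS_NR */

/* Hypothetical reduction of struct netmap_obj_pool. */
struct pool {
	size_t objsize;		/* size of one object */
	size_t memtotal;	/* bytes covered by the whole pool */
};

/*
 * Resolve a global offset into (pool, object index, byte in object).
 * Returns 0 on success, -1 if the offset is past the last pool.
 */
static int
ofs_resolve(const struct pool *p, size_t offset,
    int *pool, size_t *obj, size_t *rem)
{
	int i;

	assert(p != NULL && pool != NULL && obj != NULL && rem != NULL);
	for (i = 0; i < POOLS_NR; offset -= p[i].memtotal, i++) {
		if (offset >= p[i].memtotal)
			continue;	/* not in this pool, try the next */
		*pool = i;
		*obj = offset / p[i].objsize;
		*rem = offset % p[i].objsize;
		return 0;
	}
	return -1;
}
```

The object index is what selects the entry in the pool's lookup table, and the remainder is added to that entry's address.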
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
int
|
2014-02-15 04:53:04 +00:00
|
|
|
netmap_mem_get_info(struct netmap_mem_d* nmd, u_int* size, u_int *memflags,
|
|
|
|
nm_memid_t *id)
|
2013-11-01 21:21:14 +00:00
|
|
|
{
|
|
|
|
int error = 0;
|
|
|
|
NMA_LOCK(nmd);
|
|
|
|
error = nmd->config(nmd);
|
|
|
|
if (error)
|
|
|
|
goto out;
|
2014-08-16 15:00:01 +00:00
|
|
|
if (size) {
|
|
|
|
if (nmd->flags & NETMAP_MEM_FINALIZED) {
|
|
|
|
*size = nmd->nm_totalsize;
|
|
|
|
} else {
|
|
|
|
int i;
|
|
|
|
*size = 0;
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
|
|
|
struct netmap_obj_pool *p = nmd->pools + i;
|
|
|
|
*size += (p->_numclusters * p->_clustsize);
|
|
|
|
}
|
2013-11-01 21:21:14 +00:00
|
|
|
}
|
|
|
|
}
|
2014-08-16 15:00:01 +00:00
|
|
|
if (memflags)
|
|
|
|
*memflags = nmd->flags;
|
|
|
|
if (id)
|
|
|
|
*id = nmd->nm_id;
|
2013-11-01 21:21:14 +00:00
|
|
|
out:
|
|
|
|
NMA_UNLOCK(nmd);
|
|
|
|
return error;
|
|
|
|
}
|
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
/*
|
|
|
|
* we store objects by kernel address, need to find the offset
|
|
|
|
* within the pool to export the value to userspace.
|
|
|
|
* Algorithm: scan until we find the cluster, then add the
|
|
|
|
* actual offset in the cluster
|
|
|
|
*/
|
2012-04-13 22:24:57 +00:00
|
|
|
static ssize_t
|
2012-04-13 16:32:33 +00:00
|
|
|
netmap_obj_offset(struct netmap_obj_pool *p, const void *vaddr)
|
|
|
|
{
|
2013-11-01 21:21:14 +00:00
|
|
|
int i, k = p->_clustentries, n = p->objtotal;
|
2012-04-13 16:32:33 +00:00
|
|
|
ssize_t ofs = 0;
|
|
|
|
|
|
|
|
for (i = 0; i < n; i += k, ofs += p->_clustsize) {
|
|
|
|
const char *base = p->lut[i].vaddr;
|
|
|
|
ssize_t relofs = (const char *) vaddr - base;
|
|
|
|
|
2013-04-15 11:49:16 +00:00
|
|
|
if (relofs < 0 || relofs >= p->_clustsize)
|
2012-04-13 16:32:33 +00:00
|
|
|
continue;
|
|
|
|
|
|
|
|
ofs = ofs + relofs;
|
|
|
|
ND("%s: return offset %zd (cluster %d) for pointer %p",
|
|
|
|
p->name, ofs, i, vaddr);
|
|
|
|
return ofs;
|
|
|
|
}
|
|
|
|
D("address %p is not contained inside any cluster (%s)",
|
|
|
|
vaddr, p->name);
|
|
|
|
return 0; /* An error occurred */
|
|
|
|
}
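The scan in netmap_obj_offset() can be tried outside the kernel with a small model. This is a sketch under assumptions: `lut_entry`, `obj_pool` and `obj_offset` are hypothetical reductions of the kernel structures, addresses are compared as integers, and, unlike the function above, a miss is reported as -1 so that it cannot be confused with the valid offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup-table entry: address of one object. */
struct lut_entry {
	void *vaddr;
};

/* Hypothetical reduction of struct netmap_obj_pool. */
struct obj_pool {
	struct lut_entry *lut;	/* one entry per object */
	size_t objtotal;	/* number of objects */
	size_t clustentries;	/* objects per cluster */
	size_t clustsize;	/* bytes per cluster */
};

/*
 * Return the offset of 'vaddr' from the start of the pool,
 * or -1 if the address is not inside any cluster.
 */
static ptrdiff_t
obj_offset(const struct obj_pool *p, const void *vaddr)
{
	uintptr_t va = (uintptr_t)vaddr;
	ptrdiff_t ofs = 0;
	size_t i;

	assert(p != NULL && p->clustentries > 0);
	/* Visit the first object of each cluster: its address is the cluster base. */
	for (i = 0; i < p->objtotal;
	    i += p->clustentries, ofs += (ptrdiff_t)p->clustsize) {
		uintptr_t base = (uintptr_t)p->lut[i].vaddr;

		if (va < base || va - base >= p->clustsize)
			continue;
		return ofs + (ptrdiff_t)(va - base);
	}
	return -1;
}
```

The cost is linear in the number of clusters, which is acceptable when the conversion is needed only for a few objects.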
|
|
|
|
|
|
|
|
/* Helper functions which convert virtual addresses to offsets */
|
2013-11-01 21:21:14 +00:00
|
|
|
#define netmap_if_offset(n, v) \
|
|
|
|
netmap_obj_offset(&(n)->pools[NETMAP_IF_POOL], (v))
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
#define netmap_ring_offset(n, v) \
|
|
|
|
((n)->pools[NETMAP_IF_POOL].memtotal + \
|
|
|
|
netmap_obj_offset(&(n)->pools[NETMAP_RING_POOL], (v)))
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
#define netmap_buf_offset(n, v) \
|
|
|
|
((n)->pools[NETMAP_IF_POOL].memtotal + \
|
|
|
|
(n)->pools[NETMAP_RING_POOL].memtotal + \
|
|
|
|
netmap_obj_offset(&(n)->pools[NETMAP_BUF_POOL], (v)))
|
2012-04-13 16:32:33 +00:00
|
|
|
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
ssize_t
|
|
|
|
netmap_mem_if_offset(struct netmap_mem_d *nmd, const void *addr)
|
|
|
|
{
|
|
|
|
ssize_t v;
|
|
|
|
NMA_LOCK(nmd);
|
|
|
|
v = netmap_if_offset(nmd, addr);
|
|
|
|
NMA_UNLOCK(nmd);
|
|
|
|
return v;
|
|
|
|
}
|
|
|
|
|
2012-10-19 04:13:12 +00:00
|
|
|
/*
|
|
|
|
* report the index, and use start position as a hint,
|
|
|
|
* otherwise buffer allocation becomes terribly expensive.
|
|
|
|
*/
|
2012-04-13 16:32:33 +00:00
|
|
|
static void *
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_obj_malloc(struct netmap_obj_pool *p, u_int len, uint32_t *start, uint32_t *index)
|
2012-04-13 16:32:33 +00:00
|
|
|
{
|
|
|
|
uint32_t i = 0; /* index in the bitmap */
|
|
|
|
uint32_t mask, j; /* slot counter */
|
|
|
|
void *vaddr = NULL;
|
|
|
|
|
|
|
|
if (len > p->_objsize) {
|
|
|
|
D("%s request size %u too large", p->name, len);
|
|
|
|
// XXX cannot reduce the size
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (p->objfree == 0) {
|
2013-12-15 08:37:24 +00:00
|
|
|
D("no more %s objects", p->name);
|
2012-04-13 16:32:33 +00:00
|
|
|
return NULL;
|
|
|
|
}
|
2012-10-19 04:13:12 +00:00
|
|
|
if (start)
|
|
|
|
i = *start;
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2012-10-19 04:13:12 +00:00
|
|
|
/* termination is guaranteed by p->objfree, but better check bounds on i */
|
|
|
|
while (vaddr == NULL && i < p->bitmap_slots) {
|
2012-04-13 16:32:33 +00:00
|
|
|
uint32_t cur = p->bitmap[i];
|
|
|
|
if (cur == 0) { /* bitmask is fully used */
|
|
|
|
i++;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
/* locate a slot */
|
|
|
|
for (j = 0, mask = 1; (cur & mask) == 0; j++, mask <<= 1)
|
|
|
|
;
|
|
|
|
|
|
|
|
p->bitmap[i] &= ~mask; /* mark object as in use */
|
|
|
|
p->objfree--;
|
|
|
|
|
|
|
|
vaddr = p->lut[i * 32 + j].vaddr;
|
2012-10-19 04:13:12 +00:00
|
|
|
if (index)
|
|
|
|
*index = i * 32 + j;
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
|
|
|
ND("%s allocator: allocated object @ [%d][%d]: vaddr %p", p->name, i, j, vaddr);
|
|
|
|
|
2012-10-19 04:13:12 +00:00
|
|
|
if (start)
|
|
|
|
*start = i;
|
2012-04-13 16:32:33 +00:00
|
|
|
return vaddr;
|
|
|
|
}
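The allocator above keeps one bit per object, with a set bit meaning "free", and uses the caller's start position as a hint so that fully used bitmap words are not rescanned on every call. The following userspace sketch shows that search in isolation. `bitmap_pool` and `bitmap_alloc` are hypothetical names, and the function returns the object index (or UINT32_MAX when nothing is found) instead of an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the pool: bitmap state only. */
struct bitmap_pool {
	uint32_t *bitmap;	/* bit set = object is free */
	uint32_t bitmap_slots;	/* number of 32-bit words */
	uint32_t objfree;	/* count of free objects */
};

/*
 * Allocate one object, scanning from word *start (or 0 if start is NULL).
 * Returns the object index, or UINT32_MAX if no free object is found
 * at or after the hint. On success *start is set to the word used.
 */
static uint32_t
bitmap_alloc(struct bitmap_pool *p, uint32_t *start)
{
	uint32_t i = start ? *start : 0;

	assert(p != NULL && p->bitmap != NULL);
	if (p->objfree == 0)
		return UINT32_MAX;
	for (; i < p->bitmap_slots; i++) {
		uint32_t cur = p->bitmap[i], j, mask;

		if (cur == 0)		/* word fully used */
			continue;
		/* cur != 0, so this loop stops within 32 steps */
		for (j = 0, mask = 1; (cur & mask) == 0; j++, mask <<= 1)
			;
		p->bitmap[i] &= ~mask;	/* mark object as in use */
		p->objfree--;
		if (start)
			*start = i;
		return i * 32 + j;
	}
	return UINT32_MAX;
}
```

As in the code above, words before the hint are never revisited, so a caller that wants to reuse objects freed earlier has to reset the hint to 0.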
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2014-02-15 04:53:04 +00:00
|
|
|
* free by index, not by address.
|
|
|
|
* XXX should we also cleanup the content ?
|
2012-04-13 16:32:33 +00:00
|
|
|
*/
|
2014-02-15 04:53:04 +00:00
|
|
|
static int
|
2012-04-13 16:32:33 +00:00
|
|
|
netmap_obj_free(struct netmap_obj_pool *p, uint32_t j)
|
|
|
|
{
|
2014-02-15 04:53:04 +00:00
|
|
|
uint32_t *ptr, mask;
|
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
if (j >= p->objtotal) {
|
|
|
|
D("invalid index %u, max %u", j, p->objtotal);
|
2014-02-15 04:53:04 +00:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
ptr = &p->bitmap[j / 32];
|
|
|
|
mask = (1U << (j % 32)); /* unsigned: shifting a signed 1 by 31 is undefined */
|
|
|
|
if (*ptr & mask) {
|
|
|
|
D("ouch, double free on buffer %u", j);
|
|
|
|
return 1;
|
|
|
|
} else {
|
|
|
|
*ptr |= mask;
|
|
|
|
p->objfree++;
|
|
|
|
return 0;
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
|
|
|
}
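Freeing by index is the mirror image of allocation: the bit for object j lives in word j / 32 at position j % 32, and a bit that is already set reveals a double free. A minimal sketch, assuming hypothetical names `bitmap_pool` and `bitmap_free` and an unsigned shift for the mask:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the pool: bitmap state only. */
struct bitmap_pool {
	uint32_t *bitmap;	/* bit set = object is free */
	uint32_t objtotal;	/* number of objects in the pool */
	uint32_t objfree;	/* count of free objects */
};

/*
 * Release object 'j'. Returns 0 on success, 1 if the index is out
 * of range or the object is already free.
 */
static int
bitmap_free(struct bitmap_pool *p, uint32_t j)
{
	uint32_t *ptr, mask;

	assert(p != NULL && p->bitmap != NULL);
	if (j >= p->objtotal)
		return 1;		/* invalid index */
	ptr = &p->bitmap[j / 32];
	mask = UINT32_C(1) << (j % 32);
	if (*ptr & mask)
		return 1;		/* already free: double free */
	*ptr |= mask;
	p->objfree++;
	return 0;
}
```

Checking the bit before setting it keeps `objfree` consistent with the bitmap even when a caller misbehaves.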
|
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
/*
|
|
|
|
* free by address. This is slow but is only used for a few
|
|
|
|
* objects (rings, nifp)
|
|
|
|
*/
|
2012-04-13 16:32:33 +00:00
|
|
|
static void
|
|
|
|
netmap_obj_free_va(struct netmap_obj_pool *p, void *vaddr)
|
|
|
|
{
|
2013-11-01 21:21:14 +00:00
|
|
|
u_int i, j, n = p->numclusters;
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
for (i = 0, j = 0; i < n; i++, j += p->_clustentries) {
|
|
|
|
void *base = p->lut[i * p->_clustentries].vaddr;
|
2012-04-13 16:32:33 +00:00
|
|
|
ssize_t relofs = (ssize_t) vaddr - (ssize_t) base;
|
|
|
|
|
|
|
|
/* The given address is outside the current cluster. */
|
2013-05-10 08:46:10 +00:00
|
|
|
if (vaddr < base || relofs >= p->_clustsize)
|
2012-04-13 16:32:33 +00:00
|
|
|
continue;
|
|
|
|
|
|
|
|
j = j + relofs / p->_objsize;
|
2013-11-01 21:21:14 +00:00
|
|
|
/* KASSERT(j != 0, ("Cannot free object 0")); */
|
2012-04-13 16:32:33 +00:00
|
|
|
netmap_obj_free(p, j);
|
|
|
|
return;
|
|
|
|
}
|
2013-01-23 03:51:47 +00:00
|
|
|
D("address %p is not contained inside any cluster (%s)",
|
2012-04-13 16:32:33 +00:00
|
|
|
vaddr, p->name);
|
|
|
|
}
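Freeing by address first has to turn the address back into an object index: find the cluster that contains it, then add the position of the object within that cluster. The sketch below isolates this conversion. It is a simplification under assumptions: `va_pool` and `va_to_index` are hypothetical names, and the cluster bases are kept in a plain array rather than read from the lookup table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the pool geometry. */
struct va_pool {
	char **clusters;	/* base address of each cluster */
	size_t numclusters;
	size_t clustentries;	/* objects per cluster */
	size_t clustsize;	/* bytes per cluster */
	size_t objsize;		/* bytes per object */
};

/*
 * Return the index of the object containing 'vaddr',
 * or -1 if the address is not inside any cluster.
 */
static long
va_to_index(const struct va_pool *p, const void *vaddr)
{
	uintptr_t va = (uintptr_t)vaddr;
	size_t i, j;

	assert(p != NULL && p->objsize > 0);
	for (i = 0, j = 0; i < p->numclusters; i++, j += p->clustentries) {
		uintptr_t base = (uintptr_t)p->clusters[i];

		if (va < base || va - base >= p->clustsize)
			continue;	/* not in this cluster */
		return (long)(j + (va - base) / p->objsize);
	}
	return -1;
}
```

The resulting index can then be passed to an index-based free routine, which is how the slow path reuses the fast one.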
|
|
|
|
|
2014-08-16 15:00:01 +00:00

#define netmap_mem_bufsize(n)	\
	((n)->pools[NETMAP_BUF_POOL]._objsize)

#define netmap_if_malloc(n, len)	netmap_obj_malloc(&(n)->pools[NETMAP_IF_POOL], len, NULL, NULL)
#define netmap_if_free(n, v)		netmap_obj_free_va(&(n)->pools[NETMAP_IF_POOL], (v))
#define netmap_ring_malloc(n, len)	netmap_obj_malloc(&(n)->pools[NETMAP_RING_POOL], len, NULL, NULL)
#define netmap_ring_free(n, v)		netmap_obj_free_va(&(n)->pools[NETMAP_RING_POOL], (v))
#define netmap_buf_malloc(n, _pos, _index)			\
	netmap_obj_malloc(&(n)->pools[NETMAP_BUF_POOL], netmap_mem_bufsize(n), _pos, _index)

This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving
100+ Mpps between processes using shared memory channels
(no mistake: over one hundred million. But mind you, I said
*moving* not *processing*);
- kqueue support (BHyVe needs it);
- improved user library. Just the interface name lets you select a NIC,
host port, VALE switch port, netmap pipe, and individual queues.
The upcoming netmap-enabled libpcap will use this feature.
- optional extra buffers associated to netmap ports, for applications
that need to buffer data yet don't want to make copies.
- segmentation offloading for the VALE switch, useful between VMs.
and a number of bug fixes and performance improvements.
My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.
There are some external repositories that can be of interest:
https://code.google.com/p/netmap
our public repository for netmap/VALE code, including
linux versions and other stuff that does not belong here,
such as python bindings.
https://code.google.com/p/netmap-libpcap
a clone of the libpcap repository with netmap support.
With this any libpcap client has access to most netmap
features with no recompilation. E.g. tcpdump can filter
packets at 10-15 Mpps.
https://code.google.com/p/netmap-ipfw
a userspace version of ipfw+dummynet which uses netmap
to send/receive packets. Speed is up in the 7-10 Mpps
range per core for simple rulesets.
Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but while this happens it is useful to have access to them.
And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.
MFC after: 3 days
2014-02-15 04:53:04 +00:00

#if 0 // XXX unused
/* Return the index associated to the given packet buffer */
#define netmap_buf_index(n, v)						\
    (netmap_obj_offset(&(n)->pools[NETMAP_BUF_POOL], (v)) / NETMAP_BDG_BUF_SIZE(n))
#endif

/*
 * Allocate up to n extra buffers, chained in a linked list:
 * the first word of each buffer holds the index of the next one,
 * and *head receives the index of the first buffer (0 if none).
 * Returns the number of buffers actually allocated.
 */
uint32_t
netmap_extra_alloc(struct netmap_adapter *na, uint32_t *head, uint32_t n)
{
	struct netmap_mem_d *nmd = na->nm_mem;
	uint32_t i, pos = 0; /* opaque, scan position in the bitmap */

	NMA_LOCK(nmd);

	*head = 0;	/* default, 'null' index ie empty list */
	for (i = 0 ; i < n; i++) {
		uint32_t cur = *head;	/* save current head */
		uint32_t *p = netmap_buf_malloc(nmd, &pos, head);
		if (p == NULL) {
			D("no more buffers after %d of %d", i, n);
			*head = cur; /* restore */
			break;
		}
		RD(5, "allocate buffer %d -> %d", *head, cur);
		*p = cur; /* link to previous head */
	}

	NMA_UNLOCK(nmd);

	return i;
}

static void
netmap_extra_free(struct netmap_adapter *na, uint32_t head)
{
	struct lut_entry *lut = na->na_lut;
	struct netmap_mem_d *nmd = na->nm_mem;
	struct netmap_obj_pool *p = &nmd->pools[NETMAP_BUF_POOL];
	uint32_t i, cur, *buf;

	D("freeing the extra list");
	for (i = 0; head >= 2 && head < p->objtotal; i++) {
		cur = head;
		buf = lut[head].vaddr;
		head = *buf;
		*buf = 0;
		if (netmap_obj_free(p, cur))
			break;
	}
	if (head != 0)
		D("breaking with head %d", head);
	D("freed %d buffers", i);
}
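
The two functions above keep the extra buffers on a singly linked list that lives inside the buffers themselves: the first 32-bit word of each buffer holds the index of the next one, and index 0 terminates the list. The following is a minimal user-space sketch of that idea, not the kernel code. The pool `sketch_pool`, its dimensions, and the helpers `list_push()` and `list_drain()` are hypothetical names introduced for illustration; like the code above, the sketch treats indexes below 2 as not freeable.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NBUFS	8u	/* hypothetical pool size */
#define SKETCH_WORDS	16u	/* hypothetical buffer size, in 32-bit words */

/* Stand-in for the buffer pool: buffer i is sketch_pool[i]. */
static uint32_t sketch_pool[SKETCH_NBUFS][SKETCH_WORDS];

/* Link buffer idx in front of the list and return the new head. */
static uint32_t
list_push(uint32_t head, uint32_t idx)
{
	assert(idx >= 2 && idx < SKETCH_NBUFS);
	sketch_pool[idx][0] = head;	/* first word points to old head */
	return idx;
}

/* Walk the list, clearing each link; return how many buffers were seen. */
static uint32_t
list_drain(uint32_t head)
{
	uint32_t n = 0;

	while (head >= 2 && head < SKETCH_NBUFS) {
		uint32_t next = sketch_pool[head][0];

		sketch_pool[head][0] = 0;
		head = next;
		n++;
	}
	return n;
}
```

Pushing buffers 2, 5 and 3 in that order leaves 3 as the head, with buffer 3 pointing to 5 and buffer 5 pointing to 2; draining from the head then visits all three.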

/* Return nonzero on error */
static int
netmap_new_bufs(struct netmap_mem_d *nmd, struct netmap_slot *slot, u_int n)
{
	struct netmap_obj_pool *p = &nmd->pools[NETMAP_BUF_POOL];
	u_int i = 0;	/* slot counter */
	uint32_t pos = 0;	/* slot in p->bitmap */
	uint32_t index = 0;	/* buffer index */

	for (i = 0; i < n; i++) {
		void *vaddr = netmap_buf_malloc(nmd, &pos, &index);
		if (vaddr == NULL) {
			D("no more buffers after %d of %d", i, n);
			goto cleanup;
		}
		slot[i].buf_idx = index;
		slot[i].len = p->_objsize;
		slot[i].flags = 0;
	}

	ND("allocated %d buffers, %d available, first at %d", n, p->objfree, pos);
	return (0);

cleanup:
	while (i > 0) {
		i--;
		netmap_obj_free(p, slot[i].buf_idx);
	}
	bzero(slot, n * sizeof(slot[0]));
	return (ENOMEM);
}

static void
netmap_mem_set_ring(struct netmap_mem_d *nmd, struct netmap_slot *slot, u_int n, uint32_t index)
{
	struct netmap_obj_pool *p = &nmd->pools[NETMAP_BUF_POOL];
	u_int i;

	for (i = 0; i < n; i++) {
		slot[i].buf_idx = index;
		slot[i].len = p->_objsize;
		slot[i].flags = 0;
	}
}

static void
netmap_free_buf(struct netmap_mem_d *nmd, uint32_t i)
{
	struct netmap_obj_pool *p = &nmd->pools[NETMAP_BUF_POOL];

	if (i < 2 || i >= p->objtotal) {
		D("Cannot free buf#%d: should be in [2, %d[", i, p->objtotal);
		return;
	}
	netmap_obj_free(p, i);
}

static void
netmap_free_bufs(struct netmap_mem_d *nmd, struct netmap_slot *slot, u_int n)
{
	u_int i;

	for (i = 0; i < n; i++) {
		/*
		 * netmap_free_buf() accepts indexes in [2, objtotal[,
		 * so index 2 must be released too (the test was "> 2").
		 */
		if (slot[i].buf_idx > 1)
			netmap_free_buf(nmd, slot[i].buf_idx);
	}
}

static void
netmap_reset_obj_allocator(struct netmap_obj_pool *p)
{

	if (p == NULL)
		return;
	if (p->bitmap)
		free(p->bitmap, M_NETMAP);
	p->bitmap = NULL;
	if (p->lut) {
		u_int i;
		size_t sz = p->_clustsize;

		/*
		 * Free each cluster allocated in
		 * netmap_finalize_obj_allocator().  The cluster start
		 * addresses are stored at multiples of p->_clustentries
		 * in the lut.
		 */
		for (i = 0; i < p->objtotal; i += p->_clustentries) {
			if (p->lut[i].vaddr)
				contigfree(p->lut[i].vaddr, sz, M_NETMAP);
		}
		bzero(p->lut, sizeof(struct lut_entry) * p->objtotal);
#ifdef linux
		vfree(p->lut);
#else
		free(p->lut, M_NETMAP);
#endif
	}
	p->lut = NULL;
	p->objtotal = 0;
	p->memtotal = 0;
	p->numclusters = 0;
	p->objfree = 0;
}

/*
 * Free all resources related to an allocator.
 */
static void
netmap_destroy_obj_allocator(struct netmap_obj_pool *p)
{
	if (p == NULL)
		return;
	netmap_reset_obj_allocator(p);
}

/*
 * We receive a request for objtotal objects, of size objsize each.
 * Internally we may round up both numbers, as we allocate objects
 * in clusters whose size is a multiple of the page size.
 * We need to keep track of objtotal and clustentries,
 * as they are needed when freeing memory.
 *
 * XXX note -- userspace needs the buffers to be contiguous,
 *	so we cannot afford gaps at the end of a cluster.
 */

/* call with NMA_LOCK held */
static int
netmap_config_obj_allocator(struct netmap_obj_pool *p, u_int objtotal, u_int objsize)
{
	int i;
	u_int clustsize;	/* the cluster size, multiple of page size */
	u_int clustentries;	/* how many objects per cluster */

	/* we store the current request, so we can
	 * detect configuration changes later */
	p->r_objtotal = objtotal;
	p->r_objsize = objsize;

#define MAX_CLUSTSIZE	(1<<22)		// 4 MB
#define LINE_ROUND	NM_CACHE_ALIGN	// 64
	if (objsize >= MAX_CLUSTSIZE) {
		/* we could do it but there is no point */
		D("unsupported allocation for %d bytes", objsize);
		return EINVAL;
	}
	/* make sure objsize is a multiple of LINE_ROUND */
	i = (objsize & (LINE_ROUND - 1));
	if (i) {
		D("XXX aligning object by %d bytes", LINE_ROUND - i);
		objsize += LINE_ROUND - i;
	}
	if (objsize < p->objminsize || objsize > p->objmaxsize) {
		D("requested objsize %d out of range [%d, %d]",
			objsize, p->objminsize, p->objmaxsize);
		return EINVAL;
	}
	if (objtotal < p->nummin || objtotal > p->nummax) {
		D("requested objtotal %d out of range [%d, %d]",
			objtotal, p->nummin, p->nummax);
		return EINVAL;
	}
	/*
	 * Compute the number of objects per cluster using a
	 * brute-force approach: given a max cluster size, we add
	 * objects one at a time and stop at the first count that
	 * ends exactly on a page boundary (no wasted space).
	 */
	for (clustentries = 0, i = 1;; i++) {
		u_int delta, used = i * objsize;
		if (used > MAX_CLUSTSIZE)
			break;
		delta = used % PAGE_SIZE;
		if (delta == 0) { // exact solution
			clustentries = i;
			break;
		}
	}
	/* exact solution not found */
	if (clustentries == 0) {
		D("unsupported allocation for %d bytes", objsize);
		return EINVAL;
	}
	/* compute clustsize */
	clustsize = clustentries * objsize;
	if (netmap_verbose)
		D("objsize %d clustsize %d objects %d",
			objsize, clustsize, clustentries);

	/*
	 * The number of clusters is n = ceil(objtotal/clustentries)
	 * objtotal' = n * clustentries
	 */
	p->_clustentries = clustentries;
	p->_clustsize = clustsize;
	p->_numclusters = (objtotal + clustentries - 1) / clustentries;

	/* actual values (may be larger than requested) */
	p->_objsize = objsize;
	p->_objtotal = p->_numclusters * clustentries;

	return 0;
}
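
As a worked example of the sizing logic in netmap_config_obj_allocator(), the sketch below repeats the same arithmetic in user space. It assumes a 4096-byte page and a 64-byte cache line (the comment on LINE_ROUND suggests 64). `struct clust_plan`, `plan_clusters()` and the `SKETCH_*` constants are hypothetical names, and the range checks against objminsize/objmaxsize and nummin/nummax are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u		/* assumed page size */
#define SKETCH_MAX_CLUST	(1u << 22)	/* mirrors MAX_CLUSTSIZE, 4 MB */
#define SKETCH_LINE_ROUND	64u		/* assumed cache line size */

struct clust_plan {
	uint32_t objsize;	/* object size after rounding */
	uint32_t clustentries;	/* objects per cluster */
	uint32_t clustsize;	/* bytes per cluster */
	uint32_t numclusters;	/* ceil(objtotal / clustentries) */
	uint32_t objtotal;	/* numclusters * clustentries */
};

/* Returns 0 on success, -1 if no exact fit exists within the cap. */
static int
plan_clusters(uint32_t objtotal, uint32_t objsize, struct clust_plan *out)
{
	uint32_t i, rem;

	assert(out != NULL);
	if (objtotal == 0 || objsize == 0 || objsize >= SKETCH_MAX_CLUST)
		return -1;
	/* round objsize up to a multiple of the cache line */
	rem = objsize & (SKETCH_LINE_ROUND - 1);
	if (rem)
		objsize += SKETCH_LINE_ROUND - rem;
	/* smallest i such that i objects end exactly on a page boundary */
	for (i = 1; i * objsize <= SKETCH_MAX_CLUST; i++) {
		if ((i * objsize) % SKETCH_PAGE_SIZE != 0)
			continue;
		out->objsize = objsize;
		out->clustentries = i;
		out->clustsize = i * objsize;
		out->numclusters = (objtotal + i - 1) / i;
		out->objtotal = out->numclusters * i;
		return 0;
	}
	return -1;
}
```

Under these assumptions, a request for 101 objects of 2048 bytes yields 2 objects per 4096-byte cluster, 51 clusters, and an effective total of 102 objects. A 1000-byte object is first rounded up to 1024 bytes, giving 4 objects per cluster.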

/* call with NMA_LOCK held */
static int
netmap_finalize_obj_allocator(struct netmap_obj_pool *p)
{
	int i; /* must be signed */
	size_t n;

	/* optimistically assume we have enough memory */
	p->numclusters = p->_numclusters;
	p->objtotal = p->_objtotal;

	n = sizeof(struct lut_entry) * p->objtotal;
#ifdef linux
	p->lut = vmalloc(n);
#else
	p->lut = malloc(n, M_NETMAP, M_NOWAIT | M_ZERO);
#endif
	if (p->lut == NULL) {
		D("Unable to create lookup table (%d bytes) for '%s'", (int)n, p->name);
		goto clean;
	}

	/* Allocate the bitmap */
	n = (p->objtotal + 31) / 32;
	p->bitmap = malloc(sizeof(uint32_t) * n, M_NETMAP, M_NOWAIT | M_ZERO);
	if (p->bitmap == NULL) {
		D("Unable to create bitmap (%d entries) for allocator '%s'", (int)n,
		    p->name);
		goto clean;
	}
	p->bitmap_slots = n;

	/*
	 * Allocate clusters, init pointers and bitmap
	 */

	n = p->_clustsize;
	for (i = 0; i < (int)p->objtotal;) {
		int lim = i + p->_clustentries;
		char *clust;

		clust = contigmalloc(n, M_NETMAP, M_NOWAIT | M_ZERO,
		    (size_t)0, -1UL, PAGE_SIZE, 0);
		if (clust == NULL) {
			/*
			 * If we get here, there is a severe memory shortage,
			 * so halve the allocated memory to reclaim some.
			 */
			D("Unable to create cluster at %d for '%s' allocator",
			    i, p->name);
			if (i < 2) /* nothing to halve */
				goto out;
			lim = i / 2;
			for (i--; i >= lim; i--) {
				p->bitmap[ (i>>5) ] &= ~( 1 << (i & 31) );
				if (i % p->_clustentries == 0 && p->lut[i].vaddr)
					contigfree(p->lut[i].vaddr,
						n, M_NETMAP);
				p->lut[i].vaddr = NULL;
			}
		out:
			p->objtotal = i;
			/* we may have stopped in the middle of a cluster */
			p->numclusters = (i + p->_clustentries - 1) / p->_clustentries;
			break;
		}
		/*
		 * Set bitmap and lut state for all buffers in the current
		 * cluster.
		 *
		 * [i, lim) is the set of buffer indexes that cover the
		 * current cluster.
		 *
		 * 'clust' is really the address of the current buffer in
		 * the current cluster as we index through it with a stride
		 * of p->_objsize.
		 */
		for (; i < lim; i++, clust += p->_objsize) {
			p->bitmap[ (i>>5) ] |= ( 1 << (i & 31) );
			p->lut[i].vaddr = clust;
			p->lut[i].paddr = vtophys(clust);
		}
	}
	p->objfree = p->objtotal;
	p->memtotal = p->numclusters * p->_clustsize;
	if (p->objfree == 0)
		goto clean;
	if (netmap_verbose)
		D("Pre-allocated %d clusters (%d/%dKB) for '%s'",
		    p->numclusters, p->_clustsize >> 10,
		    p->memtotal >> 10, p->name);

	return 0;

clean:
	netmap_reset_obj_allocator(p);
	return ENOMEM;
}
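
The allocator tracks objects with one bit per object, packed into 32-bit words: object i lives in word i >> 5 at bit i & 31, and netmap_finalize_obj_allocator() sets the bit for every object it makes available. A small user-space sketch of that bookkeeping follows. The helper names are hypothetical, and unlike the kernel code the sketch shifts an unsigned constant (1u) so that bit 31 is well defined in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words needed to hold one bit per object. */
static uint32_t
bitmap_words(uint32_t nobjs)
{
	return (nobjs + 31) / 32;
}

/* Set the bit for object i: the object is available. */
static void
bitmap_mark_free(uint32_t *bitmap, uint32_t i)
{
	assert(bitmap != NULL);
	bitmap[i >> 5] |= (1u << (i & 31));
}

/* Clear the bit for object i: the object is in use or released. */
static void
bitmap_mark_used(uint32_t *bitmap, uint32_t i)
{
	assert(bitmap != NULL);
	bitmap[i >> 5] &= ~(1u << (i & 31));
}

/* Return 1 if the bit for object i is set, 0 otherwise. */
static int
bitmap_is_free(const uint32_t *bitmap, uint32_t i)
{
	assert(bitmap != NULL);
	return (int)((bitmap[i >> 5] >> (i & 31)) & 1u);
}
```

For instance, object 33 maps to word 1, bit 1, so marking it free in a zeroed two-word bitmap leaves word 0 untouched and sets word 1 to 2.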

/* call with lock held */
static int
netmap_memory_config_changed(struct netmap_mem_d *nmd)
{
	int i;

	for (i = 0; i < NETMAP_POOLS_NR; i++) {
		if (nmd->pools[i].r_objsize != netmap_params[i].size ||
		    nmd->pools[i].r_objtotal != netmap_params[i].num)
			return 1;
	}
	return 0;
}

2013-11-01 21:21:14 +00:00
|
|
|
static void
|
|
|
|
netmap_mem_reset_all(struct netmap_mem_d *nmd)
|
|
|
|
{
|
|
|
|
int i;
|
This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving
100+ Mpps between processes using shared memory channels
(no mistake: over one hundred million. But mind you, i said
*moving* not *processing*);
- kqueue support (BHyVe needs it);
- improved user library. Just the interface name lets you select a NIC,
host port, VALE switch port, netmap pipe, and individual queues.
The upcoming netmap-enabled libpcap will use this feature.
- optional extra buffers associated to netmap ports, for applications
that need to buffer data yet don't want to make copies.
- segmentation offloading for the VALE switch, useful between VMs.
and a number of bug fixes and performance improvements.
My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.
There are some external repositories that can be of interest:
https://code.google.com/p/netmap
our public repository for netmap/VALE code, including
linux versions and other stuff that does not belong here,
such as python bindings.
https://code.google.com/p/netmap-libpcap
a clone of the libpcap repository with netmap support.
With this, any libpcap client has access to most netmap
features with no recompilation. E.g. tcpdump can filter
packets at 10-15 Mpps.
https://code.google.com/p/netmap-ipfw
a userspace version of ipfw+dummynet which uses netmap
to send/receive packets. Speed is up in the 7-10 Mpps
range per core for simple rulesets.
Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but while this happens it is useful to have access to them.
And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.
MFC after: 3 days
2014-02-15 04:53:04 +00:00
|
|
|
|
|
|
|
if (netmap_verbose)
|
|
|
|
D("resetting %p", nmd);
|
2013-11-01 21:21:14 +00:00
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
|
|
|
netmap_reset_obj_allocator(&nmd->pools[i]);
|
|
|
|
}
|
|
|
|
nmd->flags &= ~NETMAP_MEM_FINALIZED;
|
|
|
|
}
|
|
|
|
|
Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial.
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured guest-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
2014-08-16 15:00:01 +00:00
|
|
|
static int
|
|
|
|
netmap_mem_unmap(struct netmap_obj_pool *p, struct netmap_adapter *na)
|
|
|
|
{
|
|
|
|
int i, lim = p->_objtotal;
|
|
|
|
|
|
|
|
if (na->pdev == NULL)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
#ifdef __FreeBSD__
|
|
|
|
(void)i;
|
|
|
|
(void)lim;
|
|
|
|
D("unsupported on FreeBSD");
|
|
|
|
#else /* linux */
|
|
|
|
for (i = 2; i < lim; i++) {
|
|
|
|
netmap_unload_map(na, (bus_dma_tag_t) na->pdev, &p->lut[i].paddr);
|
|
|
|
}
|
|
|
|
#endif /* linux */
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
netmap_mem_map(struct netmap_obj_pool *p, struct netmap_adapter *na)
|
|
|
|
{
|
|
|
|
#ifdef __FreeBSD__
|
|
|
|
D("unsupported on FreeBSD");
|
|
|
|
#else /* linux */
|
|
|
|
int i, lim = p->_objtotal;
|
|
|
|
|
|
|
|
if (na->pdev == NULL)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (i = 2; i < lim; i++) {
|
|
|
|
netmap_load_map(na, (bus_dma_tag_t) na->pdev, &p->lut[i].paddr,
|
|
|
|
p->lut[i].vaddr);
|
|
|
|
}
|
|
|
|
#endif /* linux */
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
static int
|
|
|
|
netmap_mem_finalize_all(struct netmap_mem_d *nmd)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
if (nmd->flags & NETMAP_MEM_FINALIZED)
|
|
|
|
return 0;
|
|
|
|
nmd->lasterr = 0;
|
|
|
|
nmd->nm_totalsize = 0;
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
|
|
|
nmd->lasterr = netmap_finalize_obj_allocator(&nmd->pools[i]);
|
|
|
|
if (nmd->lasterr)
|
|
|
|
goto error;
|
|
|
|
nmd->nm_totalsize += nmd->pools[i].memtotal;
|
|
|
|
}
|
|
|
|
/* buffers 0 and 1 are reserved */
|
|
|
|
nmd->pools[NETMAP_BUF_POOL].objfree -= 2;
|
|
|
|
nmd->pools[NETMAP_BUF_POOL].bitmap[0] = ~3;
|
|
|
|
nmd->flags |= NETMAP_MEM_FINALIZED;
|
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
if (netmap_verbose)
|
|
|
|
D("interfaces %d KB, rings %d KB, buffers %d MB",
|
|
|
|
nmd->pools[NETMAP_IF_POOL].memtotal >> 10,
|
|
|
|
nmd->pools[NETMAP_RING_POOL].memtotal >> 10,
|
|
|
|
nmd->pools[NETMAP_BUF_POOL].memtotal >> 20);
|
2013-11-01 21:21:14 +00:00
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
if (netmap_verbose)
|
|
|
|
D("Free buffers: %d", nmd->pools[NETMAP_BUF_POOL].objfree);
|
2013-11-01 21:21:14 +00:00
|
|
|
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
error:
|
|
|
|
netmap_mem_reset_all(nmd);
|
|
|
|
return nmd->lasterr;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
void
|
|
|
|
netmap_mem_private_delete(struct netmap_mem_d *nmd)
|
|
|
|
{
|
|
|
|
if (nmd == NULL)
|
|
|
|
return;
|
2014-02-15 04:53:04 +00:00
|
|
|
if (netmap_verbose)
|
|
|
|
D("deleting %p", nmd);
|
2013-11-01 21:21:14 +00:00
|
|
|
if (nmd->refcount > 0)
|
|
|
|
D("bug: deleting mem allocator with refcount=%d!", nmd->refcount);
|
2014-02-15 04:53:04 +00:00
|
|
|
nm_mem_release_id(nmd);
|
|
|
|
if (netmap_verbose)
|
|
|
|
D("done deleting %p", nmd);
|
2013-11-01 21:21:14 +00:00
|
|
|
NMA_LOCK_DESTROY(nmd);
|
|
|
|
free(nmd, M_DEVBUF);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
netmap_mem_private_config(struct netmap_mem_d *nmd)
|
|
|
|
{
|
|
|
|
/* nothing to do, we are configured on creation
|
|
|
|
* and configuration never changes thereafter
|
|
|
|
*/
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
netmap_mem_private_finalize(struct netmap_mem_d *nmd)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
nmd->refcount++;
|
|
|
|
err = netmap_mem_finalize_all(nmd);
|
|
|
|
return err;
|
|
|
|
|
|
|
|
}
|
|
|
|
|
2013-12-15 08:37:24 +00:00
|
|
|
static void
|
|
|
|
netmap_mem_private_deref(struct netmap_mem_d *nmd)
|
2013-11-01 21:21:14 +00:00
|
|
|
{
|
|
|
|
if (--nmd->refcount <= 0)
|
|
|
|
netmap_mem_reset_all(nmd);
|
|
|
|
}
|
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* allocator for private memory
|
|
|
|
*/
|
2013-11-01 21:21:14 +00:00
|
|
|
struct netmap_mem_d *
|
2014-02-15 04:53:04 +00:00
|
|
|
netmap_mem_private_new(const char *name, u_int txr, u_int txd,
|
|
|
|
u_int rxr, u_int rxd, u_int extra_bufs, u_int npipes, int *perr)
|
2013-11-01 21:21:14 +00:00
|
|
|
{
|
|
|
|
struct netmap_mem_d *d = NULL;
|
|
|
|
struct netmap_obj_params p[NETMAP_POOLS_NR];
|
2014-02-15 04:53:04 +00:00
|
|
|
int i, err;
|
|
|
|
u_int v, maxd;
|
2013-11-01 21:21:14 +00:00
|
|
|
|
|
|
|
d = malloc(sizeof(struct netmap_mem_d),
|
|
|
|
M_DEVBUF, M_NOWAIT | M_ZERO);
|
2014-02-15 04:53:04 +00:00
|
|
|
if (d == NULL) {
|
|
|
|
err = ENOMEM;
|
|
|
|
goto error;
|
|
|
|
}
|
2013-11-01 21:21:14 +00:00
|
|
|
|
|
|
|
*d = nm_blueprint;
|
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
err = nm_mem_assign_id(d);
|
|
|
|
if (err)
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
/* account for the fake host rings */
|
2013-11-01 21:21:14 +00:00
|
|
|
txr++;
|
|
|
|
rxr++;
|
2014-02-15 04:53:04 +00:00
|
|
|
|
|
|
|
/* copy the min values */
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
|
|
|
p[i] = netmap_min_priv_params[i];
|
|
|
|
}
|
|
|
|
|
|
|
|
/* possibly increase them to fit user request */
|
|
|
|
v = sizeof(struct netmap_if) + sizeof(ssize_t) * (txr + rxr);
|
|
|
|
if (p[NETMAP_IF_POOL].size < v)
|
|
|
|
p[NETMAP_IF_POOL].size = v;
|
|
|
|
v = 2 + 4 * npipes;
|
|
|
|
if (p[NETMAP_IF_POOL].num < v)
|
|
|
|
p[NETMAP_IF_POOL].num = v;
|
2013-11-01 21:21:14 +00:00
|
|
|
maxd = (txd > rxd) ? txd : rxd;
|
2014-02-15 04:53:04 +00:00
|
|
|
v = sizeof(struct netmap_ring) + sizeof(struct netmap_slot) * maxd;
|
|
|
|
if (p[NETMAP_RING_POOL].size < v)
|
|
|
|
p[NETMAP_RING_POOL].size = v;
|
|
|
|
/* each pipe endpoint needs two tx rings (1 normal + 1 host, fake)
|
|
|
|
* and two rx rings (again, 1 normal and 1 fake host)
|
|
|
|
*/
|
|
|
|
v = txr + rxr + 8 * npipes;
|
|
|
|
if (p[NETMAP_RING_POOL].num < v)
|
|
|
|
p[NETMAP_RING_POOL].num = v;
|
|
|
|
/* for each pipe we only need the buffers for the 4 "real" rings.
|
2014-06-05 21:12:41 +00:00
|
|
|
* On the other hand, the pipe ring dimension may be different from
|
2014-02-15 04:53:04 +00:00
|
|
|
* the parent port ring dimension. As a compromise, we allocate twice the
|
|
|
|
* space actually needed if the pipe rings were the same size as the parent rings
|
|
|
|
*/
|
|
|
|
v = (4 * npipes + rxr) * rxd + (4 * npipes + txr) * txd + 2 + extra_bufs;
|
|
|
|
/* the +2 is for the tx and rx fake buffers (indices 0 and 1) */
|
|
|
|
if (p[NETMAP_BUF_POOL].num < v)
|
|
|
|
p[NETMAP_BUF_POOL].num = v;
|
2013-11-01 21:21:14 +00:00
|
|
|
|
This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving
100+ Mpps between processes using shared memory channels
(no mistake: over one hundred million. But mind you, i said
*moving* not *processing*);
- kqueue support (BHyVe needs it);
- improved user library. Just the interface name lets you select a NIC,
host port, VALE switch port, netmap pipe, and individual queues.
The upcoming netmap-enabled libpcap will use this feature.
- optional extra buffers associated to netmap ports, for applications
that need to buffer data yet don't want to make copies.
- segmentation offloading for the VALE switch, useful between VMs.
and a number of bug fixes and performance improvements.
My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.
There are some external repositories that can be of interest:
https://code.google.com/p/netmap
our public repository for netmap/VALE code, including
linux versions and other stuff that does not belong here,
such as python bindings.
https://code.google.com/p/netmap-libpcap
a clone of the libpcap repository with netmap support.
With this any libpcap client has access to most netmap
features with no recompilation. E.g. tcpdump can filter
packets at 10-15 Mpps.
https://code.google.com/p/netmap-ipfw
a userspace version of ipfw+dummynet which uses netmap
to send/receive packets. Speed is up in the 7-10 Mpps
range per core for simple rulesets.
Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but while this happens it is useful to have access to them.
And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.
MFC after: 3 days
2014-02-15 04:53:04 +00:00
|
|
|
if (netmap_verbose)
|
|
|
|
D("req if %d*%d ring %d*%d buf %d*%d",
|
2013-11-01 21:21:14 +00:00
|
|
|
p[NETMAP_IF_POOL].num,
|
|
|
|
p[NETMAP_IF_POOL].size,
|
|
|
|
p[NETMAP_RING_POOL].num,
|
|
|
|
p[NETMAP_RING_POOL].size,
|
|
|
|
p[NETMAP_BUF_POOL].num,
|
|
|
|
p[NETMAP_BUF_POOL].size);
|
|
|
|
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
|
|
|
snprintf(d->pools[i].name, NETMAP_POOL_MAX_NAMSZ,
|
|
|
|
nm_blueprint.pools[i].name,
|
|
|
|
name);
|
2014-02-15 04:53:04 +00:00
|
|
|
err = netmap_config_obj_allocator(&d->pools[i],
|
|
|
|
p[i].num, p[i].size);
|
|
|
|
if (err)
|
2013-11-01 21:21:14 +00:00
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
d->flags &= ~NETMAP_MEM_FINALIZED;
|
|
|
|
|
|
|
|
NMA_LOCK_INIT(d);
|
|
|
|
|
|
|
|
return d;
|
|
|
|
error:
|
|
|
|
netmap_mem_private_delete(d);
|
2014-02-15 04:53:04 +00:00
|
|
|
if (perr)
|
|
|
|
*perr = err;
|
2013-11-01 21:21:14 +00:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2012-10-19 04:13:12 +00:00
|
|
|
/* call with lock held */
|
|
|
|
static int
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_mem_global_config(struct netmap_mem_d *nmd)
|
2012-10-19 04:13:12 +00:00
|
|
|
{
|
|
|
|
int i;
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
if (nmd->refcount)
|
|
|
|
/* already in use, we cannot change the configuration */
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (!netmap_memory_config_changed(nmd))
|
2012-10-19 04:13:12 +00:00
|
|
|
goto out;
|
|
|
|
|
|
|
|
D("reconfiguring");
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
if (nmd->flags & NETMAP_MEM_FINALIZED) {
|
2012-10-19 04:13:12 +00:00
|
|
|
/* reset previous allocation */
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_reset_obj_allocator(&nmd->pools[i]);
|
2013-05-02 16:01:04 +00:00
|
|
|
}
|
2013-11-01 21:21:14 +00:00
|
|
|
nmd->flags &= ~NETMAP_MEM_FINALIZED;
|
2013-12-15 08:37:24 +00:00
|
|
|
}
|
2012-10-19 04:13:12 +00:00
|
|
|
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
2013-11-01 21:21:14 +00:00
|
|
|
nmd->lasterr = netmap_config_obj_allocator(&nmd->pools[i],
|
2012-10-19 04:13:12 +00:00
|
|
|
netmap_params[i].num, netmap_params[i].size);
|
2013-11-01 21:21:14 +00:00
|
|
|
if (nmd->lasterr)
|
2012-10-19 04:13:12 +00:00
|
|
|
goto out;
|
|
|
|
}
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2012-10-19 04:13:12 +00:00
|
|
|
out:
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
return nmd->lasterr;
|
2012-10-19 04:13:12 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_mem_global_finalize(struct netmap_mem_d *nmd)
|
2012-10-19 04:13:12 +00:00
|
|
|
{
|
2013-11-01 21:21:14 +00:00
|
|
|
int err;
|
2015-05-15 15:36:57 +00:00
|
|
|
|
2012-10-19 04:13:12 +00:00
|
|
|
/* update configuration if changed */
|
2013-11-01 21:21:14 +00:00
|
|
|
if (netmap_mem_global_config(nmd))
|
2012-10-19 04:13:12 +00:00
|
|
|
return nmd->lasterr; /* refcount not taken yet: skip the decrement at out: */
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
nmd->refcount++;
|
|
|
|
|
|
|
|
if (nmd->flags & NETMAP_MEM_FINALIZED) {
|
2012-10-19 04:13:12 +00:00
|
|
|
/* may happen if config is not changed */
|
|
|
|
ND("nothing to do");
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
if (netmap_mem_finalize_all(nmd))
|
|
|
|
goto out;
|
2012-10-19 04:13:12 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
nmd->lasterr = 0;
|
2012-10-19 04:13:12 +00:00
|
|
|
|
|
|
|
out:
|
2013-11-01 21:21:14 +00:00
|
|
|
if (nmd->lasterr)
|
|
|
|
nmd->refcount--;
|
|
|
|
err = nmd->lasterr;
|
2012-10-19 04:13:12 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
return err;
|
2012-10-19 04:13:12 +00:00
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
int
|
|
|
|
netmap_mem_init(void)
|
2012-10-19 04:13:12 +00:00
|
|
|
{
|
2013-11-01 21:21:14 +00:00
|
|
|
NMA_LOCK_INIT(&nm_mem);
|
2012-10-19 04:13:12 +00:00
|
|
|
return (0);
|
|
|
|
}
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
void
|
|
|
|
netmap_mem_fini(void)
|
2012-04-13 16:32:33 +00:00
|
|
|
{
|
2012-10-19 04:13:12 +00:00
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
|
|
|
netmap_destroy_obj_allocator(&nm_mem.pools[i]);
|
|
|
|
}
|
2013-11-01 21:21:14 +00:00
|
|
|
NMA_LOCK_DESTROY(&nm_mem);
|
2012-10-19 04:13:12 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
netmap_free_rings(struct netmap_adapter *na)
|
|
|
|
{
|
2014-02-15 04:53:04 +00:00
|
|
|
struct netmap_kring *kring;
|
|
|
|
struct netmap_ring *ring;
|
2013-01-23 03:51:47 +00:00
|
|
|
if (!na->tx_rings)
|
|
|
|
return;
|
2014-02-15 04:53:04 +00:00
|
|
|
for (kring = na->tx_rings; kring != na->rx_rings; kring++) {
|
|
|
|
ring = kring->ring;
|
|
|
|
if (ring == NULL)
|
|
|
|
continue;
|
|
|
|
netmap_free_bufs(na->nm_mem, ring->slot, kring->nkr_num_slots);
|
|
|
|
netmap_ring_free(na->nm_mem, ring);
|
|
|
|
kring->ring = NULL;
|
2012-10-19 04:13:12 +00:00
|
|
|
}
|
2014-02-15 04:53:04 +00:00
|
|
|
for (/* cont'd from above */; kring != na->tailroom; kring++) {
|
|
|
|
ring = kring->ring;
|
|
|
|
if (ring == NULL)
|
|
|
|
continue;
|
|
|
|
netmap_free_bufs(na->nm_mem, ring->slot, kring->nkr_num_slots);
|
|
|
|
netmap_ring_free(na->nm_mem, ring);
|
|
|
|
kring->ring = NULL;
|
2012-10-19 04:13:12 +00:00
|
|
|
}
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
|
|
|
|
2013-12-15 08:37:24 +00:00
|
|
|
/* call with NMA_LOCK held
|
2013-11-01 21:21:14 +00:00
|
|
|
*
|
2013-12-15 08:37:24 +00:00
|
|
|
* Allocate netmap rings and buffers for this card
|
|
|
|
* The rings are contiguous, but have variable size.
|
2014-02-15 04:53:04 +00:00
|
|
|
* The kring array must follow the layout described
|
|
|
|
* in netmap_krings_create().
|
2013-01-23 03:51:47 +00:00
|
|
|
*/
|
2013-12-15 08:37:24 +00:00
|
|
|
int
|
|
|
|
netmap_mem_rings_create(struct netmap_adapter *na)
|
2012-04-13 16:32:33 +00:00
|
|
|
{
|
|
|
|
struct netmap_ring *ring;
|
2013-12-15 08:37:24 +00:00
|
|
|
u_int len, ndesc;
|
2012-04-13 16:32:33 +00:00
|
|
|
struct netmap_kring *kring;
|
2014-02-15 04:53:04 +00:00
|
|
|
u_int i;
|
2013-11-01 21:21:14 +00:00
|
|
|
|
|
|
|
NMA_LOCK(na->nm_mem);
|
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
/* transmit rings */
|
|
|
|
for (i = 0, kring = na->tx_rings; kring != na->rx_rings; kring++, i++) {
|
|
|
|
if (kring->ring) {
|
|
|
|
ND("%s %ld already created", kring->name, kring - na->tx_rings);
|
|
|
|
continue; /* already created by somebody else */
|
|
|
|
}
|
2013-12-15 08:37:24 +00:00
|
|
|
ndesc = kring->nkr_num_slots;
|
2012-04-13 16:32:33 +00:00
|
|
|
len = sizeof(struct netmap_ring) +
|
|
|
|
ndesc * sizeof(struct netmap_slot);
|
2013-11-01 21:21:14 +00:00
|
|
|
ring = netmap_ring_malloc(na->nm_mem, len);
|
2012-04-13 16:32:33 +00:00
|
|
|
if (ring == NULL) {
|
2013-12-15 08:37:24 +00:00
|
|
|
D("Cannot allocate tx_ring");
|
2012-04-13 16:32:33 +00:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2014-01-10 16:00:27 +00:00
|
|
|
ND("txring at %p", ring);
|
2012-04-13 16:32:33 +00:00
|
|
|
kring->ring = ring;
|
2013-12-15 08:37:24 +00:00
|
|
|
*(uint32_t *)(uintptr_t)&ring->num_slots = ndesc;
|
2014-01-06 12:53:15 +00:00
|
|
|
*(int64_t *)(uintptr_t)&ring->buf_ofs =
|
2013-11-01 21:21:14 +00:00
|
|
|
(na->nm_mem->pools[NETMAP_IF_POOL].memtotal +
|
|
|
|
na->nm_mem->pools[NETMAP_RING_POOL].memtotal) -
|
|
|
|
netmap_ring_offset(na->nm_mem, ring);
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2014-01-06 12:53:15 +00:00
|
|
|
/* copy values from kring */
|
|
|
|
ring->head = kring->rhead;
|
|
|
|
ring->cur = kring->rcur;
|
|
|
|
ring->tail = kring->rtail;
|
2013-11-01 21:21:14 +00:00
|
|
|
*(uint16_t *)(uintptr_t)&ring->nr_buf_size =
|
Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial.
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured guest-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
2014-08-16 15:00:01 +00:00
|
|
|
netmap_mem_bufsize(na->nm_mem);
|
2014-02-15 04:53:04 +00:00
|
|
|
ND("%s h %d c %d t %d", kring->name,
|
|
|
|
ring->head, ring->cur, ring->tail);
|
2013-12-15 08:37:24 +00:00
|
|
|
ND("initializing slots for txring");
|
2014-02-15 04:53:04 +00:00
|
|
|
if (i != na->num_tx_rings || (na->na_flags & NAF_HOST_RINGS)) {
|
|
|
|
/* this is a real ring */
|
|
|
|
if (netmap_new_bufs(na->nm_mem, ring->slot, ndesc)) {
|
|
|
|
D("Cannot allocate buffers for tx_ring");
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/* this is a fake tx ring, set all indices to 0 */
|
|
|
|
netmap_mem_set_ring(na->nm_mem, ring->slot, ndesc, 0);
|
2012-10-19 04:13:12 +00:00
|
|
|
}
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
/* receive rings */
|
|
|
|
for ( i = 0 /* kring cont'd from above */ ; kring != na->tailroom; kring++, i++) {
|
|
|
|
if (kring->ring) {
|
|
|
|
ND("%s %ld already created", kring->name, kring - na->rx_rings);
|
|
|
|
continue; /* already created by somebody else */
|
|
|
|
}
|
2013-12-15 08:37:24 +00:00
|
|
|
ndesc = kring->nkr_num_slots;
|
2012-04-13 16:32:33 +00:00
|
|
|
len = sizeof(struct netmap_ring) +
|
|
|
|
ndesc * sizeof(struct netmap_slot);
|
2013-11-01 21:21:14 +00:00
|
|
|
ring = netmap_ring_malloc(na->nm_mem, len);
|
2012-04-13 16:32:33 +00:00
|
|
|
if (ring == NULL) {
|
2013-12-15 08:37:24 +00:00
|
|
|
D("Cannot allocate rx_ring");
|
2012-04-13 16:32:33 +00:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2014-01-10 16:00:27 +00:00
|
|
|
ND("rxring at %p", ring);
|
2012-04-13 16:32:33 +00:00
|
|
|
kring->ring = ring;
|
2013-12-15 08:37:24 +00:00
|
|
|
*(uint32_t *)(uintptr_t)&ring->num_slots = ndesc;
|
2014-01-06 12:53:15 +00:00
|
|
|
*(int64_t *)(uintptr_t)&ring->buf_ofs =
|
2013-11-01 21:21:14 +00:00
|
|
|
(na->nm_mem->pools[NETMAP_IF_POOL].memtotal +
|
|
|
|
na->nm_mem->pools[NETMAP_RING_POOL].memtotal) -
|
|
|
|
netmap_ring_offset(na->nm_mem, ring);
|
2012-04-13 16:32:33 +00:00
|
|
|
|
2014-01-06 12:53:15 +00:00
|
|
|
/* copy values from kring */
|
|
|
|
ring->head = kring->rhead;
|
|
|
|
ring->cur = kring->rcur;
|
|
|
|
ring->tail = kring->rtail;
|
2013-11-01 21:21:14 +00:00
|
|
|
*(int *)(uintptr_t)&ring->nr_buf_size =
|
Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial.
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured guest-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
2014-08-16 15:00:01 +00:00
|
|
|
netmap_mem_bufsize(na->nm_mem);
|
2014-02-15 04:53:04 +00:00
|
|
|
ND("%s h %d c %d t %d", kring->name,
|
|
|
|
ring->head, ring->cur, ring->tail);
|
2014-01-10 16:00:27 +00:00
|
|
|
ND("initializing slots for rxring %p", ring);
|
2014-02-15 04:53:04 +00:00
|
|
|
if (i != na->num_rx_rings || (na->na_flags & NAF_HOST_RINGS)) {
|
|
|
|
/* this is a real ring */
|
|
|
|
if (netmap_new_bufs(na->nm_mem, ring->slot, ndesc)) {
|
|
|
|
D("Cannot allocate buffers for rx_ring");
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/* this is a fake rx ring, set all indices to 1 */
|
|
|
|
netmap_mem_set_ring(na->nm_mem, ring->slot, ndesc, 1);
|
2012-10-19 04:13:12 +00:00
|
|
|
}
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
2013-12-15 08:37:24 +00:00
|
|
|
|
|
|
|
NMA_UNLOCK(na->nm_mem);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
netmap_free_rings(na);
|
|
|
|
|
|
|
|
NMA_UNLOCK(na->nm_mem);
|
|
|
|
|
|
|
|
return ENOMEM;
|
|
|
|
}
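A minimal sketch of the sizing and offset arithmetic used by netmap_mem_rings_create() above, written as standalone C. The struct layouts and every sketch_* name are hypothetical simplifications, not the real netmap definitions. The buf_ofs helper assumes the three pools are laid out back to back (if pool, ring pool, buffer pool) and that the ring offset is measured from the start of the shared region, which is how the expression in the function reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for netmap_slot and netmap_ring. */
struct sketch_slot {
	uint32_t buf_idx;
	uint16_t len;
	uint16_t flags;
};

struct sketch_ring {
	int64_t  buf_ofs;
	uint32_t num_slots;
	uint32_t head, cur, tail;
	struct sketch_slot slot[];	/* slots follow the header inline */
};

/* Bytes needed for one ring: the header plus ndesc slots. */
static size_t
sketch_ring_len(uint32_t ndesc)
{
	return sizeof(struct sketch_ring) +
	    (size_t)ndesc * sizeof(struct sketch_slot);
}

/*
 * Distance from a ring to the start of the buffer pool, assuming the
 * region is laid out as [if pool][ring pool][buffer pool] and that
 * ring_offset is counted from the start of the region.
 */
static int64_t
sketch_buf_ofs(size_t if_pool_total, size_t ring_pool_total,
    size_t ring_offset)
{
	/* the ring must live inside the ring pool */
	assert(ring_offset >= if_pool_total);
	assert(ring_offset < if_pool_total + ring_pool_total);
	return (int64_t)(if_pool_total + ring_pool_total) -
	    (int64_t)ring_offset;
}
```

With these assumptions, a userspace client would add buf_ofs to the ring address to reach the first buffer, then index buffers by buf_idx times the buffer size.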
|
|
|
|
|
|
|
|
void
|
|
|
|
netmap_mem_rings_delete(struct netmap_adapter *na)
|
|
|
|
{
|
|
|
|
/* last instance, release bufs and rings */
|
|
|
|
NMA_LOCK(na->nm_mem);
|
|
|
|
|
|
|
|
netmap_free_rings(na);
|
|
|
|
|
|
|
|
NMA_UNLOCK(na->nm_mem);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/* takes NMA_LOCK internally; call without the lock held */
|
|
|
|
/*
|
|
|
|
* Allocate the per-fd structure netmap_if.
|
|
|
|
*
|
|
|
|
* We assume that the configuration stored in na
|
|
|
|
* (number of tx/rx rings and descs) does not change while
|
|
|
|
* the interface is in netmap mode.
|
|
|
|
*/
|
|
|
|
struct netmap_if *
|
2014-08-16 15:00:01 +00:00
|
|
|
netmap_mem_if_new(struct netmap_adapter *na)
|
2013-12-15 08:37:24 +00:00
|
|
|
{
|
|
|
|
struct netmap_if *nifp;
|
|
|
|
ssize_t base; /* handy for relative offsets between rings and nifp */
|
|
|
|
u_int i, len, ntx, nrx;
|
|
|
|
|
2014-02-15 04:53:04 +00:00
|
|
|
/* account for the (possibly fake) host rings */
|
|
|
|
ntx = na->num_tx_rings + 1;
|
|
|
|
nrx = na->num_rx_rings + 1;
|
2013-12-15 08:37:24 +00:00
|
|
|
/*
|
|
|
|
* the descriptor is followed inline by an array of offsets
|
|
|
|
* to the tx and rx rings in the shared memory region.
|
|
|
|
*/
|
|
|
|
|
|
|
|
NMA_LOCK(na->nm_mem);
|
|
|
|
|
|
|
|
len = sizeof(struct netmap_if) + (nrx + ntx) * sizeof(ssize_t);
|
|
|
|
nifp = netmap_if_malloc(na->nm_mem, len);
|
|
|
|
if (nifp == NULL) {
|
|
|
|
NMA_UNLOCK(na->nm_mem);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize base fields -- override const */
|
|
|
|
*(u_int *)(uintptr_t)&nifp->ni_tx_rings = na->num_tx_rings;
|
|
|
|
*(u_int *)(uintptr_t)&nifp->ni_rx_rings = na->num_rx_rings;
|
2014-08-16 15:00:01 +00:00
|
|
|
strncpy(nifp->ni_name, na->name, (size_t)IFNAMSIZ);
|
2013-12-15 08:37:24 +00:00
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
/*
|
|
|
|
* fill the slots for the rx and tx rings. They contain the offset
|
|
|
|
* between the ring and nifp, so the information is usable in
|
|
|
|
* userspace to reach the ring from the nifp.
|
|
|
|
*/
|
2013-11-01 21:21:14 +00:00
|
|
|
base = netmap_if_offset(na->nm_mem, nifp);
|
2012-04-13 16:32:33 +00:00
|
|
|
for (i = 0; i < ntx; i++) {
|
|
|
|
*(ssize_t *)(uintptr_t)&nifp->ring_ofs[i] =
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_ring_offset(na->nm_mem, na->tx_rings[i].ring) - base;
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
|
|
|
for (i = 0; i < nrx; i++) {
|
|
|
|
*(ssize_t *)(uintptr_t)&nifp->ring_ofs[i+ntx] =
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_ring_offset(na->nm_mem, na->rx_rings[i].ring) - base;
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
2013-11-01 21:21:14 +00:00
|
|
|
|
|
|
|
NMA_UNLOCK(na->nm_mem);
|
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
return (nifp);
|
|
|
|
}
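The ring_ofs[] array filled in by netmap_mem_if_new() stores, for each ring, its offset relative to the netmap_if itself. The sketch below shows how a client could turn those offsets back into ring pointers. The sketch_* types and functions are hypothetical and heavily simplified; the index arithmetic assumes, as in the loops above, that the array holds (ni_tx_rings + 1) tx entries followed by (ni_rx_rings + 1) rx entries, the extra entry being the host ring.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical, simplified stand-in for netmap_ring. */
struct sketch_ring {
	unsigned num_slots;
};

/* Hypothetical, simplified stand-in for netmap_if. */
struct sketch_if {
	unsigned ni_tx_rings;	/* hardware tx rings, host ring excluded */
	unsigned ni_rx_rings;	/* hardware rx rings, host ring excluded */
	ssize_t  ring_ofs[];	/* tx offsets first, then rx offsets */
};

/* Resolve tx ring i (i == ni_tx_rings selects the host ring). */
static struct sketch_ring *
sketch_txring(struct sketch_if *nifp, unsigned i)
{
	assert(i < nifp->ni_tx_rings + 1);
	return (struct sketch_ring *)((char *)nifp + nifp->ring_ofs[i]);
}

/* Resolve rx ring i; rx entries start after all the tx entries. */
static struct sketch_ring *
sketch_rxring(struct sketch_if *nifp, unsigned i)
{
	assert(i < nifp->ni_rx_rings + 1);
	return (struct sketch_ring *)((char *)nifp +
	    nifp->ring_ofs[i + nifp->ni_tx_rings + 1]);
}
```

Because the offsets are relative to nifp rather than absolute addresses, the same values stay valid whatever address the shared region is mapped at in each process.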
|
|
|
|
|
2013-11-01 21:21:14 +00:00
|
|
|
void
|
|
|
|
netmap_mem_if_delete(struct netmap_adapter *na, struct netmap_if *nifp)
|
|
|
|
{
|
|
|
|
if (nifp == NULL)
|
|
|
|
/* nothing to do */
|
|
|
|
return;
|
|
|
|
NMA_LOCK(na->nm_mem);
|
2014-02-15 04:53:04 +00:00
|
|
|
if (nifp->ni_bufs_head)
|
|
|
|
netmap_extra_free(na, nifp->ni_bufs_head);
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_if_free(na->nm_mem, nifp);
|
|
|
|
|
|
|
|
NMA_UNLOCK(na->nm_mem);
|
|
|
|
}
|
|
|
|
|
2012-04-13 16:32:33 +00:00
|
|
|
static void
|
2013-11-01 21:21:14 +00:00
|
|
|
netmap_mem_global_deref(struct netmap_mem_d *nmd)
|
2012-04-13 16:32:33 +00:00
|
|
|
{
|
2013-11-01 21:21:14 +00:00
|
|
|
|
|
|
|
nmd->refcount--;
|
2014-08-16 15:00:01 +00:00
|
|
|
if (!nmd->refcount)
|
|
|
|
nmd->nm_grp = -1;
|
2013-01-23 03:51:47 +00:00
|
|
|
if (netmap_verbose)
|
2013-11-01 21:21:14 +00:00
|
|
|
D("refcount = %d", nmd->refcount);
|
|
|
|
|
|
|
|
}
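The dereference logic above can be sketched in isolation as follows. The sketch_mem type is a hypothetical two-field stand-in for netmap_mem_d, and reading nm_grp == -1 as "no group assigned" is an assumption based on how the value is reset here, not something the code states. The caller is expected to hold the allocator lock, as the real callers appear to.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for struct netmap_mem_d. */
struct sketch_mem {
	int refcount;	/* number of active users of the allocator */
	int nm_grp;	/* assumed: -1 means no group is assigned */
};

/* Drop one reference; the last user also releases the group binding. */
static void
sketch_deref(struct sketch_mem *nmd)
{
	assert(nmd->refcount > 0);	/* guard against underflow */
	nmd->refcount--;
	if (nmd->refcount == 0)
		nmd->nm_grp = -1;
}
```

The underflow assertion is an addition for the sketch; the original decrements unconditionally.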
|
|
|
|
|
2013-12-15 08:37:24 +00:00
|
|
|
int
|
2014-08-16 15:00:01 +00:00
|
|
|
netmap_mem_finalize(struct netmap_mem_d *nmd, struct netmap_adapter *na)
|
2013-11-01 21:21:14 +00:00
|
|
|
{
|
2014-08-16 15:00:01 +00:00
|
|
|
if (nm_mem_assign_group(nmd, na->pdev) < 0) {
|
|
|
|
return ENOMEM;
|
|
|
|
} else {
|
2015-05-15 15:36:57 +00:00
|
|
|
NMA_LOCK(nmd);
|
2014-08-16 15:00:01 +00:00
|
|
|
nmd->finalize(nmd);
|
2015-05-15 15:36:57 +00:00
|
|
|
NMA_UNLOCK(nmd);
|
Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured guest-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
2014-08-16 15:00:01 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (!nmd->lasterr && na->pdev)
|
|
|
|
netmap_mem_map(&nmd->pools[NETMAP_BUF_POOL], na);
|
|
|
|
|
|
|
|
return nmd->lasterr;
|
2013-11-01 21:21:14 +00:00
|
|
|
}
|
|
|
|
|
2013-12-15 08:37:24 +00:00
|
|
|
void
|
2014-08-16 15:00:01 +00:00
|
|
|
netmap_mem_deref(struct netmap_mem_d *nmd, struct netmap_adapter *na)
|
2013-11-01 21:21:14 +00:00
|
|
|
{
|
2014-08-16 15:00:01 +00:00
|
|
|
NMA_LOCK(nmd);
|
|
|
|
netmap_mem_unmap(&nmd->pools[NETMAP_BUF_POOL], na);
|
2015-05-15 15:36:57 +00:00
|
|
|
if (nmd->refcount == 1) {
|
|
|
|
u_int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Reset the allocator when it falls out of use so that any
|
|
|
|
* pool resources leaked by unclean application exits are
|
|
|
|
* reclaimed.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < NETMAP_POOLS_NR; i++) {
|
|
|
|
struct netmap_obj_pool *p;
|
|
|
|
u_int j;
|
|
|
|
|
|
|
|
p = &nmd->pools[i];
|
|
|
|
p->objfree = p->objtotal;
|
|
|
|
/*
|
|
|
|
* Reproduce the net effect of the M_ZERO malloc()
|
|
|
|
* and marking of free entries in the bitmap that
|
|
|
|
* occur in finalize_obj_allocator()
|
|
|
|
*/
|
|
|
|
memset(p->bitmap,
|
|
|
|
'\0',
|
|
|
|
sizeof(uint32_t) * ((p->objtotal + 31) / 32));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Set all the bits in the bitmap that have
|
|
|
|
* corresponding buffers to 1 to indicate they are
|
|
|
|
* free.
|
|
|
|
*/
|
|
|
|
for (j = 0; j < p->objtotal; j++) {
|
|
|
|
if (p->lut[j].vaddr != NULL) {
|
|
|
|
p->bitmap[ (j>>5) ] |= ( 1U << (j & 31) );
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Per netmap_mem_finalize_all(),
|
|
|
|
* buffers 0 and 1 are reserved
|
|
|
|
*/
|
|
|
|
nmd->pools[NETMAP_BUF_POOL].objfree -= 2;
|
|
|
|
nmd->pools[NETMAP_BUF_POOL].bitmap[0] = ~3;
|
|
|
|
}
|
|
|
|
nmd->deref(nmd);
|
2014-08-16 15:00:01 +00:00
|
|
|
NMA_UNLOCK(nmd);
|
2012-04-13 16:32:33 +00:00
|
|
|
}
|
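The reset that netmap_mem_deref() performs when the last reference goes away can be sketched in isolation. This is a minimal illustration under a simplified pool layout, not the netmap implementation: struct demo_pool, demo_pool_reset() and demo_free_after_reset() are hypothetical names, and unlike the code above the sketch clears only the reserved bits instead of overwriting the first bitmap word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct netmap_obj_pool. */
struct demo_pool {
	uint32_t objtotal;	/* number of slots in the pool */
	uint32_t objfree;	/* number of slots currently free */
	uint32_t *bitmap;	/* one bit per slot, 1 = free */
	void **vaddr;		/* per-slot address, NULL if no buffer */
};

/* Return the pool to its freshly finalized state. */
static void
demo_pool_reset(struct demo_pool *p, uint32_t reserved)
{
	uint32_t j, words = (p->objtotal + 31) / 32;

	p->objfree = p->objtotal;
	memset(p->bitmap, 0, sizeof(uint32_t) * words);

	/* Mark every slot that has a backing buffer as free. */
	for (j = 0; j < p->objtotal; j++) {
		if (p->vaddr[j] != NULL)
			p->bitmap[j >> 5] |= 1U << (j & 31);
	}

	/* Keep the first 'reserved' slots permanently busy. */
	for (j = 0; j < reserved && j < p->objtotal; j++) {
		p->bitmap[j >> 5] &= ~(1U << (j & 31));
		p->objfree--;
	}
}

/*
 * Build a pool of n slots, pretend an unclean exit leaked all of them,
 * reset it, and return the number of free bits (-1 on error or if the
 * bitmap disagrees with objfree).
 */
static int
demo_free_after_reset(uint32_t n, uint32_t reserved)
{
	struct demo_pool p;
	uint32_t j, words = (n + 31) / 32;
	char *backing;
	int nfree = 0;

	if (n == 0)
		return 0;
	backing = malloc(n);
	p.bitmap = calloc(words, sizeof(uint32_t));
	p.vaddr = calloc(n, sizeof(void *));
	if (backing == NULL || p.bitmap == NULL || p.vaddr == NULL) {
		free(backing);
		free(p.bitmap);
		free(p.vaddr);
		return -1;
	}
	p.objtotal = n;
	p.objfree = 0;		/* leaked state: nothing looks free */
	for (j = 0; j < n; j++)
		p.vaddr[j] = &backing[j];

	demo_pool_reset(&p, reserved);

	for (j = 0; j < n; j++) {
		if (p.bitmap[j >> 5] & (1U << (j & 31)))
			nfree++;
	}
	if ((uint32_t)nfree != p.objfree)
		nfree = -1;

	free(backing);
	free(p.bitmap);
	free(p.vaddr);
	return nfree;
}
```

Calling demo_free_after_reset(40, 2) exercises a pool that spans two bitmap words with two reserved slots, mirroring the two reserved buffers mentioned in the comment above. The unsigned constant 1U keeps the shift well defined when the bit index is 31.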