89 Commits

Author SHA1 Message Date
vmaffione
ef731a36ec netmap: fix knote() argument to match the mutex state
The nm_os_selwakeup function needs to call knote() to wake up kqueue(9)
users. However, this function can be called from different code paths,
with different lock requirements.
This patch fixes the knote() call argument to match the relevant lock state.
Also, comments have been updated to reflect current code.

PR:	https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846
Reported by:	Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18876
2019-01-23 14:21:23 +00:00
vmaffione
fa99a1efa8 netmap: fix txsync check in netmap poll
To check if txsync can be skipped, it is necessary to look for
unseen TX space. However, this means comparing ring->cur
against ring->tail, rather than ring->head against ring->tail
(like nm_ring_empty() does).
This change also adds some more comments to explain the optimization
performed at the beginning of netmap_poll().
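
The difference between the two comparisons can be illustrated with a minimal sketch. This is not the netmap_poll() code: struct toy_tx_ring and both helpers are hypothetical names, and only the head/cur/tail roles are taken from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the three ring indices. */
struct toy_tx_ring {
	uint32_t head;	/* first slot the application still holds */
	uint32_t cur;	/* slots before this one have been seen by the application */
	uint32_t tail;	/* first slot not available to the application */
};

/* head-vs-tail test, the comparison the message attributes to nm_ring_empty(). */
static bool
toy_ring_empty(const struct toy_tx_ring *r)
{
	return r->head == r->tail;
}

/* cur-vs-tail test: is there TX space the application has not seen yet? */
static bool
toy_has_unseen_space(const struct toy_tx_ring *r)
{
	return r->cur != r->tail;
}
```

With head = 2 and cur = tail = 5 the ring is not empty by the head-vs-tail test, yet there is no unseen space, so the two tests disagree on whether txsync can be skipped.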

MFC after:	3 days
Sponsored by:	Sunny Valley Networks
2018-12-22 16:23:42 +00:00
vmaffione
058483a0c3 netmap: fix bug in netmap_poll() optimization
The bug was introduced by r339639, although it is present in the upstream
netmap code since 2015. It is due to resetting the want_rx variable to
POLLIN, rather than resetting it to POLLIN|POLLRDNORM.
It only affects select(), which uses POLLRDNORM. poll() is not affected,
because it uses POLLIN.
Also, it only affects FreeBSD, because Linux skips the optimization
implemented by the piece of code where the bug occurs.

MFC after:	3 days
Sponsored by:	Sunny Valley Networks
2018-12-22 15:15:45 +00:00
vmaffione
ed4f78efee netmap: move buf_size validation code to its own function
This code validates the netmap buf_size against the interface MTU
and maximum descriptor size, to make sure the values are consistent.
Moving this functionality to its own function is needed because this
function is also called by Linux-specific code.

MFC after:	3 days
2018-12-21 11:50:14 +00:00
vmaffione
4a965dbfd6 netmap: netmap_transmit should honor bpf packet tap hook
This allows tcpdump to capture outbound kernel packets while
in netmap mode

Submitted by:	Marc de la Gueronniere <mdelagueronniere@verisign.com>
Reviewed by:	vmaffione
MFC after:	1 week
Sponsored by:	Verisign, Inc.
Differential Revision:	https://reviews.freebsd.org/D17896
2018-12-06 09:45:25 +00:00
vmaffione
9899d78b5d netmap: align codebase to the current upstream (760279cfb2730a585)
Changelist:
  - Replace netmap passthrough host support with a more general
    mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop.
    No kernel threads are needed to use this feature: the application
    is required to spawn a thread (or a process) and issue a
    SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The
    kernel loop is executed by the ioctl implementation, which returns
    to userspace only when a different thread calls SYNC_KLOOP_STOP
    or the netmap file descriptor is closed.
  - Update the if_ptnet driver to cope with the new data structures,
    and prune all the obsolete ptnetmap code.
  - Add support for "null" netmap ports, useful to allocate netmap_if,
    netmap_ring and netmap buffers to be used by specialized applications
    (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect.
  - Various fixes and code refactoring.

Sponsored by:	Sunny Valley Networks
Differential Revision:	https://reviews.freebsd.org/D18015
2018-12-05 11:57:16 +00:00
vmaffione
7b9456a050 netmap: align codebase to the current upstream (sha 8374e1a7e6941)
Changelist:
    - Move large parts of VALE code to a new file and header netmap_bdg.[ch].
      This is useful to reuse the code within upcoming projects.
    - Improvements and bug fixes to pipes and monitors.
    - Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to
      handle differences between FreeBSD and Linux.
    - Introduce some new helper functions to handle more host rings and fake
      rings (netmap_all_rings(), netmap_real_rings(), ...)
    - Added new sysctl to enable/disable hw checksum in emulated netmap mode.
    - nm_inject: add support for NS_MOREFRAG

Approved by:	gnn (mentor)
Differential Revision:	https://reviews.freebsd.org/D17364
2018-10-23 08:55:16 +00:00
mmacy
98af9f469b netmap: pull fix for 32-bit support from upstream
Approved by:	sbruno
2018-05-18 03:38:17 +00:00
vmaffione
3c7434c730 netmap: align codebase to the current upstream (commit id 3fb001303718146)
Changelist:
    - Turn tx_rings and rx_rings arrays into arrays of pointers to kring
      structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib,
      vtnet and ptnet drivers to cope with the change.
    - Generalize the nm_config() callback to accept a struct containing many
      parameters.
    - Introduce NKR_FAKERING to support buffers sharing (used for netmap
      pipes)
    - Improved API for external VALE modules.
    - Various bug fixes and improvements to the netmap memory allocator,
      including support for externally (userspace) allocated memory.
    - Refactoring of netmap pipes: now linked rings share the same netmap
      buffers, with a separate set of kring pointers (rhead, rcur, rtail).
      Buffer swapping does not need to happen anymore.
    - Large refactoring of the control API towards an extensible solution;
      the goal is to allow the addition of more commands and extension of
      existing ones (with new options) without the need of hacks or the
      risk of running out of configuration space.
      A new NIOCCTRL ioctl has been added to handle all the requests of the
      new control API, which cover all the functionalities so far supported.
      The netmap API bumps from 11 to 12 with this patch. Full backward
      compatibility is provided for the old control command (NIOCREGIF), by
      means of a new netmap_legacy module. Many parts of the old netmap.h
      header have now been moved to netmap_legacy.h (included by netmap.h).

Approved by:	hrs (mentor)
2018-04-12 07:20:50 +00:00
vmaffione
8b391e44ef netmap: align codebase to upstream version v11.4
Changelist:
  - remove unused nkr_slot_flags
  - new nm_intr adapter callback to enable/disable interrupts
  - remove unused sysctls and document the other sysctls
  - new infrastructure to support NS_MOREFRAG for NIC ports
  - support for external memory allocator (for now linux-only),
    including linux-specific changes in common headers
  - optimizations within netmap pipes datapath
  - improvements on VALE control API
  - new nm_parse() helper function in netmap_user.h
  - various bug fixes and code clean up

Approved by:	hrs (mentor)
2018-04-09 09:24:26 +00:00
pfg
1537078d8f sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 14:52:40 +00:00
loos
18c5abc0bf Update the current version of netmap to bring it in sync with the github
version.

This commit contains mostly refactoring, a few fixes and minor added
functionality.

Submitted by:	Vincenzo Maffione <v.maffione at gmail.com>
Requested by:	many
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-12 22:53:18 +00:00
luigi
eff8c9eb56 Various fixes for ptnet/ptnetmap (passthrough of netmap ports). In detail:
- use PCI_VENDOR and PCI_DEVICE ids from a publicly allocated range
  (thanks to RedHat)
- export memory pool information through PCI registers
- improve mechanism for configuring passthrough on different hypervisors
Code is from Vincenzo Maffione as a follow up to his GSOC work.
2016-10-27 09:46:22 +00:00
luigi
9750eb8786 remove stale and unused code from various files
fix build on 32 bit platforms
simplify logic in netmap_virt.h

The commands (in net/netmap.h) to configure communication with the
hypervisor may be revised soon.
At the moment they are unused so this will not be a change of API.
2016-10-18 16:18:25 +00:00
luigi
cdb805690c Import the current version of netmap, aligned with the one on github.
This commit, long overdue, contains contributions in the last 2 years
from Stefano Garzarella, Giuseppe Lettieri, Vincenzo Maffione, including:
+ fixes on monitor ports
+ the 'ptnet' virtual device driver, and ptnetmap backend, for
  high speed virtual passthrough on VMs (bhyve fixes in an upcoming commit)
+ improved emulated netmap mode
+ more robust error handling
+ removal of stale code
+ various fixes to code and documentation (some mixup between RX and TX
  parameters, and private and public variables)

We also include an additional tool, nmreplay, which is functionally
equivalent to tcpreplay but operating on netmap ports.
2016-10-16 14:13:32 +00:00
eadler
156fd4834a Don't repeat the the word 'the'
(one manual change to fix grammar)

Confirmed With: db
Approved by: secteam (not really, but this is a comment typo fix)
2016-05-17 12:52:31 +00:00
pfg
eed4bd22ad sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
kevlo
866a3cb2f1 Fix typo (s/harware/hardware/)
2015-12-25 14:51:36 +00:00
adrian
c5bfe674df Don't call enable_all_rings if the adapter has been freed.
This is a subtle use-after-free race that results in some very undesirable
hang behaviour.

Reviewed by:	pkelsey
Obtained from:	Kip Macy, NextBSD (91a9bd1dbb)
2015-09-07 23:16:39 +00:00
luigi
1e6fb0ec09 add a use count so the netmap module cannot be unloaded while in use.
2015-07-19 18:07:25 +00:00
luigi
65802f0a4f staticize functions only used in netmap.c
(detected by jenkins run with gcc 4.9)

Update documentation on the use of netmap_priv_d,
rename the refcount and use the same structure in
FreeBSD and linux

No functional changes.
2015-07-10 16:05:24 +00:00
luigi
c354cad8fd Sync netmap sources with the version in our private tree.
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:

- fix zerocopy monitor ports and introduce copying monitor ports
  (the latter are lower performance but give access to all traffic
  in parallel with the application)

- exclusive open mode, useful to implement solutions that recover
  from crashes of the main netmap client (suggested by Patrick Kelsey)

- revised memory allocator in preparation for the 'passthrough mode'
  (ptnetmap) recently presented at bsdcan. ptnetmap is described in
        S. Garzarella, G. Lettieri, L. Rizzo;
        Virtual device passthrough for high speed VM networking,
        ACM/IEEE ANCS 2015, Oakland (CA) May 2015
        http://info.iet.unipi.it/~luigi/research.html

- fix rx CRC handling on ixl

- add module dependencies for netmap when building drivers as modules

- minor simplifications to device-specific routines (*txsync, *rxsync)

- general code cleanup (remove unused variables, introduce macros
  to access rings and remove duplicate code)

Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).

Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.

MFC after:	1 month
Sponsored by:	(partly) Verisign, Cisco
2015-07-10 05:51:36 +00:00
rpaulo
692e8bf361 netmap: improve the netmap attach message on FreeBSD.
MFC after:	1 week
2015-04-11 06:20:46 +00:00
luigi
cc7b7b78d7 two minor changes from the master netmap version:
1. handle errors from nm_config(), if any (none of the FreeBSD drivers
   currently returns an error on this function, so this change
   is a no-op at this time)
2. use a full memory barrier on ioctls
2015-02-14 19:03:11 +00:00
luigi
25a9544367 whitespace change:
clarify the role of MAKEDEV_ETERNAL_KLD, and remove an old
#ifdef __FreeBSD__ since the code is valid on all platforms.
2015-02-14 18:59:31 +00:00
adrian
6132573ef1 Change the permissions from 0660 to 0600.
Otherwise people in wheel can do things with netmap, including
but not limited to promisc transmit/receive.

Approved by:	luigi
MFC after:	1 week
2015-01-24 19:49:27 +00:00
luigi
2470d86c17 add support for private knote lock (reduces lock contention),
adapting OS_selrecord accordingly.
Problem and fix suggested by adrian and jmg
2014-11-13 00:40:34 +00:00
luigi
b8be8bfdc8 fix a panic when passing ifioctl from a netmap file descriptor to
the underlying device. This needs to be merged to 10.1

Reported by: Patrick Kelsey
MFC after:	3 days
2014-09-25 16:22:32 +00:00
luigi
3ab69a246b Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
luigi
25f232081c Fixes from Franco Fichtner on transparent mode
* The way rings are updated changed with the last API bump.
  Also sync ->head when moving slots in netmap_sw_to_nic().

* Remove a crashing selrecord() call.

* Unclog the logic surrounding netmap_rxsync_from_host().

* Add timestamping to RX host ring.

* Remove a couple of obsolete comments.

Submitted by:	Franco Fichtner
MFC after:	3 days
Sponsored by:	Packetwerk
2014-06-09 15:46:11 +00:00
luigi
c55588c12b introduce mbq_lock() and mbq_unlock() for the mbq,
so it is easier to build the same code on linux
(this generalizes the change in svn 267142)

MFC after:	3 days
2014-06-06 18:02:32 +00:00
luigi
22e9dc725d align comments with the ones in our development trunk
2014-06-06 14:58:25 +00:00
luigi
797957e4e5 prevent a panic when the netdev/ifp is not set in attach
(internal  c63a7b85)

MFC after:	3 days
2014-06-06 10:40:20 +00:00
zont
e062f6dc12 Use mtx_lock_spin/mtx_unlock_spin primitives on spin lock
Reviewed by:	luigi
MFC after:	1 week
2014-06-06 00:24:04 +00:00
luigi
359dad8e6c whitespace change: remove trailing whitespace
2014-06-05 21:12:41 +00:00
luigi
c2bafade93 two small changes:
- intercept FIONBIO and FIOASYNC ioctls on netmap file descriptors.
  libpcap calls them to set non blocking I/O on the file descriptor,
  for netmap this is a no-op because there is no read/write,
  but not intercepting would cause fcntl() to return -1
- rate limit and put under netmap.verbose some messages that occur
  when threads concurrently use the same file descriptor.
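
The interception described in the first item can be pictured with a toy dispatcher. This is only a sketch: toy_netmap_ioctl() is a hypothetical name, and returning ENOTTY for unhandled commands is an assumption made for illustration, not a statement about the real handler.

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Hypothetical dispatcher: accept FIONBIO and FIOASYNC as no-ops so that
 * callers setting non-blocking I/O see success. Everything else is
 * rejected here with ENOTTY (an assumption of this sketch).
 */
static int
toy_netmap_ioctl(unsigned long cmd)
{
	switch (cmd) {
	case FIONBIO:
	case FIOASYNC:
		return 0;	/* nothing to do: there is no read/write path */
	default:
		return ENOTTY;
	}
}
```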
2014-02-18 04:27:41 +00:00
luigi
51f5fa46d7 This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving
  100+ Mpps between processes using shared memory channels
  (no mistake: over one hundred million. But mind you, i said
  *moving* not *processing*);

- kqueue support (BHyVe needs it);

- improved user library. Just the interface name lets you select a NIC,
  host port, VALE switch port, netmap pipe, and individual queues.
  The upcoming netmap-enabled libpcap will use this feature.

- optional extra buffers associated to netmap ports, for applications
  that need to buffer data yet don't want to make copies.

- segmentation offloading for the VALE switch, useful between VMs.

and a number of bug fixes and performance improvements.

My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.

There are some external repositories that can be of interest:

    https://code.google.com/p/netmap
        our public repository for netmap/VALE code, including
        linux versions and other stuff that does not belong here,
        such as python bindings.

    https://code.google.com/p/netmap-libpcap
        a clone of the libpcap repository with netmap support.
	With this any libpcap client has access to most netmap
	features with no recompilation. E.g. tcpdump can filter
	packets at 10-15 Mpps.

    https://code.google.com/p/netmap-ipfw
        a userspace version of ipfw+dummynet which uses netmap
        to send/receive packets. Speed is up in the 7-10 Mpps
        range per core for simple rulesets.

Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but while this happens it is useful to have access to them.

And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.

MFC after:	3 days
2014-02-15 04:53:04 +00:00
luigi
f11710f126 netmap_user.h:
add separate rx/tx ring indexes
   add ring specifier in nm_open device name

netmap.c, netmap_vale.c
   more consistent errno numbers

netmap_generic.c
   correctly handle failure in registering interfaces.

tools/tools/netmap/
   massive cleanup of the example programs
   (a lot of common code is now in netmap_user.h.)

nm_util.[ch] are going away soon.
pcap.c will also go when i commit the native netmap support for libpcap.
2014-01-16 00:20:42 +00:00
glebius
1cebfc36ae Fix build with VIMAGE.
2014-01-09 00:59:03 +00:00
luigi
07f442b39d fix use after free when releasing a netmap adapter.
Submitted by:	Giuseppe Lettieri
2014-01-07 21:14:28 +00:00
luigi
41068e3dad It is 2014 and we have a new version of netmap.
Most relevant features:

- netmap emulation on any NIC, even those without native netmap support.

  On the ixgbe we have measured about 4Mpps/core/queue in this mode,
  which is still a lot more than with sockets/bpf.

- seamless interconnection of VALE switch, NICs and host stack.

  If you disable accelerations on your NIC (say em0)

        ifconfig em0 -txcsum -rxcsum

  you can use the VALE switch to connect the NIC and the host stack:

        vale-ctl -h valeXX:em0

  allowing sharing the NIC with other netmap clients.

- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
  instead of pointers/count as before). This was unavoidable to support,
  in the future, multiple threads operating on the same rings.
  Netmap clients require very small source code changes to compile again.
  On the plus side, the new API should be easier to understand
  and the internals are a lot simpler.
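
  A rough idea of index-based ring arithmetic, as opposed to keeping an
  explicit count, is sketched below. The struct and helper names are
  hypothetical and simplified; they are not the definitions in
  net/netmap.h or netmap_user.h.

```c
#include <stdint.h>

/* Hypothetical, simplified ring: only the fields this sketch needs. */
struct toy_ring {
	uint32_t num_slots;	/* ring size */
	uint32_t head;
	uint32_t cur;
	uint32_t tail;
};

/* Number of slots in [from, tail), walking forward and wrapping around. */
static uint32_t
toy_slots_until_tail(const struct toy_ring *r, uint32_t from)
{
	return (r->tail + r->num_slots - from) % r->num_slots;
}

/* Index following i, wrapping to 0 at the end of the ring. */
static uint32_t
toy_next(const struct toy_ring *r, uint32_t i)
{
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}
```

  With 8 slots, head = 6 and tail = 2, the helper counts slots 6, 7, 0
  and 1, so the available space is derived from the indices alone.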

The manual page has been updated extensively to reflect the current
features and give some examples.

This is the result of work of several people including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by EU projects CHANGE and OPENLAB, from NetApp University
Research Fund, NEC, and of course the Universita` di Pisa.
2014-01-06 12:53:15 +00:00
glebius
226d58924f Fix build.
2013-12-18 04:36:35 +00:00
luigi
eb4897aa4a split netmap code according to functions:
- netmap.c		base code
- netmap_freebsd.c	FreeBSD-specific code
- netmap_generic.c	emulate netmap over standard drivers
- netmap_mbq.c		simple mbuf tailq
- netmap_mem2.c		memory management
- netmap_vale.c		VALE switch

simplify device-specific code
2013-12-15 08:37:24 +00:00
luigi
76003cfce0 remove a debugging message
2013-11-06 19:18:39 +00:00
luigi
b960d67ff3 remove some test code.
2013-11-05 01:06:22 +00:00
luigi
b8dcf5f297 fix a bug when a device has 1 tx (or rx) queue and more than
one queue of a different type.

Submitted by:	Vincenzo Maffione
MFC after:	3 days
2013-11-05 00:56:07 +00:00
luigi
dbdf2cf58b check errors on return from netmap_attach()
Submitted by:	Giuseppe Lettieri
MFC after:	3 days
2013-11-05 00:50:59 +00:00
luigi
ed9032c329 circumvent a couple of warnings:
- on line 2550 intentionally overriding a const qualifier
- on line 3219 intentionally converting uint64_t to a pointer
2013-11-02 18:03:21 +00:00
luigi
41bc3f25be update to the latest netmap snapshot.
This includes the following:
- use separate memory regions for VALE ports
- locking fixes
- some simplifications in the NIC-specific routines
- performance improvements for the VALE switch
- some new features in the pkt-gen test program
- documentation updates

There are small API changes that require programs to be recompiled
(NETMAP_API has been bumped so you will detect old binaries at runtime).

In particular:
- struct netmap_slot now is 16 bytes to support an extra pointer,
  which may save one data copy when using VALE ports or VMs;
- the struct netmap_if has two extra fields;

MFC after:	3 days
2013-11-01 21:21:14 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to the files that need it. Also, add
all the includes that are currently pulled in implicitly via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00