121 Commits

Author SHA1 Message Date
luigi
cc7b7b78d7 two minor changes from the master netmap version:
1. handle errors from nm_config(), if any (none of the FreeBSD drivers
   currently returns an error on this function, so this change
   is a no-op at this time
2. use a full memory barrier on ioctls
2015-02-14 19:03:11 +00:00
luigi
25a9544367 whitespace change:
clarify the role of MAKEDEV_ETERNAL_KLD, and remove an old
#ifdef __FreeBSD__ since the code is valid on all platforms.
2015-02-14 18:59:31 +00:00
adrian
6132573ef1 Change the permissions from 0660 to 0600.
Otherwise people in wheel can do things with netmap, including
but not limited to promisc transmit/receive.

Approved by:	luigi
MFC after:	1 week
2015-01-24 19:49:27 +00:00
hselasky
12fec3618b Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
luigi
2470d86c17 add support for private knote lock (reduces lock contention),
adapting OS_selrecord accordingly.
Problem and fix suggested by adrian and jmg
2014-11-13 00:40:34 +00:00
luigi
605bfa5bb9 we need full barriers here 2014-11-13 00:14:25 +00:00
luigi
42e544acc7 in the Linux section, properly define the NMG_LOCK type.
Also import WITH_GENERIC in preparation to adding fine-grained
options to disable specific netmap components.
2014-11-11 00:13:28 +00:00
luigi
1db47d8b31 - fix typo: use ring size from the rx ring, not the tx one (they should be
the same, but just in case);
- reuse the previously computed len-1 value
2014-11-11 00:10:44 +00:00
luigi
3600d5f7ad fix a typo 2014-11-10 21:00:23 +00:00
luigi
820fd43f72 initialize *color if passed as an argument 2014-11-10 20:25:33 +00:00
luigi
301aa31f64 sync a comment with our internal repo 2014-11-10 20:19:58 +00:00
luigi
b8be8bfdc8 fix a panic when passing ifioctl from a netmap file descriptor to
the underlying device. This needs to be merged to 10.1

Reported by: Patrick Kelsey
MFC after:	3 days
2014-09-25 16:22:32 +00:00
luigi
ca2279d44a adapt the code to different freebsd versions.
Not necessary to MFC
2014-09-25 15:57:57 +00:00
glebius
b3b337a80e Mechanically convert to if_inc_counter(). 2014-09-19 03:51:26 +00:00
glebius
3b5ede57e9 Provide pointer from struct ifnet to struct netmap_adapter,
instead of abusing spare field.
2014-08-31 11:33:19 +00:00
np
0789e26a48 Change netmap's global lock to sx instead of a mutex.
Reviewed by:	luigi@
MFC after:	1 day
2014-08-20 23:37:44 +00:00
luigi
a9e3b46b61 staticize two functions, and use proper format for a struct sglist
(reported by bz)
2014-08-17 10:25:27 +00:00
luigi
3ab69a246b Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured gues-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
glebius
75acbf068a Fix style bug: rename the refcount field of m_ext to ext_cnt, to match
other members.

Sponsored by:	Nginx, Inc.
2014-07-11 14:34:29 +00:00
luigi
f598f1250c change the netmap mbuf destructor so the same code works also on FreeBSD 9.
For head and 10 this change has no effect, but on stable/9 it would cause
panics when using emulated netmap on top of a standard device driver.
2014-06-10 16:06:59 +00:00
luigi
25f232081c Fixes from Fanco Ficthner on transparent mode
* The way rings are updated changed with the last API bump.
  Also sync ->head when moving slots in netmap_sw_to_nic().

* Remove a crashing selrecord() call.

* Unclog the logic surrounding netmap_rxsync_from_host().

* Add timestamping to RX host ring.

* Remove a couple of obsolete comments.

Submitted by:	Franco Fichtner
MFC after:	3 days
Sponsored by:	Packetwerk
2014-06-09 15:46:11 +00:00
luigi
a8101b601a sync the code with the one in stable/10
(wrap the if_t compatibilty function into a __FreeBSD_version
conditional block)
2014-06-09 15:44:31 +00:00
luigi
1971caf17c better handling of netmap emulation over standard device drivers:
plug a potential mbuf leak, and detect bogus drivers that
return ENOBUFS even when the packet has been queued.

MFC after:	3 days
2014-06-06 18:36:02 +00:00
luigi
c55588c12b introduce mbq_lock() and mbq_unlock() for the mbq,
so it is easier to buil the same code on linux
(this generalizes the change in svn 267142)

MFC after:	3 days
2014-06-06 18:02:32 +00:00
luigi
c5e9447e90 move netmap_getna() to a freebsd-specific file 2014-06-06 16:23:08 +00:00
luigi
22e9dc725d align comments with the ones in our development trunk 2014-06-06 14:58:25 +00:00
luigi
fba7963ed5 rate limit some error messages 2014-06-06 14:57:40 +00:00
luigi
36eb85a508 remove two debugging messages, align comments with the code
in our development trunk
2014-06-06 14:57:16 +00:00
luigi
2bd9920d99 add checks for invalid buffer pointers and lengths 2014-06-06 10:50:14 +00:00
luigi
797957e4e5 prevent a panic when the netdev/ifp is not set in attach
(internal  c63a7b85)

MFC after:	3 days
2014-06-06 10:40:20 +00:00
zont
e062f6dc12 Use mtx_lock_spin/mtx_unlock_spin primitives on spin lock
Reviewed by:	luigi
MFC after:	1 week
2014-06-06 00:24:04 +00:00
luigi
359dad8e6c whitespace change: remove trailing whitespace 2014-06-05 21:12:41 +00:00
marcel
916c7006f5 Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers.  This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fix to not need
an explicit cast.

The immediate benefactors of this change are:
1.  Juniper Networks - The network stack implementation in Junos
    is entirely different from FreeBSD's one and this change
    allows Juniper to build "stock" NIC drivers that can be used
    in combination with both the FreeBSD and Junos stacks.
2.  FreeBSD - This change opens the door towards changing ifnet
    and implementing new features and optimizations in the network
    stack without it requiring a change in the many NIC drivers
    FreeBSD has.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Reviewed by:	glebius@
Obtained from:	Juniper Networks, Inc.
2014-06-02 17:54:39 +00:00
luigi
bc86850d45 compile with NOINET 2014-02-20 04:56:55 +00:00
luigi
c2bafade93 two small changes:
- intercept FIONBIO and FIOASYNC ioctls on netmap file descriptors.
  libpcap calls them to set non blocking I/O on the file descriptor,
  for netmap this is a no-op because there is no read/write,
  but not intercepting would cause fcntl() to return -1
- rate limit and put under netmap.verbose some messages that occur
  when threads use concurrently the same file descriptor.
2014-02-18 04:27:41 +00:00
luigi
51f5fa46d7 This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving
  100+ Mpps between processes using shared memory channels
  (no mistake: over one hundred million. But mind you, i said
  *moving* not *processing*);

- kqueue support (BHyVe needs it);

- improved user library. Just the interface name lets you select a NIC,
  host port, VALE switch port, netmap pipe, and individual queues.
  The upcoming netmap-enabled libpcap will use this feature.

- optional extra buffers associated to netmap ports, for applications
  that need to buffer data yet don't want to make copies.

- segmentation offloading for the VALE switch, useful between VMs.

and a number of bug fixes and performance improvements.

My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.

There are some external repositories that can be of interest:

    https://code.google.com/p/netmap
        our public repository for netmap/VALE code, including
        linux versions and other stuff that does not belong here,
        such as python bindings.

    https://code.google.com/p/netmap-libpcap
        a clone of the libpcap repository with netmap support.
	With this any libpcap client has access to most netmap
	feature with no recompilation. E.g. tcpdump can filter
	packets at 10-15 Mpps.

    https://code.google.com/p/netmap-ipfw
        a userspace version of ipfw+dummynet which uses netmap
        to send/receive packets. Speed is up in the 7-10 Mpps
        range per core for simple rulesets.

Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but while this happens it is useful to have access to them.

And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.

MFC after:	3 days
2014-02-15 04:53:04 +00:00
luigi
f11710f126 netmap_user.h:
add separate rx/tx ring indexes
   add ring specifier in nm_open device name

netmap.c, netmap_vale.c
   more consistent errno numbers

netmap_generic.c
   correctly handle failure in registering interfaces.

tools/tools/netmap/
   massive cleanup of the example programs
   (a lot of common code is now in netmap_user.h.)

nm_util.[ch] are going away soon.
pcap.c will also go when i commit the native netmap support for libpcap.
2014-01-16 00:20:42 +00:00
luigi
fc0f950d52 Fix netmap emulation when NICs attached to a VALE switch have a different
number of tx and rx rings

Submitted by:	Vincenzo Maffione
2014-01-10 16:01:44 +00:00
luigi
634487fef4 sync with our internal repo - small change in debugging messages 2014-01-10 16:00:27 +00:00
glebius
1cebfc36ae Fix build with VIMAGE. 2014-01-09 00:59:03 +00:00
luigi
07f442b39d fix use after free when releasing a netmap adapter.
Submitted by:	Giuseppe Lettieri
2014-01-07 21:14:28 +00:00
luigi
41068e3dad It is 2014 and we have a new version of netmap.
Most relevant features:

- netmap emulation on any NIC, even those without native netmap support.

  On the ixgbe we have measured about 4Mpps/core/queue in this mode,
  which is still a lot more than with sockets/bpf.

- seamless interconnection of VALE switch, NICs and host stack.

  If you disable accelerations on your NIC (say em0)

        ifconfig em0 -txcsum -txcsum

  you can use the VALE switch to connect the NIC and the host stack:

        vale-ctl -h valeXX:em0

  allowing sharing the NIC with other netmap clients.

- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
  instead of pointers/count as before). This was unavoidable to support,
  in the future, multiple threads operating on the same rings.
  Netmap clients require very small source code changes to compile again.
      On the plus side, the new API should be easier to understand
  and the internals are a lot simpler.

The manual page has been updated extensively to reflect the current
features and give some examples.

This is the result of work of several people including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by EU projects CHANGE and OPENLAB, from NetApp University
Research Fund, NEC, and of course the Universita` di Pisa.
2014-01-06 12:53:15 +00:00
glebius
226d58924f Fix build. 2013-12-18 04:36:35 +00:00
luigi
fde32e1ad2 fix the build using __builtin_prefetch() instead of redefining prefetch() 2013-12-16 23:57:43 +00:00
luigi
eb4897aa4a split netmap code according to functions:
- netmap.c		base code
- netmap_freebsd.c	FreeBSD-specific code
- netmap_generic.c	emulate netmap over standard drivers
- netmap_mbq.c		simple mbuf tailq
- netmap_mem2.c		memory management
- netmap_vale.c		VALE switch

simplify devce-specific code
2013-12-15 08:37:24 +00:00
luigi
76003cfce0 remove a debugging message 2013-11-06 19:18:39 +00:00
luigi
b960d67ff3 remove some test code. 2013-11-05 01:06:22 +00:00
luigi
b8dcf5f297 fix a bug when a device has 1 tx (or rx) queue and more than
one queue of a different type.

Submitted by:	Vincenzo Maffione
MFC after:	3 days
2013-11-05 00:56:07 +00:00
luigi
dbdf2cf58b check errors on return from netmap_attach()
Submitted by:	Giuseppe Lettieri
MFC after:	3 days
2013-11-05 00:50:59 +00:00
luigi
ed9032c329 circumvent a couple of warnings:
- on line 2550 intentionally overriding a const qualifier
- on line 3219 intentionally converting uint64_t to a pointer
2013-11-02 18:03:21 +00:00