freebsd-dev

Author	SHA1	Message	Date
Luigi Rizzo	63a3395e5d	change the netmap mbuf destructor so the same code works also on FreeBSD 9. For head and 10 this change has no effect, but on stable/9 it would cause panics when using emulated netmap on top of a standard device driver.	2014-06-10 16:06:59 +00:00
Luigi Rizzo	348c44a5be	Fixes from Franco Fichtner on transparent mode * The way rings are updated changed with the last API bump. Also sync ->head when moving slots in netmap_sw_to_nic(). * Remove a crashing selrecord() call. * Unclog the logic surrounding netmap_rxsync_from_host(). * Add timestamping to RX host ring. * Remove a couple of obsolete comments. Submitted by: Franco Fichtner MFC after: 3 days Sponsored by: Packetwerk	2014-06-09 15:46:11 +00:00
Luigi Rizzo	46aa1303f3	sync the code with the one in stable/10 (wrap the if_t compatibility function into a __FreeBSD_version conditional block)	2014-06-09 15:44:31 +00:00
Luigi Rizzo	e4166283fb	better handling of netmap emulation over standard device drivers: plug a potential mbuf leak, and detect bogus drivers that return ENOBUFS even when the packet has been queued. MFC after: 3 days	2014-06-06 18:36:02 +00:00
Luigi Rizzo	997b054cf1	introduce mbq_lock() and mbq_unlock() for the mbq, so it is easier to build the same code on linux (this generalizes the change in svn 267142) MFC after: 3 days	2014-06-06 18:02:32 +00:00
Luigi Rizzo	0dc809c034	move netmap_getna() to a freebsd-specific file	2014-06-06 16:23:08 +00:00
Luigi Rizzo	89cc25561c	align comments with the ones in our development trunk	2014-06-06 14:58:25 +00:00
Luigi Rizzo	d8e1c53b15	rate limit some error messages	2014-06-06 14:57:40 +00:00
Luigi Rizzo	5899a007ae	remove two debugging messages, align comments with the code in our development trunk	2014-06-06 14:57:16 +00:00
Luigi Rizzo	e31c6ec7e2	add checks for invalid buffer pointers and lengths	2014-06-06 10:50:14 +00:00
Luigi Rizzo	441ab64f52	prevent a panic when the netdev/ifp is not set in attach (internal c63a7b85) MFC after: 3 days	2014-06-06 10:40:20 +00:00
Andrey Zonov	dc8a95e62b	Use mtx_lock_spin/mtx_unlock_spin primitives on spin lock Reviewed by: luigi MFC after: 1 week	2014-06-06 00:24:04 +00:00
Luigi Rizzo	43ed1d3c76	whitespace change: remove trailing whitespace	2014-06-05 21:12:41 +00:00
Marcel Moolenaar	62d76917b8	Introduce a procedural interface to the ifnet structure. The new interface allows the ifnet structure to be defined as an opaque type in NIC drivers. This then allows the ifnet structure to be changed without a need to change or recompile NIC drivers. Put differently, NIC drivers can be written and compiled once and be used with different network stack implementations, provided of course that those network stack implementations have an API and ABI compatible interface. This commit introduces the 'if_t' type to replace 'struct ifnet *' as the type of a network interface. The 'if_t' type is defined as 'void *' to enable the compiler to perform type conversion to 'struct ifnet *' and vice versa where needed and without warnings. The functions that implement the API are the only functions that need to have an explicit cast. The MII code has been converted to use the driver API to avoid unnecessary code churn. Code churn comes from having to work with both converted and unconverted drivers in correlation with having callback functions that take an interface. By converting the MII code first, the callback functions can be defined so that the compiler will perform the typecasts automatically. As soon as all drivers have been converted, the if_t type can be redefined as needed and the API functions can be fixed to not need an explicit cast. The immediate beneficiaries of this change are: 1. Juniper Networks - The network stack implementation in Junos is entirely different from FreeBSD's one and this change allows Juniper to build "stock" NIC drivers that can be used in combination with both the FreeBSD and Junos stacks. 2. FreeBSD - This change opens the door towards changing ifnet and implementing new features and optimizations in the network stack without it requiring a change in the many NIC drivers FreeBSD has. Submitted by: Anuranjan Shukla <anshukla@juniper.net> Reviewed by: glebius@ Obtained from: Juniper Networks, Inc.	2014-06-02 17:54:39 +00:00
Luigi Rizzo	5a067ae187	compile with NOINET	2014-02-20 04:56:55 +00:00
Luigi Rizzo	89e3fd5247	two small changes: - intercept FIONBIO and FIOASYNC ioctls on netmap file descriptors. libpcap calls them to set non blocking I/O on the file descriptor, for netmap this is a no-op because there is no read/write, but not intercepting would cause fcntl() to return -1 - rate limit and put under netmap.verbose some messages that occur when threads use concurrently the same file descriptor.	2014-02-18 04:27:41 +00:00
Luigi Rizzo	f0ea3689a9	This new version of netmap brings you the following: - netmap pipes, providing bidirectional blocking I/O while moving 100+ Mpps between processes using shared memory channels (no mistake: over one hundred million. But mind you, i said moving not processing); - kqueue support (BHyVe needs it); - improved user library. Just the interface name lets you select a NIC, host port, VALE switch port, netmap pipe, and individual queues. The upcoming netmap-enabled libpcap will use this feature. - optional extra buffers associated to netmap ports, for applications that need to buffer data yet don't want to make copies. - segmentation offloading for the VALE switch, useful between VMs. and a number of bug fixes and performance improvements. My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial amount of work on these features so we owe them a big thanks. There are some external repositories that can be of interest: https://code.google.com/p/netmap our public repository for netmap/VALE code, including linux versions and other stuff that does not belong here, such as python bindings. https://code.google.com/p/netmap-libpcap a clone of the libpcap repository with netmap support. With this any libpcap client has access to most netmap features with no recompilation. E.g. tcpdump can filter packets at 10-15 Mpps. https://code.google.com/p/netmap-ipfw a userspace version of ipfw+dummynet which uses netmap to send/receive packets. Speed is up in the 7-10 Mpps range per core for simple rulesets. Both netmap-libpcap and netmap-ipfw will be merged upstream at some point, but while this happens it is useful to have access to them. And yes, this code will be merged soon. It is infinitely better than the version currently in 10 and 9. MFC after: 3 days	2014-02-15 04:53:04 +00:00
Luigi Rizzo	f263752668	netmap_user.h: add separate rx/tx ring indexes add ring specifier in nm_open device name netmap.c, netmap_vale.c more consistent errno numbers netmap_generic.c correctly handle failure in registering interfaces. tools/tools/netmap/ massive cleanup of the example programs (a lot of common code is now in netmap_user.h.) nm_util.[ch] are going away soon. pcap.c will also go when i commit the native netmap support for libpcap.	2014-01-16 00:20:42 +00:00
Luigi Rizzo	0c7ba37e01	Fix netmap emulation when NICs attached to a VALE switch have a different number of tx and rx rings Submitted by: Vincenzo Maffione	2014-01-10 16:01:44 +00:00
Luigi Rizzo	f6c2a31f72	sync with our internal repo - small change in debugging messages	2014-01-10 16:00:27 +00:00
Gleb Smirnoff	339f59c096	Fix build with VIMAGE.	2014-01-09 00:59:03 +00:00
Luigi Rizzo	fb25194fb0	fix use after free when releasing a netmap adapter. Submitted by: Giuseppe Lettieri	2014-01-07 21:14:28 +00:00
Luigi Rizzo	17885a7bfd	It is 2014 and we have a new version of netmap. Most relevant features: - netmap emulation on any NIC, even those without native netmap support. On the ixgbe we have measured about 4Mpps/core/queue in this mode, which is still a lot more than with sockets/bpf. - seamless interconnection of VALE switch, NICs and host stack. If you disable accelerations on your NIC (say em0) ifconfig em0 -txcsum -txcsum you can use the VALE switch to connect the NIC and the host stack: vale-ctl -h valeXX:em0 allowing sharing the NIC with other netmap clients. - THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers instead of pointers/count as before). This was unavoidable to support, in the future, multiple threads operating on the same rings. Netmap clients require very small source code changes to compile again. On the plus side, the new API should be easier to understand and the internals are a lot simpler. The manual page has been updated extensively to reflect the current features and give some examples. This is the result of work of several people including Giuseppe Lettieri, Vincenzo Maffione, Michio Honda and myself, and has been financially supported by EU projects CHANGE and OPENLAB, from NetApp University Research Fund, NEC, and of course the Universita` di Pisa.	2014-01-06 12:53:15 +00:00
Gleb Smirnoff	256d9417f8	Fix build.	2013-12-18 04:36:35 +00:00
Luigi Rizzo	2e159ef0b5	fix the build using __builtin_prefetch() instead of redefining prefetch()	2013-12-16 23:57:43 +00:00
Luigi Rizzo	f9790aeb88	split netmap code according to functions: - netmap.c base code - netmap_freebsd.c FreeBSD-specific code - netmap_generic.c emulate netmap over standard drivers - netmap_mbq.c simple mbuf tailq - netmap_mem2.c memory management - netmap_vale.c VALE switch simplify device-specific code	2013-12-15 08:37:24 +00:00
Luigi Rizzo	5864b3a586	remove a debugging message	2013-11-06 19:18:39 +00:00
Luigi Rizzo	bc4c07d8e6	remove some test code.	2013-11-05 01:06:22 +00:00
Luigi Rizzo	954dca4c99	fix a bug when a device has 1 tx (or rx) queue and more than one queue of a different type. Submitted by: Vincenzo Maffione MFC after: 3 days	2013-11-05 00:56:07 +00:00
Luigi Rizzo	5ab0d24d48	check errors on return from netmap_attach() Submitted by: Giuseppe Lettieri MFC after: 3 days	2013-11-05 00:50:59 +00:00
Luigi Rizzo	3d819cb610	circumvent a couple of warnings: - on line 2550 intentionally overriding a const qualifier - on line 3219 intentionally converting uint64_t to a pointer	2013-11-02 18:03:21 +00:00
Luigi Rizzo	ab1e286051	add missing file from previous netmap update...	2013-11-02 00:54:47 +00:00
Luigi Rizzo	ce3ee1e7c4	update to the latest netmap snapshot. This includes the following: - use separate memory regions for VALE ports - locking fixes - some simplifications in the NIC-specific routines - performance improvements for the VALE switch - some new features in the pkt-gen test program - documentation updates There are small API changes that require programs to be recompiled (NETMAP_API has been bumped so you will detect old binaries at runtime). In particular: - struct netmap_slot now is 16 bytes to support an extra pointer, which may save one data copy when using VALE ports or VMs; - the struct netmap_if has two extra fields; MFC after: 3 days	2013-11-01 21:21:14 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Jack F Vogel	7609433eb6	Update the Intel igb driver to version 2.4.0 - This version has support for the new Intel Avoton systems, including 2.5Gb support, further it now has IPv6/TSO6 support as well. Shared code has been updated where necessary as well. Thanks to my new assistant Eric Joyner for doing the transmit path changes to bring in the IPv6/TSO6 support. Thanks to Gleb for catching the one bug and change needed in NETMAP. Approved by: re	2013-10-09 17:32:52 +00:00
Luigi Rizzo	85233a7d39	- fix a bug in the previous commit that was dropping the last packet from each batch flowing on the VALE switch - feature: add glue for 'indirect' buffers on the sender side: if a slot has NS_INDIRECT set, the netmap buffer contains pointer(s) to the actual userspace buffers, which are accessed with copyin(). The feature is not finalised yet, as it will likely need to deal with some iovec variant for proper scatter/gather support. This will save one copy for clients (e.g. qemu) that cannot use the netmap buffer directly. A curiosity: on amd64 copyin() appears to be 10-15% faster than pkt_copy() or bcopy() at least for sizes of 256 and greater.	2013-06-05 17:27:59 +00:00
Luigi Rizzo	f18be5766f	Bring in a number of new features, mostly implemented by Michio Honda: - the VALE switch now supports up to 254 destinations per switch, unicast or broadcast (multicast goes to all ports). - we can attach hw interfaces and the host stack to a VALE switch, which means we will be able to use it more or less as a native bridge (minor tweaks still necessary). A 'vale-ctl' program is supplied in tools/tools/netmap to attach/detach ports to the switch, and list current configuration. - the lookup function in the VALE switch can be reassigned to something else, similar to the pf hooks. This will enable attaching the firewall, or other processing functions (e.g. in-kernel openvswitch) directly on the netmap port. The internal API used by device drivers does not change. Userspace applications should be recompiled because we bump NETMAP_API as we now use some fields in the struct nmreq that were previously ignored -- otherwise, data structures are the same. Manpages will be committed separately.	2013-05-30 14:07:14 +00:00
Luigi Rizzo	ede69cff5b	another minor bugfix in the memory allocator, this time in the free routine.	2013-05-10 08:46:10 +00:00
Luigi Rizzo	654ae8d68c	remove trailing whitespace	2013-05-02 16:01:04 +00:00
Luigi Rizzo	849bec0e76	Partial cleanup in preparation for upcoming changes: - netmap_rx_irq()/netmap_tx_irq() can now be called by FreeBSD drivers hiding the logic for handling NIC interrupts in netmap mode. This also simplifies the case of NICs attached to VALE switches. Individual drivers will be updated with separate commits. - use the same refcount() API for FreeBSD and linux - plus some comments, typos and formatting fixes Portions contributed by Michio Honda	2013-04-30 16:08:34 +00:00
Luigi Rizzo	28228e0816	whitespace - document alternative locking under linux	2013-04-29 19:30:35 +00:00
Luigi Rizzo	d4b42e0869	whitespace changes: remove $Id$ lines, and add blank lines around some #if / #elif /#endif	2013-04-29 18:00:53 +00:00
Luigi Rizzo	b865453e3e	explicitly mark some variables as const	2013-04-29 16:58:21 +00:00
Luigi Rizzo	2579e2d715	mostly whitespace changes: - remove vestiges of the old memory allocator - clean up some comments	2013-04-19 21:08:21 +00:00
Luigi Rizzo	aa76317cfc	fix a bug in the computation of the userspace offset for a given netmap buffer. Submitted by: Hugh Nhan	2013-04-15 11:49:16 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Luigi Rizzo	091fd0ab54	Add support for transparent mode while in netmap. By setting dev.netmap.fwd=1 (or enabling the feature with a per-ring flag), packets are forwarded between the NIC and the host stack unless the netmap client clears the NS_FORWARD flag on the individual descriptors. This feature greatly simplifies applications where some traffic (think of ARP, control traffic, ssh sessions...) must be processed by the host stack, whereas the bulk is handled by the netmap process which simply (un)marks packets that should not be forwarded. The default is chosen so that now a netmap receiver operates in a mode very similar to bpf. Of course there is no free lunch: traffic to/from the host stack still operates at OS speed (or less, as there is one extra copy in one direction). HOWEVER, since traffic goes to the user process before being reinjected, and reinjection occurs in a user context, you get some form of livelock protection for free.	2013-01-23 05:37:45 +00:00
Luigi Rizzo	ae10d1afee	control some debugging messages with dev.netmap.verbose add infrastructure to adapt to changes in number of queues and buffers at runtime	2013-01-23 03:51:47 +00:00
Luigi Rizzo	70ca194a4c	remove the old memory allocator, not useful anymore	2013-01-17 23:14:17 +00:00
Luigi Rizzo	1dce924d25	add some definition and driver changes in preparation for two upcoming features: semi-transparent mode: when a device is opened in this mode, the user program will be able to mark slots that must be forwarded to the "other" side (i.e. from NIC to host stack, or vice versa), and the forwarding will occur automatically at the next netmap syscall. This saves the need to open another file descriptor and do the forwarding manually. direct-forwarding mode: when operating with a VALE port, the user can specify in the slot the actual destination port, overriding the forwarding decision made by a lookup of the destination MAC. This can be useful to implement packet dispatchers. No API changes will be introduced. No new functionality in this patch yet.	2013-01-17 22:14:58 +00:00
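
Note on commit 62d76917b8 above: the opaque-handle pattern it describes can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the FreeBSD KPI; every name here (demo_if_t, struct demo_ifnet, the demo_if_* functions) is hypothetical. As in the commit message, the handle is a 'void *', so only the accessor functions need an explicit cast and "driver" code never sees the structure layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT the FreeBSD ifnet KPI. */

/* Drivers only ever see this untyped handle (the role if_t plays). */
typedef void *demo_if_t;

/* Private to the "network stack"; drivers never rely on this layout. */
struct demo_ifnet {
	int mtu;
};

/* Allocate a zeroed interface object; returns NULL on failure. */
static demo_if_t
demo_if_alloc(void)
{
	return calloc(1, sizeof(struct demo_ifnet));
}

static void
demo_if_free(demo_if_t ifp)
{
	free(ifp);
}

/* The accessors are the only functions that cast back to the real type. */
static void
demo_if_setmtu(demo_if_t ifp, int mtu)
{
	((struct demo_ifnet *)ifp)->mtu = mtu;
}

static int
demo_if_getmtu(demo_if_t ifp)
{
	return ((struct demo_ifnet *)ifp)->mtu;
}

/* "Driver" code: works purely through the procedural interface. */
static int
demo_driver_attach(demo_if_t ifp)
{
	demo_if_setmtu(ifp, 1500);
	return demo_if_getmtu(ifp);
}
```

Because the driver side touches nothing but the accessors, struct demo_ifnet could gain or reorder fields without the driver function changing, which is the decoupling the commit message is after.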

102 Commits