freebsd-nq

Author	SHA1	Message	Date
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but a few notes are reported: * The KPI changes as follows: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Luigi Rizzo	091fd0ab54	Add support for transparent mode while in netmap. By setting dev.netmap.fwd=1 (or enabling the feature with a per-ring flag), packets are forwarded between the NIC and the host stack unless the netmap client clears the NS_FORWARD flag on the individual descriptors. This feature greatly simplifies applications where some traffic (think of ARP, control traffic, ssh sessions...) must be processed by the host stack, whereas the bulk is handled by the netmap process which simply (un)marks packets that should not be forwarded. The default is chosen so that now a netmap receiver operates in a mode very similar to bpf. Of course there is no free lunch: traffic to/from the host stack still operates at OS speed (or less, as there is one extra copy in one direction). HOWEVER, since traffic goes to the user process before being reinjected, and reinjection occurs in a user context, you get some form of livelock protection for free.	2013-01-23 05:37:45 +00:00
Luigi Rizzo	ae10d1afee	control some debugging messages with dev.netmap.verbose add infrastructure to adapt to changes in number of queues and buffers at runtime	2013-01-23 03:51:47 +00:00
Luigi Rizzo	70ca194a4c	remove the old memory allocator, not useful anymore	2013-01-17 23:14:17 +00:00
Luigi Rizzo	1dce924d25	add some definition and driver changes in preparation for two upcoming features: semi-transparent mode: when a device is opened in this mode, the user program will be able to mark slots that must be forwarded to the "other" side (i.e. from NIC to host stack, or vice versa), and the forwarding will occur automatically at the next netmap syscall. This saves the need to open another file descriptor and do the forwarding manually. direct-forwarding mode: when operating with a VALE port, the user can specify in the slot the actual destination port, overriding the forwarding decision made by a lookup of the destination MAC. This can be useful to implement packet dispatchers. No API changes will be introduced. No new functionality in this patch yet.	2013-01-17 22:14:58 +00:00
Luigi Rizzo	e814dcebf3	remove an incorrect comment and debugging code	2013-01-17 19:27:12 +00:00
Luigi Rizzo	60372f6f58	rename the 'tag' and 'map' fields used in the rx ring to their previous names, 'ptag' and 'pmap' -- p stands for packet. This change reduces the difference between the code in stable/9 and head, and also helps using the same ixgbe_netmap.h on both branches. Approved by: Jack Vogel	2012-12-20 22:26:03 +00:00
Jack F Vogel	7d1157eec8	First of a series of 11 patches leading to new ixgbe version 2.5.0 This removes the header split and supporting code from the driver.	2012-11-30 22:19:18 +00:00
Ed Maste	d2b9185176	Use M_NOWAIT when calling malloc with a lock held. The check for a NULL return was already in place so I assume this was just an oversight.	2012-10-19 19:28:35 +00:00
Gleb Smirnoff	88f7905789	Fix build.	2012-10-19 09:41:45 +00:00
Luigi Rizzo	8241616dc5	This is an import of code, mostly from Giuseppe Lettieri, that revises the netmap memory allocator so that the various parameters (number and size of buffers, rings, descriptors) can be modified at runtime through sysctl variables. The changes become effective when no netmap clients are active. The API is mostly unchanged, although the NIOCUNREGIF ioctl now does not bring the interface back to normal mode: and you need to close the file descriptor for that. This change was necessary to track who is using the mapped region, and since it is a simplification of the API there was no incentive in trying to preserve NIOCUNREGIF. We will remove the ioctl from the kernel next time we need a real API change (and version bump). Among other things, buffer allocation when opening devices is now much faster: it used to take O(N^2) time, now it is linear. Submitted by: Giuseppe Lettieri	2012-10-19 04:13:12 +00:00
Ed Maste	4cf8455f59	Avoid panic when a netmap instance cannot obtain memory. A uint32_t is always >= 0. Sponsored by: ADARA Networks	2012-10-17 18:21:14 +00:00
Ed Maste	033ed050a0	Reword comment to try to improve clarity, and fix a typo.	2012-08-13 19:14:45 +00:00
Ed Maste	2f70fca5ec	Improve lock and unlock symmetry - Move destruction of per-ring locks to netmap_dtor_locked to mirror the initialization that happens in NIOCREGIF. Otherwise unloading a netmap-capable interface that was never put into netmap mode would try to mtx_destroy an uninitialized mutex, and panic. - Destroy core_lock in netmap_detach, mirroring init in netmap_attach. - Also comment out the knlist_destroy for now as there is currently no knlist_init. Sponsored by: ADARA Networks Reviewed by: luigi@	2012-08-09 14:46:52 +00:00
Ed Maste	0bf8895411	Fix whitespace (missing newline)	2012-08-08 15:28:29 +00:00
Ed Maste	24e57ec96d	Clarify comments about number of tx / rx rings	2012-08-08 15:27:01 +00:00
Luigi Rizzo	b3d5301688	fix some signed/unsigned warnings in the netmap code. Unfortunately the original drivers still have a lot of sign conversion/comparison warnings.	2012-08-02 11:59:43 +00:00
Luigi Rizzo	42a3a5bd91	Add a newline on an error message; rename linux functions to avoid confusion; fix error reporting on linux	2012-08-02 07:35:40 +00:00
Luigi Rizzo	d198a63d44	remove a redundant MALLOC_DECLARE	2012-07-31 05:51:48 +00:00
Luigi Rizzo	0b8ed8e069	- move the inclusion of netmap headers to the common part of the code; - more portable annotations for unused arguments;	2012-07-30 18:21:48 +00:00
Luigi Rizzo	01c7d25ff4	use __builtin_prefetch() for prefetch. merge in the remaining part of the linux-specific glue so i do not need to maintain two different distributions.	2012-07-27 10:52:21 +00:00
Luigi Rizzo	826e7ddbfc	remove unused definition, whitespace cleanup	2012-07-27 10:31:26 +00:00
Luigi Rizzo	29ecb031b6	define prefetch as a noop on !x86	2012-07-26 21:37:58 +00:00
Luigi Rizzo	f196ce3869	Add support for VALE bridges to the netmap core, see http://info.iet.unipi.it/~luigi/vale/ VALE lets you dynamically instantiate multiple software bridges that talk the netmap API (and are extremely fast), so you can test netmap applications without the need for high end hardware. This is particularly useful as I am completing a netmap-aware version of ipfw, and VALE provides an excellent testing platform. I also have netmap backends for qemu mostly ready for commit to the port, and this too will let you interconnect virtual machines at high speed without fiddling with bridges, tap or other slow solutions. The API for applications is unchanged, so you can use the code in tools/tools/netmap (which i will update soon) on the VALE ports. This commit also syncs the code with the one in my internal repository, so you will see some conditional code for other platforms. The code should run mostly unmodified on stable/9 so people interested in trying it can just copy sys/dev/netmap/ and sys/net/netmap*.h from HEAD. VALE is joint work with my colleague Giuseppe Lettieri, and is partly supported by the EU Projects CHANGE and OPENLAB	2012-07-26 16:45:28 +00:00
Luigi Rizzo	0ee29d4125	this file is too old and not interesting anymore now that netmap has been MFC'ed.	2012-05-17 20:05:13 +00:00
Luigi Rizzo	5b24837478	print 'netmap stack ring full' only in verbose mode.	2012-05-03 21:16:53 +00:00
Luigi Rizzo	b1123b0137	i prefer this fix for the -Wformat warning (just one cast, all the other variables are already correct for %x). My previous attempt put the cast in the wrong place.	2012-04-14 16:44:18 +00:00
Bjoern A. Zeeb	92083c91d2	Make compile on 64bit somehow for now after a first try at r234242 on maybe 32bit?	2012-04-14 13:39:39 +00:00
Luigi Rizzo	ce2cb79269	fix build with -Wformat -Wmissing-prototypes	2012-04-13 22:24:57 +00:00
Luigi Rizzo	9b034c6f08	Properly disable crc stripping when operating in netmap mode. Contrarily to what i wrote in my previous commit, the 82599 does include the CRC in the length. The operating mode is reset in ixgbe_init_locked() and so we need to hook into the places where the two registers (HLREG0 and RDRXCTL) are modified.	2012-04-13 16:42:54 +00:00
Luigi Rizzo	ccdc3305e4	add the new memory allocator for netmap, which allocates memory in small clusters instead of one big contiguous chunk. This was already enabled in the previous commit.	2012-04-13 16:32:33 +00:00
Luigi Rizzo	d76bf4ff7b	A bit of cleanup in the names of fields of netmap-related structures. Use the name 'ring' instead of 'queue' in all fields. Bump NETMAP_API.	2012-04-13 16:03:07 +00:00
Luigi Rizzo	82d2fe1069	do not use a deprecated field in a structure.	2012-04-13 15:33:12 +00:00
Luigi Rizzo	4f609083e5	Apparently the length field in advanced descriptors does not include the CRC irrespective of the setting of CRCSTRIP. The 82599 data sheets (sec. 7.1.6) say differently. Very strange. Need to check what happens on legacy descriptors, but for the time being this restores functionality.	2012-04-12 14:06:05 +00:00
Luigi Rizzo	3c0caf6ce6	Some code restructuring to bring the memory allocator out of netmap.c and make it easier to replace it with a different implementation. On passing, also fix indentation. NOTE: I know that #include "foo.c" is ugly, but the alternative (add another entry to sys/conf/files, add a separate header with structs and prototypes, and expose functions that are meant to be private) looks even worse to me. We need a more modular way to specify dependencies and build options.	2012-04-12 11:27:09 +00:00
Luigi Rizzo	13b9940fdc	use correct selinfo pointer for the generic interrupt handler (it is never used in current FreeBSD drivers).	2012-04-12 08:54:01 +00:00
Luigi Rizzo	c85cb1a0a2	A couple of changes related to ixgbe operation in netmap mode: - add a sysctl, dev.netmap.ix_crcstrip, to control whether ixgbe should strip the CRC on received frames. Defaults to 0, which keeps the CRC and improves performance when receiving min-sized (64-byte) frames. This matters because min-sized frames are one of the standard benchmarks for switches and routers, and some chipsets seem to issue read-modify-write cycles for PCIe transactions that are not a full cache line, so a min-sized frame triggers the bug, resulting in reduced throughput -- 9.7 instead of 14.88 Mpps -- and heavy bus load. - for the time being, always look for incoming packets on a select/poll even if there has not been an interrupt in the meantime. This is only a temporary workaround for a probable race condition in keeping track of rx interrupts. Add a couple of diagnostic vars to help studying the problem.	2012-04-11 16:11:08 +00:00
Luigi Rizzo	64ae02c365	A bunch of netmap fixes: USERSPACE: 1. add support for devices with different number of rx and tx queues; 2. add better support for zero-copy operation, adding an extra field to the netmap ring to indicate how many buffers we have already processed but not yet released (with help from Eddie Kohler); 3. The two changes above unfortunately require an API change, so while at it add a version field and some spares to the ioctl() argument to help detect mismatches. 4. update the manual page for the two changes above; 5. update sample applications in tools/tools/netmap KERNEL: 1. simplify the internal structures moving the global wait queues to the 'struct netmap_adapter'; 2. simplify the functions that map kring<->nic ring indexes 3. normalize device-specific code, helps maintenance; 4. start exploring the impact of micro-optimizations (prefetch etc.) in the ixgbe driver. Use 'legacy' descriptors on the tx ring and prefetch slots gives about 20% speedup at 900 MHz. Another 7-10% would come from removing the explicit calls to bus_dmamap* in the core (they are effectively NOPs in this case, but it takes an expensive load of the per-buffer dma maps to figure out that they are all NULL). Rx performance not investigated. I am postponing the MFC so i can import a few more improvements before merging.	2012-02-27 19:05:01 +00:00
Luigi Rizzo	babc7c1258	Various cleanups for readability (no functional changes) - remove the KEVENT code, which was incomplete and not compiled anyways; - change some while() loops into for() - adjust indentation - remove extra whitespace MFC after: 1 week	2012-02-17 14:09:04 +00:00
Luigi Rizzo	5644ccec61	(This commit only touches code within the DEV_NETMAP blocks) Introduce some functions to map NIC ring indexes into netmap ring indexes and vice versa. This way we can implement the bound checks only in one place (and hopefully in a correct way). On passing, make the code and comments more uniform across the various drivers.	2012-02-15 23:13:29 +00:00
Luigi Rizzo	4b0a800988	reduce the differences between these three files. The three drivers (em, lem and igb) are extremely similar, too bad that the structures use different names and we cannot share the code.	2012-02-15 18:59:26 +00:00
Luigi Rizzo	1a26580ee8	- use struct ifnet as explicit type of the argument to the txsync() and rxsync() callbacks, removing some variables made useless by this change; - add generic lock and irq handling routines. These can be useful in case there are no driver locks that we can reuse; - add a few macros to reduce differences with the Linux version.	2012-02-13 18:56:34 +00:00
Luigi Rizzo	5819da83ce	- change the buffer size from a constant to a TUNABLE variable (hw.netmap.buf_size) so we can experiment with values different from 2048 which may give better cache performance. - rearrange the memory allocation code so it will be easier to replace it with a different implementation. The current code relies on a single large contiguous chunk of memory obtained through contigmalloc. The new implementation (not committed yet) uses multiple smaller chunks which are easier to fit in a fragmented address space.	2012-02-08 11:43:29 +00:00
Luigi Rizzo	2157a17ce2	ixgbe changes: - remove experimental code for disabling CRC - use the correct constant for conversion between interrupt rate and EITR values (the previous values were off by a factor of 2) - make dev.ix.N.queueM.interrupt_rate a RW sysctl variable. Changing individual values affects the queue immediately, and propagates to all interfaces at the next reinit. - add dev.ix.N.queueM.irqs rdonly sysctl, to export the actual interrupt counts Netmap-related changes for ixgbe: - use the "new" format for TX descriptors in netmap mode. - pass interrupt mitigation delays to the user process doing poll() on a netmap file descriptor. On the RX side this means we will not check the ring more than once per interrupt. This gives the process a chance to sleep and process packets in larger batches, thus reducing CPU usage. On the TX side we take this even further: completed transmissions are reclaimed every half ring even if the NIC interrupts more often. This saves even more CPU without any additional tx delays. Generic Netmap-related changes: - align the netmap_kring to cache lines so that there is no false sharing (possibly useful for multiqueue NICs and MSIX interrupts, which are handled by different cores). It's a minor improvement but it does not cost anything. Reviewed by: Jack Vogel Approved by: Jack Vogel	2012-01-26 09:55:16 +00:00
Luigi Rizzo	bcda432e01	indentation and whitespace fixes	2012-01-13 11:58:06 +00:00
Luigi Rizzo	38b4948b5e	fix indentation	2012-01-13 11:01:23 +00:00
Luigi Rizzo	6dba29a285	Two performance-related fixes: 1. as reported by Alexander Fiveg, the allocator was reporting half of the allocated memory. Fix this by exiting from the loop earlier (not too critical because this code is going away soon). 2. following a discussion on freebsd-current http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031144.html turns out that (re)loading the dmamap was expensive and not optimized. This operation is in the critical path when doing zero-copy forwarding between interfaces. At least on netmap and i386/amd64, the bus_dmamap_load can be completely bypassed if the map is NULL, so we do it. The latter change gives an almost 3x improvement in forwarding performance, from the previous 9.5Mpps at 2.9GHz to the current line rate (14.2Mpps) at 1.733GHz. (this is for 64+4 byte packets, in other configurations the PCIe bus is a bottleneck).	2012-01-13 10:21:15 +00:00
Luigi Rizzo	446ee30192	other simplifications in the internal interfaces to the memory allocator.	2012-01-10 23:02:01 +00:00
Luigi Rizzo	6e10c8b8c5	small code cleanup in preparation for future modifications in the memory allocator used by netmap. No functional change, two small bug fixes: - in if_re.c add a missing bus_dmamap_sync() - in netmap.c comment out a spurious free() in an error handling block	2012-01-10 19:57:23 +00:00
Luigi Rizzo	6f3bc95594	remove a variable definition which shadows the correct one. Submitted by: Eitan Adler	2011-12-25 21:00:56 +00:00

57 Commits