freebsd-dev

Author	SHA1	Message	Date
Vincenzo Maffione	b7d6913862	netmap: use FreeBSD guards for epoch calls EPOCH calls are FreeBSD specific. Use guards to protect these, so that the code can compile under Linux. MFC after: 1 week	2020-08-24 20:28:21 +00:00
Vincenzo Maffione	ff48ef48ac	netmap: fix parsing of legacy nmr->nr_ringid Code was checking for NETMAP_{SW,HW}_RING in req->nr_ringid which had already been masked by NETMAP_RING_MASK. Therefore, the comparisons always failed and set NR_REG_ALL_NIC. Check against the original nmr structure. Submitted by: bpoole@packetforensics.com Reported by: bpoole@packetforensics.com Reviewed by: giuseppe.lettieri@unipi.it Approved by: vmaffione MFC after: 1 week	2020-08-18 08:03:28 +00:00
Vincenzo Maffione	16f224b5f8	netmap: vtnet: fix races in vtnet_netmap_reg() The nm_register callback needs to call nm_set_native_flags() or nm_clear_native_flags() once the device has been stopped. However, in the current implementation this is not true, as the device is stopped by vtnet_init_locked(). This causes race conditions where the driver crashes as soon as it dequeues netmap buffers assuming they are mbufs (or the other way around). To fix the issue, we extend vtnet_init_locked() with a second argument that, if not zero, will set/clear the netmap flags. This results in a huge simplification of the nm_register callback itself. Also, use netmap_reset() to check if a ring is going to be re-initialized in netmap mode. MFC after: 1 week	2020-06-14 20:47:31 +00:00
Vincenzo Maffione	6682323732	netmap: introduce netmap_kring_on() This function returns NULL if the ring identified by queue id and direction is not in netmap mode. Otherwise return the corresponding kring. Use this function to replace vtnet_netmap_queue_on(). MFC after: 1 week	2020-06-11 20:35:28 +00:00
Vincenzo Maffione	e8c07b1246	netmap: vtnet: clean up rxsync disabled logs MFC after: 1 week	2020-06-03 17:47:32 +00:00
Vincenzo Maffione	1b6d5a80a6	netmap: vtnet: fix race condition in rxsync This change prevents a race that happens when rxsync dequeues N-1 rx packets (with N being the size of the netmap rx ring). In this situation, the loop exits without re-enabling the rx interrupts, thus causing the VQ to stall. MFC after: 1 week	2020-06-03 17:46:21 +00:00
Vincenzo Maffione	2d769e25b1	netmap: vtnet: add vtnrx_nm_refill index to receive queues The new index tracks the next netmap slot that is going to be enqueued into the virtqueue. The index is necessary to prevent the receive VQ and the netmap rx ring from going out of sync, considering that we never enqueue N slots, but at most N-1. This change fixes a bug that causes the VQ and the netmap ring to go out of sync after N-1 packets have been received. MFC after: 1 week	2020-06-03 17:42:17 +00:00
Vincenzo Maffione	06f6997eb5	netmap: vale: fix disabled logs MFC after: 1 week	2020-06-03 05:49:19 +00:00
Vincenzo Maffione	81d2cade1c	netmap: vtnet: remove leftover memory barriers MFC after: 1 week	2020-06-03 05:48:42 +00:00
Vincenzo Maffione	9ec71596c0	netmap: if_vtnet: avoid netmap ring wraparound netmap assumes that one "slot" is left unused to distinguish the empty ring and full ring conditions. This assumption was violated by vtnet_netmap_rxq_populate(). MFC after: 1 week	2020-06-01 16:14:29 +00:00
Vincenzo Maffione	36f2d67026	netmap: if_vtnet: replace vtnet_free_used() The functionality contained in this function is duplicated, as it is already available in vtnet_txq_free_mbufs() and vtnet_rxq_free_mbufs(). MFC after: 1 week	2020-06-01 16:12:09 +00:00
Vincenzo Maffione	c9de157d36	netmap: vtnet: fix RX virtqueue initialization bug The vtnet_netmap_rxq_populate() function erroneously assumed that kring->nr_hwcur = 0, i.e. the kring was in the initial state. However, this is not always the case: for example, when a vtnet reinit is triggered by some changes in the interface flags or capenable. This patch changes the behaviour of vtnet_netmap_kring_refill() so that it always starts publishing the netmap buffers starting from the current value of kring->nr_hwcur. MFC after: 1 week	2020-06-01 16:10:44 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Gleb Smirnoff	6c3e93cb5a	Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process incoming packets in taskqueue context. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D23518	2020-02-11 18:57:07 +00:00
Vincenzo Maffione	723180da59	netmap: improve netmap(4) and vale(4) man pages Clean up obsolete sysctl descriptions and add missing ones. PR: 243838 Reviewed by: bcr MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23546	2020-02-07 19:26:26 +00:00
Vincenzo Maffione	de27b30340	netmap_mem_unmap: fix NULL pointer dereference MFC after: 3 days	2020-01-26 21:34:46 +00:00
Gleb Smirnoff	a44700782e	In netmap() call ether_input() within the network epoch.	2020-01-23 01:35:02 +00:00
Vincenzo Maffione	2ec213aba4	netmap: disable passthrough with no hypervisor support The netmap passthrough subsystem requires proper support in the hypervisor. In particular, two PCI device ids (from the Red Hat PCI vendor id 0x1b36) need to be assigned to the two netmap virtual devices. We then disable these devices until the ids have been assigned, in order to avoid conflicts with other virtual devices emulated by upstream QEMU. PR: 241774 MFC after: 3 days	2020-01-13 21:47:23 +00:00
Jeff Roberson	3cf3b4e641	Make page busy state deterministic on free. Pages must be xbusy when removed from objects including calls to free. Pages must not be xbusy when freed and not on an object. Strengthen assertions to match these expectations. In practice very little code had to change busy handling to meet these rules but we can now make stronger guarantees to busy holders and avoid conditionally dropping busy in free. Refine vm_page_remove() and vm_page_replace() semantics now that we have stronger guarantees about busy state. This removes redundant and potentially problematic code that has proliferated. Discussed with: markj Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22822	2019-12-22 06:56:44 +00:00
Vincenzo Maffione	c7c7805531	add valectl to the system commands The valectl(4) program is used to manage vale(4) switches. Add it to the system commands so that it can be used right away. This program was previously called vale-ctl, and stored in tools/tools/netmap Reviewed by: hrs, bcr, lwhsu, kevans MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D22146	2019-10-31 21:01:34 +00:00
Vincenzo Maffione	484456b2d8	netmap: enter NET_EPOCH on generic txsync After r353292, netmap generic adapter on if_vlan interfaces panics on asserting the NET_EPOCH. In more detail, this happens when nm_os_generic_xmit_frame() is called, that is in the generic txsync routine. Fix the issue by entering the NET_EPOCH during the generic txsync. We amortize the cost of entering/exiting over a whole batch of transmissions. PR: 241489 Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>	2019-10-28 19:00:27 +00:00
Vincenzo Maffione	760fa2ab5d	netmap: minor misc improvements - use ring->head rather than ring->cur in lb(8) - use strlcat() rather than strncat() - fix bandwidth computation in pkt-gen(8) MFC after: 1 week	2019-10-20 14:15:45 +00:00
Vincenzo Maffione	f8bc74e2f4	tap: add support for virtio-net offloads This patch is part of an effort to make bhyve networking (in particular TCP) faster. The key strategy to enhance TCP throughput is to let the whole packet datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet overhead is amortized over a large number of bytes. This capability is supported in the guest by means of the vtnet(4) driver, which is able to handle TSO/LRO packets leveraging the virtio-net header (see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf). A bhyve VM exchanges packets with the host through a network backend, which can be vale(4) or if_tap(4). While vale(4) supports TSO/LRO packets, if_tap(4) does not. This patch extends if_tap(4) with the ability to understand the virtio-net header, so that a tapX interface can process TSO/LRO packets. A couple of ioctl commands have been added to configure and probe the virtio-net header. Once the virtio-net header is set, the tapX interface acquires all the IFCAP capabilities necessary for TSO/LRO. Reviewed by: kevans Differential Revision: https://reviews.freebsd.org/D21263	2019-10-18 21:53:27 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Mark Johnston	fee2a2fa39	Change synchronization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficient as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accommodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Vincenzo Maffione	253b2ec199	netmap: import changes from upstream (SHA 137f537eae513) - Rework option processing. - Use larger integers for memory size values in the memory management code. MFC after: 2 weeks	2019-09-01 14:47:41 +00:00
Vincenzo Maffione	df4e516f0f	netmap: remove obsolete file The netmap_pt.c module has become obsolete after the refactoring that added netmap_kloop.c. Remove it and unlink it from the build system. MFC after: 1 week	2019-08-25 20:16:03 +00:00
Vincenzo Maffione	d7143780ce	netmap: fix bug introduced by r349752 r349752 introduced a NULL pointer reference bug in the emulated netmap code. Reported by: lwhsu MFC after: 3 days	2019-07-13 08:08:25 +00:00
Vincenzo Maffione	5d47236b18	netmap: Remove pointer leakage in netmap_mem2.c PR: 238641 Submitted by: Fuqian Huang <huangfq.daxian@gmail.com> Reviewed by: vmaffione MFC after: 1 week	2019-07-04 21:31:49 +00:00
Vincenzo Maffione	5fe59a51dd	netmap: fix kernel pointer printing in netmap_generic.c Print the adapter name rather than the address of the adapter to avoid kernel address leakage. PR: Bug 238642 Submitted by: Fuqian Huang <huangfq.daxian@gmail.com> Reviewed by: vmaffione MFC after: 1 week	2019-07-04 21:11:45 +00:00
Vincenzo Maffione	23ced94451	netmap: fix two panics with emulated adapter This patch fixes 2 panics. The first one is due to the current VNET not being set in the emulated adapter transmission path. The second one is caused by the M_PKTHDR flag not being set when preallocated mbufs are recycled in the transmit path. Submitted by: aleksandr.fedorov@itglobal.com Reviewed by: vmaffione MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20824	2019-07-01 20:37:35 +00:00
Conrad Meyer	04e0c883c5	Add two missing eventhandler.h headers These are obviously missing from the .c files, but don't show up in any tinderbox configuration (due to latent header pollution of some kind). It seems some configurations don't have this pollution, and the includes are obviously missing, so go ahead and add them. Reported by: Peter Jeremy <peter AT rulingia.com> X-MFC-With: r347984	2019-05-21 00:04:19 +00:00
Vincenzo Maffione	d337c8c731	netmap: align if_ptnet to the changes introduced by r347233 This removes non-functional SCTP checksum offload support. More information in the log message of r347233. MFC after: 2 weeks	2019-05-17 20:29:31 +00:00
Vincenzo Maffione	d12354a56c	netmap: add support for multiple host rings Some applications forward from/to host rings most or all the traffic received or sent on a physical interface. In these cases it is desirable to have more than a pair of RX/TX host rings, and use multiple threads to speed up forwarding. This change adds support for multiple host rings. On registering a netmap port, the user can specify the number of desired receive and transmit host rings in the nr_host_tx_rings and nr_host_rx_rings fields of the nmreq_register structure. MFC after: 2 weeks	2019-03-18 12:22:23 +00:00
Vincenzo Maffione	352a2062c9	netmap: remove redundant call to nm_set_native_flags() This redundant call was introduced by mistake in r343772. MFC after: 3 days Sponsored by: Sunny Valley Networks	2019-02-25 09:57:06 +00:00
Vincenzo Maffione	45100257c6	netmap: don't schedule kqueue notify task when kqueue is not used This change adds a counter (kqueue_users) to keep track of how many kqueue users are referencing a given struct nm_selinfo. In this way, nm_os_selwakeup() can schedule the kevent notification task only when kqueue is actually being used. This is important to avoid wasting CPU in the common case where kqueue is not used. Reviewed by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19177	2019-02-18 14:21:41 +00:00
Vincenzo Maffione	1ef2a88149	netmap: revert netmap_attach_ext() to pre-r343772 Reported by: marius MFC after: 1 week	2019-02-07 11:28:53 +00:00
Vincenzo Maffione	75f4f3ed51	netmap: refactor logging macros and pipes Changelist: - Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr and nm_prlim, to avoid possible naming conflicts. - Add netmap_krings_mode_commit() helper function and use that to reduce code duplication. - Refactor pipes control code to export some functions that can be reused by the veth driver (on Linux) and epair(4). - Add check to reject API requests with version less than 11. - Small code refactoring for the null adapter. MFC after: 1 week	2019-02-05 12:10:48 +00:00
Vincenzo Maffione	5faab77822	netmap: upgrade sync-kloop support Add SYNC_KLOOP_MODE option, and add support for direct mode, where application executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback. MFC after: 5 days	2019-02-02 22:39:29 +00:00
Vincenzo Maffione	19c4ec08ad	netmap: fix lock order reversal related to kqueue usage When using poll(), select() or kevent() on netmap file descriptors, netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands, before collecting the events that are ready. In other words, the poll/kevent callback has side effects. This is done to avoid the overhead of two system calls per iteration (e.g., poll() + ioctl(NIOC*XSYNC)). When the kqueue subsystem invokes the kqueue(9) f_event callback (netmap_knrw), it holds the lock of the struct knlist object associated to the netmap port (the lock is provided at initialization, by calling knlist_init_mtx). However, netmap_knrw() may need to wake up another netmap port (or even the same one), which means that it may need to call knote(). Since knote() needs the lock of the struct knlist object associated to the to-be-woken-up netmap port, it is possible to have a lock order reversal problem (AB/BA deadlock). This change prevents the deadlock by executing the knote() call in a per-selinfo taskqueue, where it is possible to hold a mutex. Reviewed by: aleksandr.fedorov_itglobal.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18956	2019-01-30 15:51:55 +00:00
Vincenzo Maffione	a56136a1ba	netmap: add notifications on kloop stop On sync-kloop stop, send a wake-up signal to the kloop, so that waiting for the timeout is not needed. Also, improve logging in netmap_freebsd.c. MFC after: 3 days	2019-01-29 10:28:50 +00:00
Vincenzo Maffione	aa4dd64dfe	netmap: fix crash with monitors and VALE ports Crash report described here: https://github.com/luigirizzo/netmap/issues/583 Fixed by providing dummy sync callback in case it is missing.	2019-01-24 22:09:26 +00:00
Vincenzo Maffione	f79ba6d75b	netmap: improvements to the netmap kloop (CSB mode) Changelist: - Add the proper memory barriers in the kloop ring processing functions. - Fix memory barriers usage in the user helpers (nm_sync_kloop_appl_write, nm_sync_kloop_appl_read). - Fix nm_kr_txempty() helper to look at rhead rather than rcur. This is important since the kloop can read a value of rcur which is ahead of the value of rhead (see explanation in nm_sync_kloop_appl_write) - Remove obsolete ptnetmap_guest_write_kring_csb() and ptnet_guest_read_kring_csb(), and update if_ptnet(4) to use those. - Prepare in advance the arguments for netmap_sync_kloop_[tr]x_ring(), to make the kloop faster. - Provide kernel and user implementation for nm_ldld_barrier() and nm_ldst_barrier() MFC after: 2 weeks	2019-01-23 14:51:36 +00:00
Vincenzo Maffione	8c9874f5b1	netmap: fix knote() argument to match the mutex state The nm_os_selwakeup function needs to call knote() to wake up kqueue(9) users. However, this function can be called from different code paths, with different lock requirements. This patch fixes the knote() call argument to match the relevant lock state. Also, comments have been updated to reflect current code. PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846 Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18876	2019-01-23 14:21:23 +00:00
Vincenzo Maffione	58e185425a	netmap: fix txsync check in netmap poll To check if txsync can be skipped, it is necessary to look for unseen TX space. However, this means comparing ring->cur against ring->tail, rather than ring->head against ring->tail (like nm_ring_empty() does). This change also adds some more comments to explain the optimization performed at the beginning of netmap_poll(). MFC after: 3 days Sponsored by: Sunny Valley Networks	2018-12-22 16:23:42 +00:00
Vincenzo Maffione	e1ed1fbdea	netmap: fix bug in netmap_poll() optimization The bug was introduced by r339639, although it is present in the upstream netmap code since 2015. It is due to resetting the want_rx variable to POLLIN, rather than resetting it to POLLIN|POLLRDNORM. It only affects select(), which uses POLLRDNORM. poll() is not affected, because it uses POLLIN. Also, it only affects FreeBSD, because Linux skips the optimization implemented by the piece of code where the bug occurs. MFC after: 3 days Sponsored by: Sunny Valley Networks	2018-12-22 15:15:45 +00:00
Vincenzo Maffione	77a2baf551	netmap: move buf_size validation code to its own function This code validates the netmap buf_size against the interface MTU and maximum descriptor size, to make sure the values are consistent. Moving this functionality to its own function is needed because this function is also called by Linux-specific code. MFC after: 3 days	2018-12-21 11:50:14 +00:00
Vincenzo Maffione	c52382bd40	netmap: pipes: make sure both ends use the same number of slots	2018-12-21 11:32:55 +00:00
Vincenzo Maffione	dde885de95	netmap: fix warning in netmap_kloop.c Reported by: markj MFC after: 3 days	2018-12-12 16:32:15 +00:00
Vincenzo Maffione	2605ddfce9	netmap: remove dead code obsoleted by iflib The iflib subsystem implements netmap support in a driver-independent way (sys/net/iflib.c). We can therefore remove the headers that used to implement netmap support for all the drivers now supported by iflib (em, igb, ixl, ixgbe, lem). MFC after: 1 week	2018-12-07 11:47:42 +00:00
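
Several of the vtnet commits above (9ec71596c0, 2d769e25b1, 1b6d5a80a6) rely on the convention that a ring of N slots never holds more than N-1 entries, so that "empty" and "full" remain distinguishable. The following is a minimal sketch of that convention only. The struct toy_ring type and the toy_* helpers are hypothetical names made up for illustration; they are not the netmap or vtnet data structures, and the real driver code may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy ring, for illustration only (not a netmap structure). */
struct toy_ring {
	uint32_t num_slots; /* N, total number of slots */
	uint32_t head;      /* first filled slot */
	uint32_t tail;      /* first slot that is not filled */
};

/* Index following i, wrapping around at the end of the ring. */
static uint32_t
toy_next(const struct toy_ring *r, uint32_t i)
{
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Empty: head and tail coincide. */
static bool
toy_ring_empty(const struct toy_ring *r)
{
	return r->head == r->tail;
}

/* Full: filling one more slot would make tail catch up with head. */
static bool
toy_ring_full(const struct toy_ring *r)
{
	return toy_next(r, r->tail) == r->head;
}

/* Number of slots currently filled (always at most N-1). */
static uint32_t
toy_ring_used(const struct toy_ring *r)
{
	if (r->tail >= r->head)
		return r->tail - r->head;
	return r->tail + r->num_slots - r->head;
}

/*
 * Fill up to `want` slots starting from the current tail and return how
 * many were filled. The loop stops at N-1 entries, leaving one slot unused.
 */
static uint32_t
toy_ring_fill(struct toy_ring *r, uint32_t want)
{
	uint32_t n = 0;

	assert(r->num_slots >= 2);
	while (n < want && !toy_ring_full(r)) {
		r->tail = toy_next(r, r->tail);
		n++;
	}
	return n;
}
```

With an 8-slot ring whose head and tail both start at index 3, asking toy_ring_fill() for 8 slots fills only 7. Filling starts from the current indices rather than assuming they are zero, which is the kind of assumption c9de157d36 describes as a bug.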

233 Commits