Commit Graph

257 Commits

Vincenzo Maffione
3e3314a8b7 netmap: fix uint32_t overflow in pool size calculation
MFC after:	1 week
2021-09-26 13:56:33 +00:00
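
A generic illustration of the overflow class fixed above (the names below are hypothetical, not the actual netmap allocator code): a product of two 32-bit quantities must be widened before the multiplication, not after.

    #include <stdint.h>

    uint64_t
    pool_size_bytes(uint32_t num_objs, uint32_t obj_size)
    {
        /* Widen first: (uint64_t)a * b is computed in 64 bits, whereas
         * a * b would wrap in 32 bits before any cast of the result. */
        return ((uint64_t)num_objs * obj_size);
    }
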
Vincenzo Maffione
6127ce9d91 netmap: monitor: support offsets in copy mode 2021-09-26 13:52:16 +00:00
Vincenzo Maffione
98399ab06f netmap: import changes from upstream
- make sure rings are disabled during resets
- introduce netmap_update_hostrings_mode(), with support
  for multiple host rings
- always initialize ni_bufs_head in netmap_if
  ni_bufs_head was not properly initialized when no external buffers were
  requested and contained the ni_bufs_head from the last request. This
  was causing spurious buffer frees when alternating between apps that
  used external buffers and apps that did not use them.
- check na validity under lock on detach
- netmap_mem: fix leak on error path
- nm_dispatch: fix compilation on Raspberry Pi

MFC after:	2 weeks
2021-08-22 09:31:05 +00:00
Vincenzo Maffione
f4a54f4333 netmap: use safer defaults for hwbuf_len
We must make sure that incoming packets will never overflow the netmap
buffers, even when the user is using the offset feature. In the typical
scenario, the netmap buffer is 2KiB and, with an MTU of 1500, there are
~500 bytes available for user offsets.

Unfortunately, some NICs accept incoming packets even when they are
larger than the MTU. This means that the only way to stop DMA from
overflowing the netmap buffers, when offsets are allowed, is to choose
a hardware buffer length which is smaller than the netmap buffer
length. For most NICs and for 2KiB netmap buffers, this means 1024
bytes, which is inconveniently small.

The current code selects the small hardware buffer size even when
offsets are not in use. The main purpose of this change is to fix
this bug by returning to the normal behavior for the no-offsets
case.

At the same time, the patch pushes the handling of the offset case
to the lower level driver code, so that it can be made NIC-specific
(in future patches).
2021-04-18 13:39:15 +00:00
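
A minimal sketch of the selection logic described above; the function and variable names are hypothetical and the real per-driver logic may differ (fls() is the FreeBSD libkern helper).

    static uint32_t
    pick_hwbuf_len(uint32_t nm_buf_len, uint32_t max_offset)
    {
        uint32_t hwbuf_len;

        if (max_offset == 0)
            return (nm_buf_len);    /* no offsets: expose the whole buffer */

        /* With offsets allowed, even an over-MTU frame DMA'd at the
         * largest offset must not overflow the netmap buffer. */
        hwbuf_len = nm_buf_len - max_offset;
        /* Many NICs only accept power-of-two sizes: round down. */
        hwbuf_len = 1U << (fls(hwbuf_len) - 1);

        return (hwbuf_len);    /* 1024 for 2KiB buffers and ~500B offsets */
    }
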
Cy Schubert
b51f459a20 wpa: Import wpa_supplicant/hostapd commit f91680c15
This is the April update to vendor/wpa, committed upstream on
2021/04/07.

This is MFV efec822389.

Suggested by:		philip
Reviewed by:		philip
MFC after:		2 months
Differential Revision:	https://reviews.freebsd.org/D29744
2021-04-17 07:21:12 -07:00
Vincenzo Maffione
13c4641188 netmap: make sure rings are disabled during resets
Explicitly disable ring synchronization before calling
callbacks that may result in a hardware reset.

Before this patch we relied on capturing the down/up events which,
however, may not be issued by all drivers.
2021-04-17 14:02:47 +00:00
Vincenzo Maffione
70275a6735 netmap: don't use linux type struct device *
Such a type cannot be used in code that is shared between
FreeBSD and Linux. Use the FreeBSD type instead.

MFC after:	3 days
Reported by:	markj
Differential Revision:	https://reviews.freebsd.org/D29677
2021-04-11 21:13:01 +00:00
Vincenzo Maffione
172c5eb272 netmap: vtnet: remove unused variable
Reported by:	bdragon
2021-04-09 19:33:41 +00:00
Vincenzo Maffione
15dc713ceb netmap: vtnet: add support for netmap offsets
Follow-up change to a6d768d845.
This change adds support for netmap offsets.
2021-04-07 21:32:20 +00:00
Vincenzo Maffione
45c67e8f6b netmap: several typo fixes
No functional changes intended.
2021-04-02 07:01:20 +00:00
Vincenzo Maffione
66671ae589 netmap: fix typo bug in netmap_compute_buf_len 2021-04-02 06:47:28 +00:00
Vincenzo Maffione
660a47cb99 netmap: monitor: add a flag to distinguish packet direction
The netmap monitor intercepts any TX/RX packets on the monitored
port. However, before this change there was no way to tell
whether an intercepted packet was being transmitted or received
on the monitored port.
A TXMON flag in the netmap slot has been added for this purpose.
2021-03-29 16:32:54 +00:00
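
A fragment showing how a monitor application might use the new flag; it assumes an already-open monitor port (nifp) and that the flag is exposed as NS_TXMON in the slot flags (name assumed from the description above, check net/netmap.h).

    struct netmap_ring *ring = NETMAP_RXRING(nifp, 0);

    while (!nm_ring_empty(ring)) {
        struct netmap_slot *slot = &ring->slot[ring->head];

        printf("%u-byte packet, %s on the monitored port\n", slot->len,
            (slot->flags & NS_TXMON) ? "transmitted" : "received");
        ring->head = ring->cur = nm_ring_next(ring, ring->head);
    }
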
Vincenzo Maffione
a6d768d845 netmap: add kernel support for the "offsets" feature
This feature enables applications to ask netmap to transmit or
receive packets starting at a user-specified offset from the
beginning of the netmap buffer. This is meant to ease those
packet manipulation operations such as pushing or popping packet
headers, that may be useful to implement software switches,
routers and other packet processors.
To use the feature, drivers (e.g., iflib, vtnet, etc.) must have
explicit support. This change does not add support for any driver,
but introduces the necessary kernel changes. However, offsets support
is already included for VALE ports and pipes.
2021-03-29 16:29:01 +00:00
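
A conceptual userspace fragment of the headroom use case described above. The helper names nm_get_offset() and nm_write_offset() are assumptions modeled on the upstream netmap API and should be checked against net/netmap_user.h.

    struct netmap_slot *slot = &ring->slot[ring->head];
    char *buf = NETMAP_BUF(ring, slot->buf_idx);
    uint64_t off = nm_get_offset(ring, slot);    /* negotiated at open time */
    char *payload = buf + off;                   /* frame starts here, not at buf */

    /* Push an encapsulation header into the reserved headroom. */
    memcpy(payload - hdrlen, hdr, hdrlen);
    nm_write_offset(ring, slot, off - hdrlen);   /* new start of the frame */
    slot->len += hdrlen;
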
Vincenzo Maffione
ee7ffaa2e6 netmap: fix issues in nm_os_extmem_create()
- Call vm_object_reference() before vm_map_lookup_done().
- Use vm_mmap_to_errno() to convert vm_map_* return values to errno.
- Fix memory leak of e->obj.

Reported by:	markj
Reviewed by:	markj
MFC after:	1 week
2021-03-20 17:15:50 +00:00
Vincenzo Maffione
0ab5902e8a netmap: fix memory leak in NETMAP_REQ_PORT_INFO_GET
The netmap_ioctl() function has a reference counting bug in the case
of the NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`,
the function does not decrease the refcount of "nmd", which is
increased by netmap_mem_find(), causing a refcount leak.

Reported by:	Xiyu Yang <sherllyyang00@gmail.com>
Submitted by:	Carl Smith <carl.smith@alliedtelesis.co.nz>
MFC after: 3 days
PR:	254311
2021-03-15 17:39:18 +00:00
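
The fix pattern, sketched with the allocator helpers named in the message (netmap_mem_find()/netmap_mem_put()); the actual code path in netmap_ioctl() is more involved.

    nmd = netmap_mem_find(req->nr_mem_id);    /* takes a reference */
    if (nmd == NULL)
        return (EINVAL);
    /* ... fill in the PORT_INFO_GET reply from nmd ... */
    netmap_mem_put(nmd);    /* balance the reference on every exit path,
                             * including the hdr->nr_name[0] == '\0' branch */
    return (0);
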
Mark Johnston
fef8450971 netmap: Stop printing a line to the dmesg in netmap_init()
netmap is compiled into the kernel by default so initialization was
always reported, and netmap uses a formatting convention not used in the
rest of the kernel.

Reviewed by:	vmaffione
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29099
2021-03-05 18:07:47 -05:00
Vincenzo Maffione
ee0005f11f netmap: simplify parameter passing
Changes imported from the netmap github.
2021-01-24 21:59:02 +00:00
Vincenzo Maffione
3005e10ddb netmap: vtnet: fix RX initialization after netmap_reset()
At device reset, we must not publish those netmap receive buffers
that are owned by userspace (nm_kr_rxspace).

MFC after:	1 week
2021-01-11 21:38:32 +00:00
Vincenzo Maffione
55f0ad5fde netmap: restore hwofs and support it in iflib
Restore the hwofs functionality temporarily disabled by
7ba6ecf216 to prevent issues with iflib.
This patch brings the necessary changes to iflib to
enable hwofs, allowing interface restarts without
disrupting netmap applications that are actively using
its rings.
After this change, it becomes possible for multiple
non-cooperating netmap applications to use non-overlapping
subsets of the available netmap rings without clashing
with each other.

PR:		252453
MFC after:	1 week
2021-01-10 22:51:15 +00:00
Vincenzo Maffione
bb714db6d3 netmap: vtnet: enable/disable krings on any interface reinit
See 3d65fd97e8 for a detailed explanation.

PR:             252453
MFC after:      1 week
2021-01-10 14:10:09 +00:00
Vincenzo Maffione
9ac59d42c0 netmap: vtnet: stop krings during interface reset
Similarly to what was done for iflib in 1d238b07d5,
this patch prevents access to the krings during the interface
reset triggered by netmap_register().

MFC after:	1 week
2021-01-09 22:34:52 +00:00
Vincenzo Maffione
7ba6ecf216 netmap: refactor netmap_reset
The netmap_reset() function is meant to be called by drivers
when they initialize (or re-initialize) a hardware ring.
However, since the introduction of support for opening (in
netmap mode) a subset of the available rings, netmap_reset()
may be called multiple times on actively used rings, causing
both the kring and the netmap ring to transition to an
inconsistent state.
This change improves the situation by resetting all the
index fields of the kring to 0, as expected after the
reinitialization of a hardware ring.

PR:	    252518
MFC after:  1 week
2021-01-09 22:07:24 +00:00
Vincenzo Maffione
1d238b07d5 netmap: iflib: stop krings during interface reset
When different processes open separate subsets of the
available rings of the same netmap interface, a device
reset may be performed while one of the processes
is actively using some rings (e.g., caused by another
process executing nmport_open()).
With this patch, such a situation causes the
active process to receive POLLERR, giving it a
chance to detect the reset.
We also guarantee that no process is running a txsync
or rxsync (ioctl or poll) while an iflib device reset
is in progress.

PR:	    252453
MFC after:  1 week
2021-01-09 21:01:46 +00:00
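
A userspace sketch of how an application can notice the reset; it assumes the nmport descriptor exposes its file descriptor as nmd->fd, as in libnetmap.

    struct pollfd pfd = { .fd = nmd->fd, .events = POLLIN };

    if (poll(&pfd, 1, 1000) > 0) {
        if (pfd.revents & POLLERR) {
            /* the interface was reset underneath us: resync or reopen */
            fprintf(stderr, "netmap port reset detected\n");
        } else if (pfd.revents & POLLIN) {
            /* process the receive rings as usual */
        }
    }
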
Vincenzo Maffione
174f809da5 netmap: fix mutex double unlock bug
https://github.com/luigirizzo/netmap/pull/733

Submitted by:	 brian90013
MFC after:	3 days
2020-10-22 20:21:11 +00:00
Vincenzo Maffione
b7d6913862 netmap: use FreeBSD guards for epoch calls
EPOCH calls are FreeBSD specific. Use guards to protect these, so
that the code can compile under Linux.

MFC after:	1 week
2020-08-24 20:28:21 +00:00
Vincenzo Maffione
ff48ef48ac netmap: fix parsing of legacy nmr->nr_ringid
Code was checking for NETMAP_{SW,HW}_RING in req->nr_ringid which
had already been masked by NETMAP_RING_MASK. Therefore, the comparisons
always failed and set NR_REG_ALL_NIC. Check against the original nmr
structure.

Submitted by:	bpoole@packetforensics.com
Reported by:	bpoole@packetforensics.com
Reviewed by:	giuseppe.lettieri@unipi.it
Approved by:	vmaffione
MFC after:	1 week
2020-08-18 08:03:28 +00:00
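
The bug pattern in simplified form: the NETMAP_SW_RING/NETMAP_HW_RING bits live outside NETMAP_RING_MASK, so they must be tested on the original nmr->nr_ringid rather than on a value that has already been masked (this sketch compresses the real legacy-request conversion).

    uint16_t ringid = nmr->nr_ringid;           /* original, unmasked value */

    req->nr_ringid = ringid & NETMAP_RING_MASK; /* ring number only */
    if (ringid & NETMAP_SW_RING)
        req->nr_mode = NR_REG_SW;
    else if (ringid & NETMAP_HW_RING)
        req->nr_mode = NR_REG_ONE_NIC;
    else
        req->nr_mode = NR_REG_ALL_NIC;
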
Vincenzo Maffione
16f224b5f8 netmap: vtnet: fix races in vtnet_netmap_reg()
The nm_register callback needs to call nm_set_native_flags()
or nm_clear_native_flags() once the device has been stopped.
However, in the current implementation this is not true,
as the device is stopped by vtnet_init_locked(). This causes
race conditions where the driver crashes as soon as it
dequeues netmap buffers assuming they are mbufs (or the other
way around).
To fix the issue, we extend vtnet_init_locked() with a second
argument that, if not zero, will set/clear the netmap flags.
This results in a huge simplification of the nm_register
callback itself.
Also, use netmap_reset() to check if a ring is going to be
re-initialized in netmap mode.

MFC after:	1 week
2020-06-14 20:47:31 +00:00
Vincenzo Maffione
6682323732 netmap: introduce netmap_kring_on()
This function returns the kring identified by queue id and
direction if that ring is in netmap mode, and NULL otherwise.
Use this function to replace vtnet_netmap_queue_on().

MFC after:	1 week
2020-06-11 20:35:28 +00:00
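
A sketch of how a driver interrupt path might use the new helper; the call signature is inferred from the description above (see netmap_kern.h for the actual one).

    struct netmap_kring *kring;
    u_int work = 0;

    kring = netmap_kring_on(NA(ifp), qid, NR_RX);
    if (kring != NULL) {
        /* the queue is in netmap mode: hand the event to netmap */
        netmap_rx_irq(ifp, qid, &work);
        return;
    }
    /* not in netmap mode: continue with the regular mbuf path */
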
Vincenzo Maffione
e8c07b1246 netmap: vtnet: clean up rxsync disabled logs
MFC after:	1 week
2020-06-03 17:47:32 +00:00
Vincenzo Maffione
1b6d5a80a6 netmap: vtnet: fix race condition in rxsync
This change prevents a race that happens when rxsync dequeues
N-1 rx packets (with N being the size of the netmap rx ring).
In this situation, the loop exits without re-enabling the
rx interrupts, thus causing the VQ to stall.

MFC after:	1 week
2020-06-03 17:46:21 +00:00
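
The usual lost-wakeup fix, sketched with the virtio(4) virtqueue helpers; the condition name is a placeholder and the real rxsync loop is more elaborate.

    again:
        while (netmap_slots_available &&
            (buf = virtqueue_dequeue(vq, &len)) != NULL) {
            /* ... attach buf to the next netmap rx slot ... */
        }
        if (netmap_slots_available && virtqueue_enable_intr(vq) != 0) {
            /* packets arrived after the last dequeue: drain them too,
             * otherwise no interrupt will ever fire and the VQ stalls */
            virtqueue_disable_intr(vq);
            goto again;
        }
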
Vincenzo Maffione
2d769e25b1 netmap: vtnet: add vtnrx_nm_refill index to receive queues
The new index tracks the next netmap slot that is going
to be enqueued into the virtqueue. The index is necessary
to prevent the receive VQ and the netmap rx ring from going
out of sync, considering that we never enqueue N slots, but
at most N-1. This change fixes a bug that causes the VQ
and the netmap ring to go out of sync after N-1 packets
have been received.

MFC after:	1 week
2020-06-03 17:42:17 +00:00
Vincenzo Maffione
06f6997eb5 netmap: vale: fix disabled logs
MFC after:	1 week
2020-06-03 05:49:19 +00:00
Vincenzo Maffione
81d2cade1c netmap: vtnet: remove leftover memory barriers
MFC after:	1 week
2020-06-03 05:48:42 +00:00
Vincenzo Maffione
9ec71596c0 netmap: if_vtnet: avoid netmap ring wraparound
netmap assumes that one slot is left unused to distinguish
the empty-ring condition from the full-ring condition. This
assumption was violated by vtnet_netmap_rxq_populate().

MFC after:	1 week
2020-06-01 16:14:29 +00:00
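
A generic circular-buffer illustration of why one slot must stay unused (this is not the exact netmap head/cur/tail protocol, just the underlying invariant).

    #include <stdint.h>

    /* With "empty" defined as head == tail, a producer that filled all N
     * slots would also end up with head == tail, making "full" and
     * "empty" indistinguishable.  Hence at most N-1 slots are ever used. */
    static inline int
    ring_empty(uint32_t head, uint32_t tail)
    {
        return (head == tail);
    }

    static inline int
    ring_full(uint32_t head, uint32_t tail, uint32_t num_slots)
    {
        return ((head + 1) % num_slots == tail);
    }
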
Vincenzo Maffione
36f2d67026 netmap: if_vtnet: replace vtnet_free_used()
The functionality of this function duplicates what is already
available in vtnet_txq_free_mbufs() and vtnet_rxq_free_mbufs().

MFC after:	1 week
2020-06-01 16:12:09 +00:00
Vincenzo Maffione
c9de157d36 netmap: vtnet: fix RX virtqueue initialization bug
The vtnet_netmap_rxq_populate() function erroneously assumed
that kring->nr_hwcur == 0, i.e., that the kring was in the
initial state. However, this is not always the case: for
example, when a vtnet reinit is triggered by changes in the
interface flags or capenable.
This patch changes the behaviour of vtnet_netmap_kring_refill()
so that it always publishes the netmap buffers starting
from the current value of kring->nr_hwcur.

MFC after:	1 week
2020-06-01 16:10:44 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are marked as NEEDGIANT by default.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
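
An example of the annotation pattern (the node names and handler are hypothetical).

    /* Handler does not rely on Giant: mark the node MPSAFE. */
    SYSCTL_NODE(_dev, OID_AUTO, example, CTLFLAG_RD | CTLFLAG_MPSAFE, NULL,
        "example device sysctl tree");

    /* Handler still needs Giant: mark it NEEDGIANT until it is reviewed. */
    SYSCTL_PROC(_dev_example, OID_AUTO, legacy_knob,
        CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_NEEDGIANT, NULL, 0,
        example_legacy_sysctl, "I", "legacy tunable, handler not yet MPSAFE");
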
Gleb Smirnoff
6c3e93cb5a Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D23518
2020-02-11 18:57:07 +00:00
Vincenzo Maffione
723180da59 netmap: improve netmap(4) and vale(4) man pages
Clean up obsolete sysctl descriptions and add missing ones.

PR:		243838
Reviewed by:	bcr
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23546
2020-02-07 19:26:26 +00:00
Vincenzo Maffione
de27b30340 netmap_mem_unmap: fix NULL pointer dereference
MFC after:	3 days
2020-01-26 21:34:46 +00:00
Gleb Smirnoff
a44700782e In netmap() call ether_input() within the network epoch. 2020-01-23 01:35:02 +00:00
Vincenzo Maffione
2ec213aba4 netmap: disable passthrough with no hypervisor support
The netmap passthrough subsystem requires proper support in the
hypervisor. In particular, two PCI device ids (from the Red Hat
PCI vendor id 0x1b36) need to be assigned to the two netmap
virtual devices. These devices are therefore disabled until the ids
have been assigned, in order to avoid conflicts with other
virtual devices emulated by upstream QEMU.

PR:	241774
MFC after:	3 days
2020-01-13 21:47:23 +00:00
Jeff Roberson
3cf3b4e641 Make page busy state deterministic on free. Pages must be xbusy when
removed from objects including calls to free.  Pages must not be xbusy
when freed and not on an object.  Strengthen assertions to match these
expectations.  In practice very little code had to change busy handling
to meet these rules but we can now make stronger guarantees to busy
holders and avoid conditionally dropping busy in free.

Refine vm_page_remove() and vm_page_replace() semantics now that we have
stronger guarantees about busy state.  This removes redundant and
potentially problematic code that has proliferated.

Discussed with:	markj
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22822
2019-12-22 06:56:44 +00:00
Vincenzo Maffione
c7c7805531 add valectl to the system commands
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl and was stored in
tools/tools/netmap.

Reviewed by:	hrs, bcr, lwhsu, kevans
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22146
2019-10-31 21:01:34 +00:00
Vincenzo Maffione
484456b2d8 netmap: enter NET_EPOCH on generic txsync
After r353292, the netmap generic adapter panics on a NET_EPOCH
assertion when used on if_vlan interfaces. In more detail, this
happens when nm_os_generic_xmit_frame() is called, that is, in the
generic txsync routine.
Fix the issue by entering the NET_EPOCH during the generic txsync.
We amortize the cost of entering/exiting over a whole batch of
transmissions.

PR:		241489
Reported by:	Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
2019-10-28 19:00:27 +00:00
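
The batching pattern described above, in outline (the loop body is a placeholder).

    struct epoch_tracker et;

    NET_EPOCH_ENTER(et);            /* once per batch, not once per packet */
    for (u_int i = 0; i < batch; i++) {
        /* nm_os_generic_xmit_frame() and the other transmit helpers run
         * here with the NET_EPOCH assertion satisfied */
    }
    NET_EPOCH_EXIT(et);
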
Vincenzo Maffione
760fa2ab5d netmap: minor misc improvements
- use ring->head rather than ring->cur in lb(8)
- use strlcat() rather than strncat()
- fix bandwidth computation in pkt-gen(8)

MFC after:	1 week
2019-10-20 14:15:45 +00:00
Vincenzo Maffione
f8bc74e2f4 tap: add support for virtio-net offloads
This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.

Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D21263
2019-10-18 21:53:27 +00:00
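
A userspace sketch of configuring the header on a tap device; the ioctl names TAPGVNETHDR/TAPSVNETHDR are assumptions based on the description above and should be verified against net/if_tap.h.

    int vhdrlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);

    /* Enable the virtio-net header; the tap interface then also acquires
     * the IFCAP capabilities needed for TSO/LRO. */
    if (ioctl(tapfd, TAPSVNETHDR, &vhdrlen) == -1)
        err(1, "tap backend: virtio-net header not supported");

    /* Probe the currently configured header length. */
    if (ioctl(tapfd, TAPGVNETHDR, &vhdrlen) == 0)
        printf("virtio-net header length: %d\n", vhdrlen);
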
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Mark Johnston
fee2a2fa39 Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
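
A sketch of the updated KPI usage implied by the description above (error handling omitted).

    VM_OBJECT_WLOCK(obj);
    m = vm_page_lookup(obj, pindex);
    vm_page_wire(m);                /* object lock (or busy) held; the page
                                     * lock is no longer required */
    VM_OBJECT_WUNLOCK(obj);

    /* ... use the page without holding any lock ... */

    vm_page_unwire(m, PQ_ACTIVE);   /* void return; may free the page when
                                     * the last reference is dropped */
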
Vincenzo Maffione
253b2ec199 netmap: import changes from upstream (SHA 137f537eae513)
- Rework option processing.
- Use larger integers for memory size values in the
  memory management code.

MFC after:	2 weeks
2019-09-01 14:47:41 +00:00