Commit Graph

107 Commits

Author SHA1 Message Date
Alexander V. Chernikov
4044af03a4 Fix vtnet TCP lro panic
Differential Revision: https://reviews.freebsd.org/D29900
Reviewed by:	hps, kp
2021-04-19 17:06:34 +01:00
Michael Tuexen
d4697a6b56 vtnet: fix TSO for TCP/IPv6
The decision whether a TCP packet is sent over IPv4 or IPv6 was
based on ethertype, which works correctly. In D27926 the criteria
was changed to checking if the CSUM_IP_TSO flag is set in the
csum-flags and then considering it to be TCP/IPv4.
However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6,
where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO.
Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4,
which breaks TSO for TCP/IPv6.
This patch bases the check again on the ethertype.
This fix will be MFC instantly as discussed with re(gjb).

MFC after:		instantly
PR:			254366
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D29331
2021-03-18 21:32:20 +01:00
Alex Richardson
c1b554c868 if_vtnet: Fix pointer-sign and used parameter warnings
Reviewed By:	grehan
Differential Revision: https://reviews.freebsd.org/D28726
2021-02-22 17:41:04 +00:00
Jessica Clarke
633218ee46 virtio: Reduce boilerplate for device driver module definitions
Rather than have every device register itself for both virtio_pci and
virtio_mmio, provide a VIRTIO_DRIVER_MODULE wrapper to declare both,
merge VIRTIO_SIMPLE_PNPTABLE with VIRTIO_SIMPLE_PNPINFO and make the
latter register for both buses. This also has the benefit of abstracting
away the available transports and their names.

Reviewed by:	bryanv
Differential Revision:	https://reviews.freebsd.org/D28073
2021-01-21 01:07:23 +00:00
Bryan Venteicher
e6cc42f181 virtio: Handle possible failure of virtio_finalize_features()
Try to standardize how drivers negotiate feature and the
function names

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27930
2021-01-19 04:55:26 +00:00
Bryan Venteicher
2bfab35774 if_vtnet: Add counter for received host LRO
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27928
2021-01-19 04:55:26 +00:00
Bryan Venteicher
475a60aec7 if_vtnet: Misc Tx path cleanup
- Add and fix a few error path counters
  - Improve sysctl descriptions
  - Use flags consistently to determine IPv4 vs IPv6

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27926
2021-01-19 04:55:26 +00:00
Bryan Venteicher
6b53aeed91 if_vtnet: Set lro_nsegs for host LRO packets
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27933
2021-01-19 04:55:25 +00:00
Bryan Venteicher
74cd316a09 if_vtnet: Resort softc fields
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27925
2021-01-19 04:55:25 +00:00
Bryan Venteicher
33b5433fd7 if_vtnet: Remove unnecessary TUNABLE_INTs because of CTLFLAG_RDTUN
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27923
2021-01-19 04:55:25 +00:00
Bryan Venteicher
4f18e23f84 if_vtnet: Schedule Rx task if pending items when enabling interrupt
Prior to V1, the driver would enable interrupts and then notify the
host that DRIVER_OK. Since for V1, DRIVER_OK needs to be set before
notifying the virtqueues, there may be items in the queues waiting
to be processed by the time interrupts are enabled.

This fixes a bug where the Rx queue would appear stuck, only being
usable after an interface down/up cycle.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27922
2021-01-19 04:55:25 +00:00
Bryan Venteicher
c3187190c7 if_vtnet: Disable F_MTU feature if MTU is invalid
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27931
2021-01-19 04:55:25 +00:00
Bryan Venteicher
bd8809df20 if_vtnet: Limit allocations of unused virtqueues
For multiqueue, we may use fewer than the provided maximum number of
queues. Try to limit allocations of the unused queues: no interrupts,
no indirect descriptors, and no taskqueues.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27921
2021-01-19 04:55:25 +00:00
Bryan Venteicher
b470419ea5 if_vtnet: Rework 4be723f63 max multiqueue pairs check
Verify the max_virtqueue_pairs is within the range allowed by
the spec.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27920
2021-01-19 04:55:25 +00:00
Bryan Venteicher
42343a6316 if_vtnet: Add support for software LRO
This useful when running on hosts that support checksum offloading
but not the GUEST_TSO (LRO) feature. Or potentially, some GRO-like
support when doing forwarding.

Only enable SW LRO when the host LRO is not available since both
tends to be harmful, and difficult to enable/disable selectively
with only a single IFCAP_LRO flag.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27919
2021-01-19 04:55:25 +00:00
Bryan Venteicher
177761e4c4 if_vtnet: Set the interface max TSO values
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27917
2021-01-19 04:55:25 +00:00
Bryan Venteicher
e36a6b1b1f if_vtnet: Add support for CTRL_GUEST_OFFLOADS feature
This allows the Rx checksum and LRO to be modified without a full
reinit of the device.

Remove IFCAP_RXCSUM_IPV6 from the interface capabilities since in
VirtIO Rx checksums are just enabled or disabled for all protocols.

Properly update IFCAP_LRO if LRO is becomes disabled when Rx
checksums are disabled.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27916
2021-01-19 04:55:25 +00:00
Bryan Venteicher
dc9029d863 if_vtnet: Move ioctl handlers into separate functions
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27914 https://reviews.freebsd.org/D27915
2021-01-19 04:55:25 +00:00
Bryan Venteicher
44559b26af if_vtnet: Cleanup the reinit process
In modern VirtIO, the virtqueues cannot be notified before setting
DRIVER_OK status.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27932
2021-01-19 04:55:25 +00:00
Bryan Venteicher
32e0493c92 if_vtnet: Cleanup the interface setup methods
Defer the ether_ifattach until the interface capabilities
are configured

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27913
2021-01-19 04:55:25 +00:00
Bryan Venteicher
2520cd3821 if_vtnet: Only set IFCAP_JUMBO_MTU when jumbo MTU is supported
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27912
2021-01-19 04:55:24 +00:00
Bryan Venteicher
baa5234fbe if_vtnet: Move the Tx interrupt threshold into the Txq structure
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27911
2021-01-19 04:55:24 +00:00
Bryan Venteicher
05041794d0 if_vtnet: Defer updating generated MAC address until attached
This improves spec compliance because the driver is not suppose
to notify the device prior to setting the DRIVER_OK status, which
could happen with the VIRTIO_NET_F_CTRL_MAC_ADDR.

The VIRTIO_NET_F_MAC feature should always be negotiated so would
be a rare situation.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27910
2021-01-19 04:55:24 +00:00
Bryan Venteicher
25dbc30ef5 if_vtnet: Remove at attach PROMISC handling
This may have been required in an early, early, early version of the
specification but I cannot find any reference to it, and a promiscuous
default seems very odd so remove this code.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27909
2021-01-19 04:55:24 +00:00
Bryan Venteicher
6a73339365 if_vtnet: Support VIRTIO_NET_F_SPEED_DUPLEX
This features lets the guest driver know the speed and duplex of
the "link". Instead of trying to support many media types based
on the possible/likely speeds/duplexes, only use the speed to
set the interface baudrate.

Cleanup ifmedia code to match other drivers.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27908
2021-01-19 04:55:24 +00:00
Bryan Venteicher
aabdf5b6e8 if_vtnet: Support VIRTIO_NET_F_MTU
This feature lets the guest driver know the maximum MTU size
supported by the host device. If set, use this to limit the
acceptable MTUs, and improve how the receive mbuf cluster size
then is selected.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27907
2021-01-19 04:55:24 +00:00
Bryan Venteicher
fa7ca1e332 if_vtnet: Rx path cleanup
- Fix the NEEDS_CSUM and DATA_VALID checksum flags. The NEEDS_CSUM
    checksum is incomplete (partial) so offer a fallback for the driver
    to calculate the checksum. Simplify DATA_VALID because we know
    the host has validated the checksum.

  - Default 4K mbuf clusters for mergeable buffers. May need to
    scale this down to 2K clusters in certain configurations such
    many queue pairs, big queues (like 4096 in GCP), and low memory.

  - Use the MTU when calculated the receive mbuf cluster size
    when not doing TSO/LRO. This will need more adjustment once
    the MTU feature is supported.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27906
2021-01-19 04:55:24 +00:00
Bryan Venteicher
5e22081177 if_vtnet: Add initial modern (V1) support
Very basic support to get packets flowing on modern QEMU but still
several conformance issues remain that will be addressed in later
commits.

First of many passes at cleaning up various accumulated cruft

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27904
2021-01-19 04:55:24 +00:00
Bryan Venteicher
1cd1ed3f5d Revert: virtio: Support non-legacy network device and queue
And subsequent fix 576b099a.

By adding the mergable header to the vtnet_rx_header structure, the size
was increased by 2 bytes, breaking the alignment of this structure as
described the in preceding comments.

Furthermore, the mergable header does not belong the structure. With the
mergable feature, the header is placed in line with the data, so there is
no need for a separate segment, and misleading to follow the mergable
header with any padding.

The V1 header is effectively identical to mergable header, and the driver
has long supported the mergable feature. Revert this so the later changes
that add V1 support can show how V1 is derived from the existing mergable
buffers support, and to facilitate a later MFC.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27855
2021-01-19 04:55:23 +00:00
Vincenzo Maffione
bb714db6d3 netmap: vtnet: enable/disable krings on any interface reinit
See 3d65fd97e8 for a detailed explanation.

PR:             252453
MFC after:      1 week
2021-01-10 14:10:09 +00:00
Mateusz Guzik
068dbf361a virtio: clean up empty lines in .c and .h files 2020-09-01 21:31:26 +00:00
Vincenzo Maffione
ef6fdb3312 if_vtnet: let vtnet_rx_vq_intr() and vtnet_rxq_tq_intr() share code
Since the two functions are similar, introduce a common function
(vtnet_rx_vq_process()) to share common code.
This also improves locking, by ensuring vrxs_rescheduled is accessed
under the RXQ lock, and taskqueue_enqueue() is not called under the
lock (therefore avoiding a spurious duplicate lock warning).

Reported by:	jrtc27
MFC after:	2 weeks
2020-06-15 19:46:34 +00:00
Jessica Clarke
576b099a5f vtnet: Fix regression introduced in r361944
For legacy devices that don't support MrgRxBuf (such as bhyve pre-r358180),
r361944 failed to update the receive handler to account for the additional
padding introduced by the unused num_buffers field that is now always present
in struct vtnet_rx_header. Thus, calculate the padding dynamically based on
vtnet_hdr_size.

PR:		247242
Reported by:	thj
Tested by:	thj
2020-06-14 22:39:34 +00:00
Vincenzo Maffione
16f224b5f8 netmap: vtnet: fix races in vtnet_netmap_reg()
The nm_register callback needs to call nm_set_native_flags()
or nm_clear_native_flags() once the device has been stopped.
However, in the current implementation this is not true,
as the device is stopped by vtnet_init_locked(). This causes
race conditions where the driver crashes as soon as it
dequeues netmap buffers assuming they are mbufs (or the other
way around).
To fix the issue, we extend vtnet_init_locked() with a second
argument that, if not zero, will set/clear the netmap flags.
This results in a huge simplification of the nm_register
callback itself.
Also, use netmap_reset() to check if a ring is going to be
re-initialized in netmap mode.

MFC after:	1 week
2020-06-14 20:47:31 +00:00
Vincenzo Maffione
6682323732 netmap: introduce netmap_kring_on()
This function returns NULL if the ring identified by
queue id and direction is in netmap mode. Otherwise
return the corresponding kring.
Use this function to replace vtnet_netmap_queue_on().

MFC after:	1 week
2020-06-11 20:35:28 +00:00
Jessica Clarke
8c3988dff9 virtio: Support non-legacy network device and queue
The non-legacy interface always defines num_buffers in the header,
regardless of whether VIRTIO_NET_F_MRG_RXBUF, just leaving it unused. We
also need to ensure our virtqueue doesn't filter out VIRTIO_F_VERSION_1
during negotiation, as it supports non-legacy transports just fine. This
fixes network packet transmission on TinyEMU.

Reviewed by:	br, brooks (mentor), jhb (mentor)
Approved by:	br, brooks (mentor), jhb (mentor)
Differential Revision:	https://reviews.freebsd.org/D25132
2020-06-08 21:51:36 +00:00
Vincenzo Maffione
2d769e25b1 netmap: vtnet: add vtnrx_nm_refill index to receive queues
The new index tracks the next netmap slot that is going
to be enqueued into the virtqueue. The index is necessary
to prevent the receive VQ and the netmap rx ring from going
out of sync, considering that we never enqueue N slots, but
at most N-1. This change fixes a bug that causes the VQ
and the netmap ring to go out of sync after N-1 packets
have been received.

MFC after:	1 week
2020-06-03 17:42:17 +00:00
Vincenzo Maffione
f0d8d352c0 netmap: vtnet: call netmap_rx_irq() under VQ lock
The netmap_rx_irq() function normally wakes up user-space threads
waiting for more packets. In this case, it is not necessary to
call it under the driver queue lock. However, if the interface is
attached to a VALE switch, netmap_rx_irq() ends up calling rxsync
on the interface (see netmap_bwrap_intr_notify()). Although
concurrent rxsyncs are serialized through the kring lock
(see nm_kr_tryget()), the lock acquire operation is not blocking.
As a result, it may happen that netmap_rx_irq() is called on
an RX ring while another instance is running, causing the
second call to fail, and received packets stall in the receive VQ.
We fix this issue by calling netmap_irx_irq() under the VQ lock.

MFC after:	1 week
2020-06-03 05:27:29 +00:00
Vincenzo Maffione
1b89d00bd4 netmap: vtnet: honor NM_IRQ_RESCHED
The netmap_rx_irq() function may return NM_IRQ_RESCHED to inform the
driver that more work is pending, and that netmap expects netmap_rx_irq()
to be called again as soon as possible.
This change implements this behaviour in the vtnet driver.

MFC after:	1 week
2020-06-03 05:09:33 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Gleb Smirnoff
e87c494015 Although most of the NIC drivers are epoch ready, due to peer pressure
switch over to opt-in instead of opt-out for epoch.

Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.

Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.

Mark several tested drivers as IFF_KNOWSEPOCH.

Reviewed by:		hselasky, jeff, bz, gallatin
Differential Revision:	https://reviews.freebsd.org/D23674
2020-02-24 21:07:30 +00:00
Gleb Smirnoff
6c3e93cb5a Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D23518
2020-02-11 18:57:07 +00:00
Gleb Smirnoff
629667a148 Pacify gcc.
Reported by:	rlibby
2020-01-11 20:07:30 +00:00
Gleb Smirnoff
ed6cbf4805 Add pfil(9) hook to vtnet(4).
The patch could be simplier, using only the second chunk to
vtnet_rxq_eof(), that passes full mbufs to pfil(9). Packet
filter would m_free() them in case of returning PFIL_DROPPED.

However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing same features as a true driver.
2020-01-10 21:22:03 +00:00
Kristof Provost
29bfe2102d vtnet: Pre-allocate debugnet data immediately
Don't wait until the vtnet_debugnet_init() call happens, because at that
point we might already have allocated something from
vtnet_tx_header_zone.

Some systems showed this panic:

        vtnet0: link state changed to UP
        panic: keg vtnet_tx_hdr initialization after use.
        cpuid = 5
        time = 1578427700
        KDB: stack backtrace:
        db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004db427f0
        vpanic() at vpanic+0x17e/frame 0xfffffe004db42850
        panic() at panic+0x43/frame 0xfffffe004db428b0
        uma_zone_reserve() at uma_zone_reserve+0xf6/frame 0xfffffe004db428f0
        vtnet_debugnet_init() at vtnet_debugnet_init+0x77/frame 0xfffffe004db42930
        debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x42/frame 0xfffffe004db42980
        do_link_state_change() at do_link_state_change+0x1b3/frame 0xfffffe004db429d0
        taskqueue_run_locked() at taskqueue_run_locked+0x178/frame 0xfffffe004db42a30
        taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe004db42a50
        ithread_loop() at ithread_loop+0x1d6/frame 0xfffffe004db42ab0
        fork_exit() at fork_exit+0x80/frame 0xfffffe004db42af0
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004db42af0
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        KDB: enter: panic
        [ thread pid 12 tid 100011 ]
        Stopped at      kdb_enter+0x37: movq    $0,0x1084eb6(%rip)
        db>

Reviewed by:	cem, markj
Differential Revision:	https://reviews.freebsd.org/D23073
2020-01-08 10:06:32 +00:00
Gleb Smirnoff
7dce56596f Convert to if_foreach_llmaddr() KPI. 2019-10-21 17:59:02 +00:00
Vincenzo Maffione
f8bc74e2f4 tap: add support for virtio-net offloads
This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.

Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D21263
2019-10-18 21:53:27 +00:00
Conrad Meyer
7790c8c199 Split out a more generic debugnet(4) from netdump(4)
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport.  It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4).  Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c.  UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c.  The
separation is not perfect and future improvement is welcome.  Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry.  I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking.  Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time.  If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark.  Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by:	markj (earlier version)
Some discussion with:	emaste, jhb
Objection from:	marius
Differential Revision:	https://reviews.freebsd.org/D21421
2019-10-17 16:23:03 +00:00
Conrad Meyer
0f6040f03e virtio(4): Add PNP match metadata for virtio devices
Register MODULE_PNP_INFO for virtio devices using the newbus PNP information
provided by the previous commit.  Matching can be quite simple; existing
probe routines only matched on bus (implicit) and device_type.  The same
matching criteria are retained exactly, but is now also available to
devmatch(8).

Reviewed by:	bryanv, markj; imp (earlier version)
Differential Revision:	https://reviews.freebsd.org/D20407
2019-06-04 02:37:11 +00:00
Michael Tuexen
132ea9f2ad Remove non-functional SCTP checksum offload support for virtio.
Checksum offloading for SCTP is not currently specified for virtio.
If the hypervisor announces checksum offloading support, it means TCP
and UDP checksum offload. If an SCTP packet is sent and the host announced
checksum offload support, the hypervisor inserts the IP checksum (16-bit)
at the correct offset, but this is not the right checksum, which is a CRC32c.
This results in all outgoing packets having the wrong checksum and therefore
breaking SCTP based communications.

This patch removes SCTP checksum offloading support from the virtio
network interface.

Thanks to Felix Weinrank for making me aware of the issue.

Reviewed by:		bryanv@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D20147
2019-05-07 20:28:12 +00:00