Correctly limit npairs passed to vtnet_ctrl_mq_cmd. This ensures that
VQ_ALLOC_INFO_INIT is called with the correct value, preventing the system
from hanging when max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS.
Add new sysctl requested_vq_pairs which allow the user to configure
the requested number of virtqueue pairs. The actual value will still take
into account the system limits.
Also missing sysctls for the current tunables so their values can be seen.
PR: 207446
Reported by: Andy Carrel
MFC after: 3 days
Relnotes: Yes
Sponsored by: Multiplay
Since r276367 added the virtio_mmio support vtnet_modevent() gets called twice.
This resulted in a memory leak during load and a panic on unload.
Count the loads so we only initialise once (just like cxgbe(4)), and only clean
up in the final unload.
PR: 209428
Submitted by: novel@FreeBSD.org
MFC after: 1 week
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host). if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).
There are also two issues with the if_bridge error handling.
If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promic() calls can actually fail, so this is safe.
A second issue is a double free of bif. It's already freed by
bridge_delete_member().
PR: 200210
Differential Revision: https://reviews.freebsd.org/D2804
Reviewed by: philip (mentor)
Currently if you ifconfig down a vtnet interface while it is being used
via netmap, the kernel panics due to trying to treat the cookie values
in the virtio rings as mbufs to be freed. When netmap is enabled, these
cookie values are pointers to something else.
Note that other netmap-aware drivers don't seem to need this as they
store the mbuf pointers in the software rings that mirror the hardware
descriptor rings, and since netmap doesn't touch those, the software
state always has NULL mbuf pointers causing the loops to free mbufs to
not do anything. However, vtnet reuses the same state area for both
netmap and non-netmap mode, so it needs to explicitly avoid looking at
the rings and treating the cookie values as mbufs if netmap is
enabled.
Differential Revision: https://reviews.freebsd.org/D2348
Reviewed by: adrian, bryanv, luigi
MFC after: 1 week
Sponsored by: Norse Corp, Inc.
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.
Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.
MFC after: 1 month
Sponsored by: Mellanox Technologies
The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic is not the
typical use case for VirtIO, and due to other issues, this often
requires checksum offloading to be disabled anyways.
PR: 185864
MFC after: 2 weeks
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
and in vtnet_get_counter().
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured gues-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.
This was a very big performance improvement for an experimental
Netmap bhyve backend.
MFC after: 1 month
The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. Note for consistency, specify the
alignment everywhere, even if the structure has the appropriate natural
alignment since it contains a uint16_t.
Obtained from: DragonFlyBSD
MFC after: 1 week
A similar fix should be applied to vmxnet, ixgbe, igb, i40e.
(some of them previously reported by Michael Tuexen)
Drivers using if_transmit are correct, and so are most of the
other drivers that reassing if_transmit.
Among other things, this bug causes panics when using netmap emulation
on top of generic drivers.
Approved by: bryanv
MFC after: 3 days
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
The sglist segment array has grown to a bit over 512 bytes (on
64-bit system) which is more than ideally should be put on the
stack. Instead allocate an appropriately sized sglist and hang
it off each Rx/Tx queue structure.
Bump the maximum number of Tx segments to 64 to make it unlikely
we'll have defragment an mbuf chain. Our previous count was
rounded up to this value since it is the next power of two, so
effective memory usage should not change.
Also only allocate the maximum number of Tx segments if TSO was
negotiated.
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
This is a significant rewrite of much of the previous driver; lots of
misc. cleanup was also performed, and support for a few other minor
features was also added.
Minor changes to the network driver. A multiqueue driver that is
a significant rewrite will be in merged shortly.
Contains projects/virtio commits:
r246058:
vtnet: Move an mbuf ASSERT to the calling function
r246059:
vtnet: Tweak ASSERT message
MFC after: 1 month
Contains projects/virtio commits:
r245709:
Each VirtIO device was scheduling its own taskqueue(9) to do the
off-level interrupt handling. ithreads(9) is the more nature way
to do this. The primary motivation for this work to better support
network multiqueue.
r245710:
virtio: Change virtqueue intr handlers to return void
r245711:
virtio_blk: Remove interrupt taskqueue
r245721:
vtnet: Remove interrupt taskqueue
r245722:
virtio_scsi: Remove interrupt taskqueue
r245747:
vtnet: Remove taskqueue fields missed in r245721
MFC after: 1 month
QEMU 1.4 made the descriptor requirement stricter - the size of buffer
descriptor must exactly match the number of MAC addresses provided.
PR: kern/178955
MFC after: 5 days
If virtio_setup_intr() failed during boot, we would hang in
taskqueue_free() -> taskqueue_terminate() for all the taskq
threads to terminate. This will never happen since the
scheduler is not running by this point.
Reported by: neel, grehan
Approved by: grehan (mentor)
PCI:
- Properly handle interrupt fallback from MSIX to MSI to legacy.
The host may not have sufficient resources to support MSIX,
so we must be able to fallback to legacy interrupts.
- Add interface to get the (sub) vendor and device IDs.
- Rename flags to VTPCI_FLAG_* like other VirtIO drivers.
Block:
- No longer allocate vtblk_requests from separate UMA zone.
malloc(9) from M_DEVBUF is sufficient. Assert segment counts
at allocation.
- More verbose error and debug messages.
Network:
- Remove stray write once variable.
Virtqueue:
- Shuffle code around in preparation of converting the mb()s to
the appropriate atomic(9) operations.
- Only walk the descriptor chain when freeing if INVARIANTS is
defined since the result is only KASSERT()ed.
Submitted by: Bryan Venteicher (bryanv@daemoninthecloset.org)
a8af6270bd96be6ccd86f70b60fa6512b710e4f0
virtio_blk: Include function name in panic string
cbdb03a694b76c5253d7ae3a59b9995b9afbb67a
virtio_balloon: Do the notify outside of the lock
By the time we return from virtqueue_notify(), the descriptor
will be in the used ring so we shouldn't have to sleep.
10ba392e60692529a5cbc1e9987e4064e0128447
virtio: Use DEVMETHOD_END
80cbcc4d6552cac758be67f0c99c36f23ce62110
virtqueue: Add support for VIRTIO_F_RING_EVENT_IDX
This can be used to reduce the number of guest/host and
host/guest interrupts by delaying the interrupt until a
certain index value is reached.
Actual use by the network driver will come along later.
8fc465969acc0c58477153e4c3530390db436c02
virtqueue: Simplify virtqueue_nused()
Since the values just wrap naturally at UINT16_MAX, we
can just subtract the two values directly, rather than
doing 2's complement math.
a8aa22f25959e2767d006cd621b69050e7ffb0ae
virtio_blk: Remove debugging crud from 75dd732a
There seems to be an issue with Qemu (or FreeBSD VirtIO) that sets
the PCI register space for the device config to bogus values. This
only seems to happen after unloading and reloading the module.
d404800661cb2a9769c033f8a50b2133934501aa
virtio_blk: Use better variable name
75dd732a97743d96e7c63f7ced3c2169696dadd3
virtio_blk: Partially revert 92ba40e65
Just use the virtqueue to determine if any requests are
still inflight.
06661ed66b7a9efaea240f99f414c368f1bbcdc7
virtio_blk: error if allowed too few segments
Should never happen unless the host provides use with a
bogus seg_max value.
4b33e5085bc87a818433d7e664a0a2c8f56a1a89
virtio_blk: Sort function declarations
426b9f5cac892c9c64cc7631966461514f7e08c6
virtio_blk: Cleanup whitespace
617c23e12c61e3c2233d942db713c6b8ff0bd112
virtio_blk: Call disk_err() on error'd completed requests
081a5712d4b2e0abf273be4d26affcf3870263a9
virtio_blk: ASSERT the ready and inflight request queues are empty
a9be2631a4f770a84145c18ee03a3f103bed4ca8
virtio_blk: Simplify check for too many segments
At the cost of a small style violation.
e00ec09da014f2e60cc75542d0ab78898672d521
virtio_blk: Add beginnings of suspend/resume
Still not sure if we need to virtio_stop()/virtio_reinit()
the device before/after a suspend.
Don't start additional IO when marked as suspending.
47c71dc6ce8c238aa59ce8afd4bda5aa294bc884
virtio_blk: Panic when dealt an unhandled BIO cmd
1055544f90fb8c0cc6a2395f5b6104039606aafe
virtio_blk: Add VQ enqueue/dequeue wrappers
Wrapper functions managed the added/removing to the in-flight
list of requests.
Normally biodone() any completed IO when draining the virtqueue.
92ba40e65b3bb5e4acb9300ece711f1ea8f3f7f4
virtio_blk: Add in-flight list of requests
74f6d260e075443544522c0833dc2712dd93f49b
virtio_blk: Rename VTBLK_FLAG_DETACHING to VTBLK_FLAG_DETACH
7aa549050f6fc6551c09c6362ed6b2a0728956ef
virtio_blk: Finish all BIOs through vtblk_finish_bio()
Also properly set bio_resid in the case of errors. Most geom_disk
providers seem to do the same.
9eef6d0e6f7e5dd362f71ba097f2e2e4c3744882
Added function to translate VirtIO status to error code
ef06adc337f31e1129d6d5f26de6d8d1be27bcd2
Reset dumping flag when given unexpected parameters
393b3e390c644193a2e392220dcc6a6c50b212d9
Added missing VTBLK_LOCK() in dump handler
Obtained from: Bryan Venteicher bryanv at daemoninthecloset dot org
c162516
Remove vtblk_sector_size
c162515
Wrap long license lines
c162514
Remove vtblk_unit
c162513
Wrap long lines in the license.
c162512
Remove verbose messages when link goes up/down.
A similar message is printed elsewhere as a result of
if_link_state_change().
c162511
Explicity compare pointer to NULL
c162510
Allocate the mac filter table at attach time.
c162509
Add real BSD licenses to the header files copied from Linux.
The chases upstream changes made in Linux awhile ago.
c162508
Only notify if we actually dequeued something.
c162507
Change a couple of if () { KASSERT(...) } to just KASSERTs.
In non-debug kernels, the if() { } probably get optomized
away, but I guess this is clearer.
c162506
Remove VIRTIO_BLK_F_TOPOLOGY fields in the config.
TOPOLOGY has since been removed from the spec, and the FreeBSD
didn't really do anything with the fields anyways.
c162505
Move vtblk_enqueue_request() outside the locks when getting the ident.
c162504
Remove soon to be uneeded trylock during dump [1].
http://lists.freebsd.org/pipermail/freebsd-current/2011-November/029226.html
c162503
Remove emtpy line
c162502
Drop frame if cannot allocate a vtnet_tx_header.
If we don't, we set OACTIVE, but if there are no
other frames in flight, vtnet_txeof() will never
be called to unset OACTIVE. The interface would
have to be down/up'ed in order to become usable.
We could be cuter here and only do this if the
virtqueue is emtpy, but its probably not worth
the complication.
c162501
Start mbuf replacement loop at 1 for clarity
Obtained from: Bryan Venteicher bryanv at daemoninthecloset dot org
Tested on Qemu/KVM, VirtualBox, and BHyVe.
Currently built as modules-only on i386/amd64. Man pages not yet hooked
up, pending review.
Submitted by: Bryan Venteicher bryanv at daemoninthecloset dot org
Reviewed by: bz
MFC after: 4 weeks or so