
43 Commits

Author SHA1 Message Date
Kristof Provost
581e697036 Fix panic when adding vtnet interfaces to a bridge
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host).  if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promisc() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR:		200210
Differential Revision:	https://reviews.freebsd.org/D2804
Reviewed by:	philip (mentor)
2015-06-13 19:39:21 +00:00
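The cleanup rule in 581e697036 (only undo promiscuous mode if it was actually enabled, because only the first and last reference-count transitions can fail) can be modeled in user space. This is a minimal sketch, not the kernel code: struct fake_ifnet, fake_promisc(), fake_hw_set_promisc() and fake_member_cleanup() are hypothetical stand-ins for struct ifnet, if_promisc() and bridge_delete_member(), and the promisc field stands in for IFF_PROMISC.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an interface; not the kernel's struct ifnet. */
struct fake_ifnet {
	int  promisc_refs;   /* reference count of promiscuous-mode users */
	bool promisc;        /* stands in for IFF_PROMISC */
	bool hw_fails;       /* simulate a driver that rejects the request */
};

/* Simulated driver hook: only reached on the 0->1 and 1->0 transitions. */
static int
fake_hw_set_promisc(struct fake_ifnet *ifp, bool on)
{
	if (ifp->hw_fails)
		return (ENOTSUP);
	ifp->promisc = on;
	return (0);
}

/* Reference-counted enable/disable, loosely modeled on if_promisc(). */
static int
fake_promisc(struct fake_ifnet *ifp, bool on)
{
	int error;

	if (on) {
		if (ifp->promisc_refs++ > 0)
			return (0);          /* already on: cannot fail */
		error = fake_hw_set_promisc(ifp, true);
		if (error != 0)
			ifp->promisc_refs--; /* roll back the failed first reference */
		return (error);
	}
	assert(ifp->promisc_refs > 0);  /* the kind of assert the bridge tripped */
	if (--ifp->promisc_refs > 0)
		return (0);                  /* still referenced: cannot fail */
	return (fake_hw_set_promisc(ifp, false));
}

/* Member cleanup: only drop promiscuous mode if it was actually enabled. */
static void
fake_member_cleanup(struct fake_ifnet *ifp)
{
	if (ifp->promisc)
		(void)fake_promisc(ifp, false);
}
```

With hw_fails set, the first enable fails and rolls back its reference. Calling fake_promisc(ifp, false) unconditionally at that point would trip the assert, which mirrors the panic from PR 200210.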
Bryan Venteicher
cab10cc1d1 Fix typo when deregistering the VLAN unconfig event handler
Submitted by:	Masao Uebayashi <uebayasi@tombiinc.com>
MFC after:	3 days
2015-06-13 16:13:31 +00:00
John Baldwin
4dc78216f8 Don't free mbufs when stopping an interface in netmap mode.
Currently if you ifconfig down a vtnet interface while it is being used
via netmap, the kernel panics due to trying to treat the cookie values
in the virtio rings as mbufs to be freed. When netmap is enabled, these
cookie values are pointers to something else.

Note that other netmap-aware drivers don't seem to need this as they
store the mbuf pointers in the software rings that mirror the hardware
descriptor rings, and since netmap doesn't touch those, the software
state always has NULL mbuf pointers, so the loops that free mbufs do
nothing. However, vtnet reuses the same state area for both
netmap and non-netmap mode, so it needs to explicitly avoid looking at
the rings and treating the cookie values as mbufs if netmap is
enabled.

Differential Revision:	https://reviews.freebsd.org/D2348
Reviewed by:	adrian, bryanv, luigi
MFC after:	1 week
Sponsored by:	Norse Corp, Inc.
2015-04-29 17:48:25 +00:00
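The stop-path problem in 4dc78216f8 comes down to a drain loop that must not interpret ring cookies as mbufs while netmap owns them. A minimal user-space sketch, assuming a hypothetical struct fake_ring whose cookies are heap buffers released with free() in place of m_freem() in normal mode:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ring: cookies are "mbufs" normally, netmap state otherwise. */
struct fake_ring {
	void   *cookies[8];
	size_t  nslots;
	bool    netmap_enabled;
};

/* Drain the ring on interface stop; returns the number of buffers freed. */
static size_t
fake_ring_drain(struct fake_ring *r)
{
	size_t freed = 0;

	assert(r->nslots <= sizeof(r->cookies) / sizeof(r->cookies[0]));

	/* In netmap mode the cookies are not ours to free: leave them alone. */
	if (r->netmap_enabled)
		return (0);

	for (size_t i = 0; i < r->nslots; i++) {
		if (r->cookies[i] != NULL) {
			free(r->cookies[i]);   /* stands in for m_freem() */
			r->cookies[i] = NULL;
			freed++;
		}
	}
	return (freed);
}
```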
Bryan Venteicher
ab4c2818f2 Add softc flag for when the indirect descriptor feature was negotiated
MFC after:	2 weeks
2015-01-01 02:06:00 +00:00
Bryan Venteicher
5b32b2faaa Use the appropriate IPv4 or IPv6 TSO HW assist flag
MFC after:	2 weeks
2015-01-01 02:03:09 +00:00
Andrew Turner
e51f2e72db Attach vtnet to virtio_mmio. Qemu provides this as an option with AArch64.
Sponsored by:	The FreeBSD Foundation
2014-12-29 17:17:01 +00:00
Hans Petter Selasky
c25290420e Start the process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but no longer has any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
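The producer/consumer split described in c25290420e can be sketched as follows. The struct and the FAKE_HASHTYPE_* names and values are placeholders modeled on m_pkthdr.flowid, m_pkthdr.rsstype and the M_HASHTYPE_* macros, not the definitions from sys/mbuf.h; using the flowid to pick a transmit queue is an assumed example of how a driver might consume it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the packet header fields named in the commit. */
struct fake_pkthdr {
	uint32_t flowid;
	uint8_t  rsstype;
};

/* Placeholder values, not the ones defined in sys/mbuf.h. */
enum { FAKE_HASHTYPE_NONE = 0, FAKE_HASHTYPE_OPAQUE = 1 };

#define FAKE_HASHTYPE_GET(h)	((h)->rsstype)
#define FAKE_HASHTYPE_SET(h, t)	((h)->rsstype = (t))

/* Transmit side (e.g. TCP): a plain assignment, no "if" needed. */
static void
fake_tcp_output(struct fake_pkthdr *h, uint32_t flowid)
{
	h->flowid = flowid;
	FAKE_HASHTYPE_SET(h, FAKE_HASHTYPE_OPAQUE);
}

/* Driver side: trust flowid only when a hash type other than NONE is set. */
static unsigned
fake_select_txq(const struct fake_pkthdr *h, unsigned nqueues, unsigned curcpu)
{
	assert(nqueues > 0);
	if (FAKE_HASHTYPE_GET(h) != FAKE_HASHTYPE_NONE)
		return (h->flowid % nqueues);
	return (curcpu % nqueues);   /* fallback when no valid flowid exists */
}
```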
Bryan Venteicher
9a4dabdc5a Enable LRO by default when available on vtnet interfaces
The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic are not the
typical use case for VirtIO, and due to other issues, they often
require checksum offloading to be disabled anyway.

PR:		185864
MFC after:	2 weeks
2014-11-09 20:04:12 +00:00
Gleb Smirnoff
84047b19df - Provide if_get_counter() method for vtnet(4).
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
  and in vtnet_get_counter().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 19:15:40 +00:00
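84047b19df moves statistics aggregation from a periodic tick to the point where a counter is requested. A hypothetical sketch of that on-demand summation; struct fake_softc and fake_get_counter() are illustrative and do not reproduce the vtnet or if_get_counter() definitions.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_NQUEUES 4

/* Hypothetical per-queue statistics, updated only by each queue. */
struct fake_txq_stats {
	uint64_t opackets;
	uint64_t obytes;
};

struct fake_softc {
	struct fake_txq_stats txq[FAKE_NQUEUES];
};

enum fake_counter { FAKE_OPACKETS, FAKE_OBYTES };

/* Sum per-queue counters only when asked, instead of on every tick. */
static uint64_t
fake_get_counter(const struct fake_softc *sc, enum fake_counter cnt)
{
	uint64_t total = 0;

	for (int i = 0; i < FAKE_NQUEUES; i++) {
		switch (cnt) {
		case FAKE_OPACKETS:
			total += sc->txq[i].opackets;
			break;
		case FAKE_OBYTES:
			total += sc->txq[i].obytes;
			break;
		}
	}
	return (total);
}
```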
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial.

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Bryan Venteicher
32487a8973 Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before a subsequent transmit, or in the watchdog timer.

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after:	1 month
2014-07-10 05:36:04 +00:00
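The policy in 32487a8973 keeps the completion interrupt off while the ring has room and reclaims finished frames at the start of each transmit. A sketch under stated assumptions: struct fake_txq is hypothetical, every frame uses one descriptor, and the quarter-ring threshold is arbitrary rather than the driver's value. The watchdog path mentioned in the commit would simply call fake_txq_reclaim() as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical Tx virtqueue state; field names are illustrative only. */
struct fake_txq {
	int  nfree;        /* free descriptors */
	int  size;         /* total descriptors */
	int  inflight;     /* handed to the host, not yet reclaimed */
	int  completed;    /* finished by the host, awaiting reclaim */
	bool intr_enabled;
};

#define FAKE_TX_INTR_THRESH(q)	((q)->size / 4)	/* assumed threshold */

/* Reclaim frames the host has finished with (the "txeof" step). */
static void
fake_txq_reclaim(struct fake_txq *q)
{
	assert(q->completed <= q->inflight);
	q->nfree += q->completed;
	q->inflight -= q->completed;
	q->completed = 0;
}

/* Transmit one frame; returns false if the ring is full. */
static bool
fake_txq_send(struct fake_txq *q)
{
	fake_txq_reclaim(q);           /* clear completed frames first */
	if (q->nfree == 0)
		return (false);
	q->nfree--;
	q->inflight++;
	/* Only ask for a completion interrupt once free space runs low. */
	q->intr_enabled = q->nfree < FAKE_TX_INTR_THRESH(q);
	return (true);
}
```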
Bryan Venteicher
bae486f5d7 Force two byte alignment for all control message headers
The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. For consistency, the alignment is
specified everywhere, even where the structure already has the appropriate
natural alignment because it contains a uint16_t.

Obtained from:	DragonFlyBSD
MFC after:	1 week
2014-06-16 04:32:27 +00:00
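Why the alignment in bae486f5d7 matters: a 2-byte structure made only of uint8_t fields has a natural alignment of 1, so it could start on the last byte of a page and then need two scatter/gather entries. Forcing 2-byte alignment keeps it inside one page. The sketch assumes a 4096-byte page; struct fake_ctrl_hdr and both helpers are hypothetical, and C11 alignas stands in for whatever alignment attribute the driver actually uses.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Two 1-byte fields, like the VirtIO control header (class, command). */
struct fake_ctrl_hdr {
	uint8_t class;
	uint8_t cmd;
};

#define FAKE_PAGE_SIZE 4096u	/* assumed page size */

/* True if [p, p + len) stays within one page, i.e. one physical segment. */
static int
fits_in_one_page(const void *p, size_t len)
{
	uintptr_t off = (uintptr_t)p % FAKE_PAGE_SIZE;

	return (off + len <= FAKE_PAGE_SIZE);
}

/* Hypothetical command builder: an even address can never straddle a page. */
static int
fake_ctrl_cmd_header_ok(uint8_t class, uint8_t cmd)
{
	alignas(2) struct fake_ctrl_hdr hdr = { class, cmd };

	return (fits_in_one_page(&hdr, sizeof(hdr)));
}
```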
Bryan Venteicher
fd5b395117 Make the feature negotiation code easier to follow
MFC after:	1 week
2014-06-16 04:29:28 +00:00
Bryan Venteicher
add526c613 - Remove two write-only local variables
- Remove unused element in the vtnet_rxq structure

MFC after:	1 week
2014-06-16 04:12:33 +00:00
Luigi Rizzo
c26e5fc2ed make sure ifp->if_transmit returns 0 if a buffer is enqueued.
A similar fix should be applied to vmxnet, ixgbe, igb, i40e.
(some of them previously reported by Michael Tuexen)

Drivers using if_transmit are correct, and so are most of the
other drivers that reassign if_transmit.

Among other things, this bug causes panics when using netmap emulation
on top of generic drivers.

Approved by:	bryanv
MFC after:	3 days
2014-06-04 16:57:05 +00:00
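A sketch of the return-value contract from c26e5fc2ed: once a frame has been queued, the transmit routine reports 0 even if it could not hand the frame to the hardware right away, and a nonzero return is reserved for frames the driver did not keep. struct fake_ifq and both functions are hypothetical; the real drivers use ifnet and buf_ring machinery not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical software queue in front of the hardware ring. */
struct fake_ifq {
	int  len;
	int  maxlen;
	bool hw_busy;   /* hardware ring full: start routine cannot drain */
};

/* Try to move queued frames to hardware; a failure here is not the caller's. */
static int
fake_start_locked(struct fake_ifq *q)
{
	if (q->hw_busy)
		return (ENOBUFS);
	q->len = 0;
	return (0);
}

/*
 * if_transmit-style entry point: once the frame is accepted into the
 * queue, the driver owns it, so report success even if the immediate
 * start attempt could not make progress.
 */
static int
fake_transmit(struct fake_ifq *q)
{
	if (q->len >= q->maxlen)
		return (ENOBUFS);      /* not enqueued: still the caller's frame */
	q->len++;
	(void)fake_start_locked(q);
	return (0);
}
```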
Gleb Smirnoff
b245f96c44 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed, machine-independent size. The
  notion of data (packet counters, etc.) is by no means MD, and it is a
  bug that on amd64 we've got 64-bit counters while on i386 they are 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bits to the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
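The overflow claim in b245f96c44 can be checked with a little arithmetic: a byte counter of b bits wraps after 2^b divided by the rate in bytes per second. The helper below is an illustrative calculation, not FreeBSD code. At 10 Gbit/s a 32-bit byte counter lasts about 3.4 s and at 40 Gbit/s under a second, while a 64-bit counter lasts over a century.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a byte counter of the given width wraps at a line rate. */
static double
seconds_to_wrap(unsigned counter_bits, double gbit_per_s)
{
	double bytes_per_s = gbit_per_s * 1e9 / 8.0;
	double capacity = 1.0;

	/* 2^bits, exact in a double for 32 and 64 */
	for (unsigned i = 0; i < counter_bits; i++)
		capacity *= 2.0;
	return (capacity / bytes_per_s);
}
```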
Bryan Venteicher
54fb8142b6 Use m_defrag() instead of m_collapse() to compact a long mbuf chain
This should be an infrequent occurrence, so remove the per-queue
counters in favor of just global counters in the softc.
2014-02-02 05:20:46 +00:00
Bryan Venteicher
443c3d0bd1 Do not place the sglist used for Rx/Tx on the stack
The sglist segment array has grown to a bit over 512 bytes (on
64-bit systems), which is more than should ideally be put on the
stack. Instead allocate an appropriately sized sglist and hang
it off each Rx/Tx queue structure.

Bump the maximum number of Tx segments to 64 to make it unlikely
we'll have to defragment an mbuf chain. Our previous count was
rounded up to this value since it is the next power of two, so
effective memory usage should not change.

Also only allocate the maximum number of Tx segments if TSO was
negotiated.
2014-02-02 05:15:36 +00:00
Bryan Venteicher
9ef6342f9e Check for a full virtqueue in the multiqueue transmit path
With most hosts, we'll negotiate indirect descriptors, so all we
need is one available descriptor to transmit a frame.
2014-01-25 19:58:53 +00:00
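The reasoning in 9ef6342f9e, sketched with hypothetical helpers: with indirect descriptors the frame's segment list lives in a separate table, so a frame occupies a single ring slot regardless of its segment count; without them it needs one slot per segment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: ring descriptors consumed by a frame of nsegs segments. */
static int
fake_descs_needed(int nsegs, bool indirect)
{
	assert(nsegs > 0);
	return (indirect ? 1 : nsegs);
}

/* The "is the virtqueue full?" check made before dequeuing a frame. */
static bool
fake_vq_can_send(int nfree, int nsegs, bool indirect)
{
	return (nfree >= fake_descs_needed(nsegs, indirect));
}
```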
Bryan Venteicher
dd6f83a00f Avoid queue unlock followed by relock when the enable interrupt race is lost
This already happens infrequently, and the hold time is still bounded since
we defer to a taskqueue after a few tries.
2014-01-25 19:57:30 +00:00
Bryan Venteicher
bddddcd566 Move duplicated transmit start code into a single function
2014-01-25 19:55:42 +00:00
Bryan Venteicher
5591e479fe Remove stray space
2014-01-25 18:34:57 +00:00
Bryan Venteicher
9471658415 Also include the mbuf's csum_flags in an assert message
2014-01-25 07:35:09 +00:00
Bryan Venteicher
1dbb21dcc9 Read and write the MAC address in the config space byte by byte
2014-01-25 07:13:47 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
76039bc84f r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, include
all headers that are currently pulled in only through implicit pollution via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Bryan Venteicher
d797300b75 Do not hold the vtnet Rx queue lock when calling up into the stack
This matches other similar drivers and avoids various LOR warnings.

Approved by:	re (marius)
2013-10-05 18:07:24 +00:00
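The locking pattern in d797300b75, sketched with POSIX threads: the Rx queue lock is released around the upcall into the stack and reacquired afterwards. struct fake_rxq, fake_rxq_deliver() and fake_stack_input() are hypothetical; fake_stack_input() only checks, via pthread_mutex_trylock(), that the queue lock is free while it runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical Rx queue with its own lock. */
struct fake_rxq {
	pthread_mutex_t mtx;
	int             delivered;
};

typedef void (*fake_input_fn)(struct fake_rxq *rxq, int frame);

/*
 * Deliver one frame to the stack. Caller holds rxq->mtx. The lock is
 * dropped around the upcall so the stack can take its own locks without
 * a lock-order reversal against the Rx queue lock.
 */
static void
fake_rxq_deliver(struct fake_rxq *rxq, int frame, fake_input_fn input)
{
	rxq->delivered++;
	pthread_mutex_unlock(&rxq->mtx);
	input(rxq, frame);
	pthread_mutex_lock(&rxq->mtx);
}

/* Example upcall: records whether the Rx lock was free while it ran. */
static int fake_lock_was_free;

static void
fake_stack_input(struct fake_rxq *rxq, int frame)
{
	(void)frame;
	if (pthread_mutex_trylock(&rxq->mtx) == 0) {
		fake_lock_was_free = 1;
		pthread_mutex_unlock(&rxq->mtx);
	}
}
```

Because the lock is dropped, any queue state cached before the upcall has to be re-read afterwards.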
Bryan Venteicher
6e03f31982 Complete any pending Tx frames before attempting the next transmit
Also complete pending frames in the watchdog function when the
EVENT_IDX feature was negotiated just in case the completion
interrupt was postponed.
2013-09-03 02:28:31 +00:00
Eitan Adler
72d9611d24 Fix build with gcc
Reported by:	Michael Butler <imb@protected-networks.net>
Reviewed by:	jilles
2013-09-01 20:22:52 +00:00
Bryan Venteicher
8f3600b108 Import multiqueue VirtIO net driver from my user/bryanv/vtnetmq branch
This is a significant rewrite of much of the previous driver; lots of
misc. cleanup was also performed, and support for a few other minor
features was also added.
2013-09-01 04:33:47 +00:00
Bryan Venteicher
cfc28a5bf7 Sync VirtIO net device header file from recent Linux
2013-09-01 04:23:54 +00:00
Bryan Venteicher
abd6790ce8 Merge virtio changes from projects/virtio
Contains projects/virtio commits:

r245738:
    virtio: Minor man page tweaks
r246060:
    virtio: Cleanup feature description printing
r246306:
    virtio: Remove old debugging flag
r247238:
    virtio: Remove PRIx64 macros from format strings
r247239:
    virtio: Constify some fields
r247240:
    virtio: Minor code simplifications
r249962:
    virtio: Update to my freebsd.org email address

MFC after:	1 month
2013-07-04 17:57:26 +00:00
Bryan Venteicher
3dd8d840ed Merge vtnet changes from projects/virtio
Minor changes to the network driver. A multiqueue driver that is
a significant rewrite will be merged shortly.

Contains projects/virtio commits:

r246058:
    vtnet: Move an mbuf ASSERT to the calling function
r246059:
    vtnet: Tweak ASSERT message

MFC after:	1 month
2013-07-04 17:55:58 +00:00
Bryan Venteicher
6632efe40d Convert VirtIO to use ithreads instead of taskqueues
Contains projects/virtio commits:

r245709:
    Each VirtIO device was scheduling its own taskqueue(9) to do the
    off-level interrupt handling. ithreads(9) is the more natural way
    to do this. The primary motivation for this work is to better support
    network multiqueue.
r245710:
    virtio: Change virtqueue intr handlers to return void
r245711:
    virtio_blk: Remove interrupt taskqueue
r245721:
    vtnet: Remove interrupt taskqueue
r245722:
    virtio_scsi: Remove interrupt taskqueue
r245747:
    vtnet: Remove taskqueue fields missed in r245721

MFC after:	1 month
2013-07-04 17:50:11 +00:00
Bryan Venteicher
b059b01e74 Merge r250802 from bryanv/vtnetmq - Fix setting of the Rx filters
QEMU 1.4 made the descriptor requirement stricter: the size of the buffer
descriptor must exactly match the number of MAC addresses provided.

PR:		kern/178955
MFC after:	5 days
2013-06-15 03:55:04 +00:00
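For b059b01e74: as laid out in the VirtIO specification for VIRTIO_NET_CTRL_MAC_TABLE_SET, each MAC table is a 32-bit entry count followed by exactly that many 6-byte addresses, so the descriptor length follows directly from the count rather than from, say, the capacity of a fixed-size table. A hypothetical helper (the function and macro names are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_ETHER_ADDR_LEN 6

/* Exact byte length of one MAC table: count field plus the addresses. */
static size_t
fake_mac_table_len(uint32_t entries)
{
	return (sizeof(uint32_t) + (size_t)entries * FAKE_ETHER_ADDR_LEN);
}
```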
Bryan Venteicher
ac4b6bcd17 virtio: Start taskqueue threads only after attach cannot fail
If virtio_setup_intr() failed during boot, we would hang in
taskqueue_free() -> taskqueue_terminate() waiting for all the taskq
threads to terminate. This will never happen since the
scheduler is not running by this point.

Reported by:	neel, grehan
Approved by:	grehan (mentor)
2012-12-14 05:27:56 +00:00
Gleb Smirnoff
c6499eccad Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
Peter Grehan
310dacd09b Various VirtIO improvements
PCI:
  - Properly handle interrupt fallback from MSIX to MSI to legacy.
    The host may not have sufficient resources to support MSIX,
    so we must be able to fall back to legacy interrupts.
  - Add interface to get the (sub) vendor and device IDs.
  - Rename flags to VTPCI_FLAG_* like other VirtIO drivers.
Block:
  - No longer allocate vtblk_requests from separate UMA zone.
    malloc(9) from M_DEVBUF is sufficient. Assert segment counts
    at allocation.
  - More verbose error and debug messages.
Network:
  - Remove stray write-once variable.
Virtqueue:
  - Shuffle code around in preparation for converting the mb()s to
    the appropriate atomic(9) operations.
  - Only walk the descriptor chain when freeing if INVARIANTS is
    defined since the result is only KASSERT()ed.

Submitted by:	Bryan Venteicher (bryanv@daemoninthecloset.org)
2012-07-11 02:57:19 +00:00
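The PCI change in 310dacd09b amounts to an ordered fallback: try MSI-X, then MSI, then legacy INTx, and keep the first that succeeds. A hypothetical sketch with an injected allocator rather than the real pci(9) calls; fake_host_alloc() simulates a host that cannot provide anything better than a given interrupt type.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered from most to least preferred. */
enum fake_intr_type {
	FAKE_INTR_MSIX, FAKE_INTR_MSI, FAKE_INTR_LEGACY, FAKE_INTR_NONE
};

/* Hypothetical per-type allocator: returns 0 on success. */
typedef int (*fake_alloc_fn)(enum fake_intr_type type, void *arg);

/* Try MSI-X, then MSI, then legacy INTx; return the first that works. */
static enum fake_intr_type
fake_setup_intr(fake_alloc_fn alloc, void *arg)
{
	static const enum fake_intr_type order[] = {
		FAKE_INTR_MSIX, FAKE_INTR_MSI, FAKE_INTR_LEGACY,
	};

	for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (alloc(order[i], arg) == 0)
			return (order[i]);
	}
	return (FAKE_INTR_NONE);
}

/* Simulated host: fails every type better than *arg allows. */
static int
fake_host_alloc(enum fake_intr_type type, void *arg)
{
	const enum fake_intr_type *best_supported = arg;

	return (type < *best_supported ? -1 : 0);
}
```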
David E. O'Brien
7dcc1b85dd Do not include <sys/types.h> in the local headers. The .c files including
them have already included <sys/param.h> before these headers are included.
2012-07-03 15:15:41 +00:00
Peter Grehan
b8a587074f Catch up with Bryan Venteicher's virtio git repo:
a8af6270bd96be6ccd86f70b60fa6512b710e4f0
      virtio_blk: Include function name in panic string

cbdb03a694b76c5253d7ae3a59b9995b9afbb67a
      virtio_balloon: Do the notify outside of the lock

      By the time we return from virtqueue_notify(), the descriptor
      will be in the used ring so we shouldn't have to sleep.

10ba392e60692529a5cbc1e9987e4064e0128447
      virtio: Use DEVMETHOD_END

80cbcc4d6552cac758be67f0c99c36f23ce62110
      virtqueue: Add support for VIRTIO_F_RING_EVENT_IDX

      This can be used to reduce the number of guest/host and
      host/guest interrupts by delaying the interrupt until a
      certain index value is reached.

      Actual use by the network driver will come along later.

8fc465969acc0c58477153e4c3530390db436c02
      virtqueue: Simplify virtqueue_nused()

      Since the values just wrap naturally at UINT16_MAX, we
      can just subtract the two values directly, rather than
      doing 2's complement math.

a8aa22f25959e2767d006cd621b69050e7ffb0ae
      virtio_blk: Remove debugging crud from 75dd732a

      There seems to be an issue with Qemu (or FreeBSD VirtIO) that sets
      the PCI register space for the device config to bogus values. This
      only seems to happen after unloading and reloading the module.

d404800661cb2a9769c033f8a50b2133934501aa
      virtio_blk: Use better variable name

75dd732a97743d96e7c63f7ced3c2169696dadd3
      virtio_blk: Partially revert 92ba40e65

      Just use the virtqueue to determine if any requests are
      still inflight.

06661ed66b7a9efaea240f99f414c368f1bbcdc7
      virtio_blk: error if allowed too few segments

      Should never happen unless the host provides us with a
      bogus seg_max value.

4b33e5085bc87a818433d7e664a0a2c8f56a1a89
      virtio_blk: Sort function declarations

426b9f5cac892c9c64cc7631966461514f7e08c6
      virtio_blk: Cleanup whitespace

617c23e12c61e3c2233d942db713c6b8ff0bd112
      virtio_blk: Call disk_err() on error'd completed requests

081a5712d4b2e0abf273be4d26affcf3870263a9
      virtio_blk: ASSERT the ready and inflight request queues are empty

a9be2631a4f770a84145c18ee03a3f103bed4ca8
      virtio_blk: Simplify check for too many segments

      At the cost of a small style violation.

e00ec09da014f2e60cc75542d0ab78898672d521
      virtio_blk: Add beginnings of suspend/resume

      Still not sure if we need to virtio_stop()/virtio_reinit()
      the device before/after a suspend.

      Don't start additional IO when marked as suspending.

47c71dc6ce8c238aa59ce8afd4bda5aa294bc884
      virtio_blk: Panic when dealt an unhandled BIO cmd

1055544f90fb8c0cc6a2395f5b6104039606aafe
      virtio_blk: Add VQ enqueue/dequeue wrappers

      Wrapper functions manage adding requests to and removing them
      from the in-flight list.

      Normally biodone() any completed IO when draining the virtqueue.

92ba40e65b3bb5e4acb9300ece711f1ea8f3f7f4
      virtio_blk: Add in-flight list of requests

74f6d260e075443544522c0833dc2712dd93f49b
      virtio_blk: Rename VTBLK_FLAG_DETACHING to VTBLK_FLAG_DETACH

7aa549050f6fc6551c09c6362ed6b2a0728956ef
      virtio_blk: Finish all BIOs through vtblk_finish_bio()

      Also properly set bio_resid in the case of errors. Most geom_disk
      providers seem to do the same.

9eef6d0e6f7e5dd362f71ba097f2e2e4c3744882
      Added function to translate VirtIO status to error code

ef06adc337f31e1129d6d5f26de6d8d1be27bcd2
      Reset dumping flag when given unexpected parameters

393b3e390c644193a2e392220dcc6a6c50b212d9
      Added missing VTBLK_LOCK() in dump handler

Obtained from:	Bryan Venteicher  bryanv at daemoninthecloset dot org
2012-04-14 05:48:04 +00:00
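Two index computations from b8a587074f, sketched in isolation. vq_nused() shows why 8fc465969a can subtract the free-running 16-bit indices directly; vq_need_event() is the VIRTIO_F_RING_EVENT_IDX check from 80cbcc4d6552 in the form given as vring_need_event() in the VirtIO specification. Both function names here are illustrative, not the FreeBSD ones.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Used-ring entries not yet consumed. Both indices are free-running
 * 16-bit counters, so subtraction truncated to 16 bits is correct even
 * after used_idx wraps past UINT16_MAX.
 */
static uint16_t
vq_nused(uint16_t used_idx, uint16_t last_used_idx)
{
	return ((uint16_t)(used_idx - last_used_idx));
}

/*
 * EVENT_IDX test: notify only if event_idx lies in [old_idx, new_idx),
 * computed modulo 2^16.
 */
static int
vq_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return ((uint16_t)(new_idx - event_idx - 1) <
	    (uint16_t)(new_idx - old_idx));
}
```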
Peter Grehan
336f459c31 Catch up with Bryan Venteicher's virtio Hg repo:
c162516
  Remove vtblk_sector_size

c162515
  Wrap long license lines

c162514
  Remove vtblk_unit

c162513
  Wrap long lines in the license.

c162512
  Remove verbose messages when link goes up/down.

  A similar message is printed elsewhere as a result of
  if_link_state_change().

c162511
  Explicitly compare pointer to NULL

c162510
  Allocate the mac filter table at attach time.

c162509
  Add real BSD licenses to the header files copied from Linux.

  This chases upstream changes made in Linux a while ago.

c162508
  Only notify if we actually dequeued something.

c162507
  Change a couple of if () { KASSERT(...) } to just KASSERTs.

  In non-debug kernels, the if () { } probably gets optimized
  away, but I guess this is clearer.

c162506
  Remove VIRTIO_BLK_F_TOPOLOGY fields in the config.

  TOPOLOGY has since been removed from the spec, and the FreeBSD
  driver didn't really do anything with the fields anyway.

c162505
  Move vtblk_enqueue_request() outside the locks when getting the ident.

c162504
  Remove soon-to-be-unneeded trylock during dump [1].
  http://lists.freebsd.org/pipermail/freebsd-current/2011-November/029226.html

c162503
  Remove empty line

c162502
  Drop frame if cannot allocate a vtnet_tx_header.

  If we don't, we set OACTIVE, but if there are no
  other frames in flight, vtnet_txeof() will never
  be called to unset OACTIVE. The interface would
  have to be down/up'ed in order to become usable.

  We could be cuter here and only do this if the
  virtqueue is empty, but it's probably not worth
  the complication.

c162501
  Start mbuf replacement loop at 1 for clarity

Obtained from:	Bryan Venteicher  bryanv at daemoninthecloset dot org
2011-12-06 06:28:32 +00:00
Peter Grehan
10b59a9b4a Import virtio base, PCI front-end, and net/block/balloon drivers.
Tested on Qemu/KVM, VirtualBox, and BHyVe.

Currently built as modules-only on i386/amd64. Man pages not yet hooked
up, pending review.

Submitted by:	Bryan Venteicher  bryanv at daemoninthecloset dot org
Reviewed by:	bz
MFC after:	4 weeks or so
2011-11-18 05:43:43 +00:00