71 Commits

Author SHA1 Message Date
Gleb Smirnoff
84047b19df - Provide if_get_counter() method for vtnet(4).
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
  and in vtnet_get_counter().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 19:15:40 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Luigi Rizzo
a5b6123ea9 print additional debugging info in virtqueue_dump()
(not fundamental, but useful to debug performance issues on vtnet)

MFC after:	3 days
2014-08-16 13:13:17 +00:00
Bryan Venteicher
32487a8973 Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after:	1 month
2014-07-10 05:36:04 +00:00
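
The policy described in 32487a8973 can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct txq_model, txq_complete() and txq_transmit() are hypothetical names, not the vtnet(4) structures or functions, and the threshold value is chosen arbitrarily.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a Tx queue; not the vtnet(4) data structures. */
struct txq_model {
	int	size;		/* total descriptors in the ring */
	int	nfree;		/* descriptors currently free */
	int	intr_thresh;	/* enable the interrupt below this many free */
	bool	intr_enabled;	/* state of the Tx completion interrupt */
};

/* Reclaim the descriptors of frames the host has finished with. */
static void
txq_complete(struct txq_model *q, int ndone)
{
	assert(ndone >= 0 && q->nfree + ndone <= q->size);
	q->nfree += ndone;
}

/*
 * Try to enqueue one frame needing 'ndesc' descriptors. 'ndone' is the
 * number of descriptors the host completed since the previous call; they
 * are reclaimed first, before the new frame is queued.
 */
static bool
txq_transmit(struct txq_model *q, int ndesc, int ndone)
{
	txq_complete(q, ndone);
	if (q->nfree < ndesc) {
		/* Queue is full: ask to be told when space frees up. */
		q->intr_enabled = true;
		return (false);
	}
	q->nfree -= ndesc;
	/* Common case: plenty of room, so keep the interrupt disabled. */
	q->intr_enabled = (q->nfree < q->intr_thresh);
	return (true);
}
```

In this model the interrupt stays off while the ring has headroom, and completions are picked up opportunistically on the next transmit. The watchdog path mentioned in the commit is not modelled.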
Bryan Venteicher
4b59668f0e Add accessor to get the number of free descriptors in the virtqueue
MFC after:	1 month
2014-07-10 05:26:01 +00:00
Roger Pau Monné
68e58ea7ed xen/virtio: fix balloon drivers to not mark pages as WIRED
Prevent the Xen and VirtIO balloon drivers from marking pages as
wired. This prevents them from increasing the system wired page count,
which can lead to mlock failing because of hitting the limit in
vm.max_wired.

In the Xen case make sure pages are zeroed before giving them back to
the hypervisor, or else we might be leaking data. Also remove the
balloon_{append/retrieve} and link pages directly into the
ballooned_pages queue using the plinks.q field in the page struct.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, bryanv
Approved by: gibbs

dev/virtio/balloon/virtio_balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.

dev/xen/balloon/balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.
 - Make sure pages are zeroed before giving them back to the
   hypervisor.
 - Remove the balloon_entry struct and the balloon_{append/retrieve}
   functions and use the page plinks.q entry to link the pages
   directly into the ballooned_pages queue.
2014-06-25 09:51:08 +00:00
Attilio Rao
3ae10f7477 - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue on which to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
Bryan Venteicher
bae486f5d7 Force two byte alignment for all control message headers
The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. Note for consistency, specify the
alignment everywhere, even if the structure has the appropriate natural
alignment since it contains a uint16_t.

Obtained from:	DragonFlyBSD
MFC after:	1 week
2014-06-16 04:32:27 +00:00
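
One way to read the alignment requirement in bae486f5d7: a two-byte object that is only byte-aligned could straddle a page boundary and then need two scatter/gather entries, while a two-byte-aligned one cannot. The sketch below illustrates that reading. struct ctrl_hdr and fits_one_segment() are hypothetical names, and the attribute is written in GCC/Clang syntax rather than with the kernel's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a control message header: two 1-byte fields. */
struct ctrl_hdr {
	uint8_t	hdr_class;
	uint8_t	hdr_cmd;
} __attribute__((aligned(2)));	/* GCC/Clang attribute syntax */

static_assert(sizeof(struct ctrl_hdr) == 2, "header must stay two bytes");
static_assert(_Alignof(struct ctrl_hdr) == 2, "header must be 2-byte aligned");

/* Nonzero if [addr, addr + len) lies within one page, i.e. one segment. */
static int
fits_one_segment(uintptr_t addr, size_t len, size_t pagesz)
{
	assert(len > 0 && pagesz > 0);
	return (addr / pagesz == (addr + len - 1) / pagesz);
}
```

With 4096-byte pages, a header at the even address 4094 fits in one segment, whereas the odd address 4095 (possible only without the alignment) would cross into the next page.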
Bryan Venteicher
fd5b395117 Make the feature negotiation code easier to follow
MFC after:	1 week
2014-06-16 04:29:28 +00:00
Bryan Venteicher
45543f0751 Move the VIRTIO_RING_F_* defines out of virtqueue.h into virtio_config.h
These defines are applicable to userland too, but virtqueue.h contains
the kernel virtqueue interface, and is therefore not usable in userland.

Note that Linux places these defines in virtio_ring.h, but I don't want
the drivers including this header file to keep the VirtIO ring opaque to
everything but the virtqueue.

MFC after:	1 week
2014-06-16 04:25:04 +00:00
Bryan Venteicher
e026de111e Remove kernel specific macro out of the VirtIO PCI header file
The eventual goal is to share this file with userland, so
remove the macro that is only specific for virtio_pci(4).
Instead, add the VIRTIO_PCI_CONFIG_OFF macro from Linux to
get the config size whether MSIX is enabled or not.

MFC after:	1 week
2014-06-16 04:16:31 +00:00
Bryan Venteicher
add526c613 - Remove two write-only local variables
- Remove unused element in the vtnet_rxq structure

MFC after:	1 week
2014-06-16 04:12:33 +00:00
Bryan Venteicher
49d5172b34 Always append new bios to the tail of the queue, instead of sorting them
MFC after:	1 week
2014-06-10 03:29:15 +00:00
Luigi Rizzo
c26e5fc2ed make sure ifp->if_transmit returns 0 if a buffer is enqueued.
A similar fix should be applied to vmxnet, ixgbe, igb, i40e.
(some of them previously reported by Michael Tuexen)

Drivers using if_transmit are correct, and so are most of the
other drivers that reassign if_transmit.

Among other things, this bug causes panics when using netmap emulation
on top of generic drivers.

Approved by:	bryanv
MFC after:	3 days
2014-06-04 16:57:05 +00:00
Bryan Venteicher
9a73216696 Split the virtio.h header file into multiple files
Reorganize the previous contents of the file as is done in Linux. The
eventual goal is to install the header files and share them between
the kernel and bhyve.

MFC after:	1 week
2014-06-01 18:16:01 +00:00
Bryan Venteicher
afd5e40ee0 Wait for the callout to finish before unloading the module
MFC after:	3 days
2014-04-24 05:04:54 +00:00
Gleb Smirnoff
b245f96c44 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data a fixed, machine-independent size. The
  data they hold (packet counters, etc.) is by no means MD, and it is a
  bug that on amd64 we have 64-bit counters while on i386 they are 32-bit
  and overflow within a second at modern speeds.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bits to the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
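
The motivation for b245f96c44 can be checked with simple arithmetic: 10 Gbit/s is 10,000,000,000 bits per second, which exceeds the largest 32-bit value (4,294,967,295). A minimal sketch follows. IF_GBPS(), baudrate_as_u32() and baudrate_fits_u32() are hypothetical helpers written for this illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "need a 64-bit integer type");

/* Hypothetical helper: line rate in bits per second for n Gbit/s. */
#define	IF_GBPS(n)	((uint64_t)(n) * 1000000000ULL)

/* What a 32-bit baud rate field would store for a given line rate. */
static uint32_t
baudrate_as_u32(uint64_t bps)
{
	return ((uint32_t)bps);	/* silently keeps only the low 32 bits */
}

/* Nonzero if the rate can be represented in 32 bits without loss. */
static int
baudrate_fits_u32(uint64_t bps)
{
	return (bps <= UINT32_MAX);
}
```

Here baudrate_as_u32(IF_GBPS(10)) yields 1410065408, roughly 1.4 Gbit/s, since 10^10 - 2 * 2^32 = 1410065408. A 64-bit field stores the rate exactly.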
Bryan Venteicher
54fb8142b6 Use m_defrag() instead of m_collapse() to compact a long mbuf chain
This should be an infrequent occurrence, so remove the per-queue
counters in favor of just global counters in the softc.
2014-02-02 05:20:46 +00:00
Bryan Venteicher
443c3d0bd1 Do not place the sglist used for Rx/Tx on the stack
The sglist segment array has grown to a bit over 512 bytes (on
64-bit system) which is more than ideally should be put on the
stack. Instead allocate an appropriately sized sglist and hang
it off each Rx/Tx queue structure.

Bump the maximum number of Tx segments to 64 to make it unlikely
we'll have to defragment an mbuf chain. Our previous count was
rounded up to this value since it is the next power of two, so
effective memory usage should not change.

Also only allocate the maximum number of Tx segments if TSO was
negotiated.
2014-02-02 05:15:36 +00:00
Bryan Venteicher
9ef6342f9e Check for a full virtqueue in the multiqueue transmit path
With most hosts, we'll negotiate indirect descriptors, so all we
need is one available descriptor to transmit a frame.
2014-01-25 19:58:53 +00:00
Bryan Venteicher
dd6f83a00f Avoid queue unlock followed by relock when the enable interrupt race is lost
This already happens infrequently, and the hold time is still bounded since
we defer to a taskqueue after a few tries.
2014-01-25 19:57:30 +00:00
Bryan Venteicher
bddddcd566 Move duplicated transmit start code into a single function
2014-01-25 19:55:42 +00:00
Bryan Venteicher
5591e479fe Remove stray space
2014-01-25 18:34:57 +00:00
Bryan Venteicher
9471658415 Also include the mbuf's csum_flags in an assert message
2014-01-25 07:35:09 +00:00
Bryan Venteicher
1dbb21dcc9 Read and write the MAC address in the config space byte by byte
2014-01-25 07:13:47 +00:00
Bryan Venteicher
8c457c885e Read each field of the configuration individually
In the forthcoming VirtIO spec, the device configuration is
always in little endian instead of guest endian. This is a
noop change for now.
2014-01-25 07:01:51 +00:00
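
Reading fields one at a time, as 8c457c885e does, makes it possible to apply a fixed byte order to each field later. The sketch below shows what decoding little-endian fields from a raw configuration buffer could look like. cfg_read_le16() and cfg_read_le32() are hypothetical helpers written for this illustration and are not part of the VirtIO drivers, which per the commit still read guest-endian values for now.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 16-bit little-endian field at byte offset 'off'. */
static uint16_t
cfg_read_le16(const uint8_t *cfg, size_t len, size_t off)
{
	assert(off + 2 <= len);
	return ((uint16_t)(cfg[off] | (cfg[off + 1] << 8)));
}

/* Decode a 32-bit little-endian field at byte offset 'off'. */
static uint32_t
cfg_read_le32(const uint8_t *cfg, size_t len, size_t off)
{
	assert(off + 4 <= len);
	return ((uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	    ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24));
}
```

Because the value is assembled byte by byte, the result is the same on little-endian and big-endian guests.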
Bryan Venteicher
31ac03991b Remove spaces before tabs in the function prototype list
2014-01-25 06:54:04 +00:00
Bryan Venteicher
10c4018057 Add very simple virtio_random(4) driver to harvest entropy from host
Reviewed by:	markm (random bits only)
2014-01-18 06:14:38 +00:00
Bryan Venteicher
22525db507 Add unmapped IO support to virtio_scsi(4)
2014-01-13 04:46:48 +00:00
Bryan Venteicher
ee11ec3437 Add unmapped IO support to virtio_blk(4)
2014-01-13 04:43:01 +00:00
Bryan Venteicher
bf51187b26 Remove incorrect bit shift when assigning the LUN request field
This caused duplicate targets appearing on Google Compute Engine
instances.

PR:		kern/185626
Submitted by:	Venkatesh Srinivas <venkateshs@google.com>
MFC after:	3 days
2014-01-12 17:40:47 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, include
all headers that are currently pulled in only through implicit pollution via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Bryan Venteicher
d797300b75 Do not hold the vtnet Rx queue lock when calling up into the stack
This matches other similar drivers and avoids various LOR warnings.

Approved by:	re (marius)
2013-10-05 18:07:24 +00:00
Bryan Venteicher
6e03f31982 Complete any pending Tx frames before attempting the next transmit
Also complete pending frames in the watchdog function when the
EVENT_IDX feature was negotiated just in case the completion
interrupt was postponed.
2013-09-03 02:28:31 +00:00
Bryan Venteicher
4142b1cbe5 Fix unintended compiler constant folding
Pointed out by:	dim@
2013-09-03 02:26:57 +00:00
Eitan Adler
72d9611d24 Fix build with gcc
Reported by:	Michael Butler <imb@protected-networks.net>
Reviewed by:	jilles
2013-09-01 20:22:52 +00:00
Bryan Venteicher
8f3600b108 Import multiqueue VirtIO net driver from my user/bryanv/vtnetmq branch
This is a significant rewrite of much of the previous driver; lots of
misc. cleanup was also performed, and support for a few other minor
features was also added.
2013-09-01 04:33:47 +00:00
Bryan Venteicher
cfc28a5bf7 Sync VirtIO net device header file from recent Linux
2013-09-01 04:23:54 +00:00
Bryan Venteicher
49a4385d69 Add optional VirtIO device method for post-attach notifications
This is called after the parent device (i.e. virtio_pci) has
completed the device attachment/initialization.
2013-09-01 04:20:23 +00:00
Bryan Venteicher
b619f40aec Add support for postponing VirtIO virtqueue interrupts
Partial support for the EVENT_IDX feature was added a while ago,
but this commit adds an interface for the device driver to hint
how long (in terms of descriptors) the next interrupt should be
delayed.

The first user of this will be used to reduce VirtIO net's Tx
completion interrupts.
2013-09-01 04:16:43 +00:00
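
The EVENT_IDX mechanism behind b619f40aec rests on a wrap-safe index comparison, which the VirtIO specification gives as vring_need_event(). The sketch below reproduces that comparison and adds postpone_event_idx(), a hypothetical helper (not the driver's interface) showing how a delay expressed in descriptors could map to a published event index.

```c
#include <assert.h>
#include <stdint.h>

static_assert((uint16_t)(0 - 1) == 65535, "ring indices wrap modulo 2^16");

/*
 * Nonzero if moving the index from old_idx to new_idx passed event_idx,
 * meaning the other side asked to be notified. All arithmetic is done
 * modulo 2^16 so that index wraparound is handled.
 */
static int
vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return ((uint16_t)(new_idx - event_idx - 1) <
	    (uint16_t)(new_idx - old_idx));
}

/*
 * Hypothetical helper: the event index to publish so that the next
 * interrupt is delayed until 'ndesc' more descriptors have been used.
 */
static uint16_t
postpone_event_idx(uint16_t used_idx, uint16_t ndesc)
{
	assert(ndesc > 0);
	return ((uint16_t)(used_idx + ndesc - 1));
}
```

For example, with the used index at 10 and a delay of 4 descriptors, the published event index is 13. No notification is due while the index advances through 11, 12 and 13, and one becomes due on the step from 13 to 14.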
Konstantin Belousov
c325e866f4 Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue.  Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked.  See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members.  Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-10 17:36:42 +00:00
Bryan Venteicher
4d5919ec0b Merge virtio_scsi change from projects/virtio
r252680:
    Fix SIM lock not owned panic

    The CAM locking requirements of registering an async
    callback has changed so the SIM lock must be held. Remove
    code that explicitly dropped the lock around the register.

    Also return CAM_SEL_TIMEOUT instead of CAM_TID_INVALID
    for bad targets to avoid a lot of console spam during bus
    scans.

MFC after:	1 month
2013-07-04 18:00:27 +00:00
Bryan Venteicher
62a69c4153 Merge virtio_pci changes from projects/virtio
This commit is primarily a significant cleanup to the interrupt
allocation code that had gotten a bit jumbled from having to
support per-vq MSIX, shared MSIX, MSI, and legacy style interrupts.

Contains projects/virtio commits:

r246064:
    virtio_pci: Rewrite allocation of interrupts
r246065:
    virtio_pci: Remove spaces before a tab
r246066:
    virtio_pci: Dynamically allocate the virtqueue array
r246304:
    virtio_pci: Clean up after failed virtqueue alloc attempt
r246305:
    virtio_pci: Move no interrupt check into the PCI interrupt handlers
r246308:
    virtio_pci: Remove unused variable

MFC after:	1 month
2013-07-04 17:59:09 +00:00
Bryan Venteicher
abd6790ce8 Merge virtio changes from projects/virtio
Contains projects/virtio commits:

r245738:
    virtio: Minor man page tweaks
r246060:
    virtio: Cleanup feature description printing
r246306:
    virtio: Remove old debugging flag
r247238:
    virtio: Remove PRIx64 macros from format strings
r247239:
    virtio: Constify some fields
r247240:
    virtio: Minor code simplifications
r249962:
    virtio: Update to my freebsd.org email address

MFC after:	1 month
2013-07-04 17:57:26 +00:00
Bryan Venteicher
3dd8d840ed Merge vtnet changes from projects/virtio
Minor changes to the network driver. A multiqueue driver that is
a significant rewrite will be merged shortly.

Contains projects/virtio commits:

r246058:
    vtnet: Move an mbuf ASSERT to the calling function
r246059:
    vtnet: Tweak ASSERT message

MFC after:	1 month
2013-07-04 17:55:58 +00:00
Bryan Venteicher
6f7e608220 Merge virtio_balloon changes from projects/virtio
Contains projects/virtio commits:

r245717:
    virtio_balloon: Make the softc lock a regular mutex
r245718:
    virtio_balloon: Remove two unuseful ASSERTs
r245719:
    virtio_balloon: More verbose ASSERT messages
r245720:
    virtio_balloon: Simplify lowmem handling in vtballoon_inflate()
r252530:
    virtio_balloon: Use just a kthread instead of a dedicated kproc
r252568:
    virtio_balloon: Need to use kthread_exit() after r252530

MFC after:	1 month
2013-07-04 17:54:46 +00:00
Bryan Venteicher
118619ac60 Merge several virtio_blk changes from projects/virtio
The notable changes of this commit are support for disk resizing
and chases updates to the spec regarding write caching.

Contains projects/virtio commits:

r245713:
    virtio_blk: Replace __FUNCTION__ with __func__
r245714:
    virtio_blk: Use more consistent mutex name
r245715:
    virtio_blk: Print device name too if failed to reinit during dump
r245716:
    virtio_blk: Remove an unuseful ASSERT
r245723:
    virtio_blk: Record the vendor and device information
r245724:
    virtio_blk: Add resize support
r245726:
    virtio_blk: More verbose ASSERT messages
r245730:
    virtio_blk: Tweak resize announcement message
r246061:
    virtio_blk: Do not always read entire config
r246062:
    virtio_blk: Use topology to set the stripe size/offset
r246307:
    virtio_blk: Correct stripe offset calculation
r246063:
    virtio_blk: Add support for write cache enable feature
r246303:
    virtio_blk: Expand a comment
r252529:
    virtio_blk: Improve write cache handling
r252681:
    virtio_blk: Remove unneeded curly braces

MFC after:	1 month
2013-07-04 17:53:02 +00:00
Bryan Venteicher
6632efe40d Convert VirtIO to use ithreads instead of taskqueues
Contains projects/virtio commits:

r245709:
    Each VirtIO device was scheduling its own taskqueue(9) to do the
    off-level interrupt handling. ithreads(9) is the more natural way
    to do this. The primary motivation for this work is to better support
    network multiqueue.
r245710:
    virtio: Change virtqueue intr handlers to return void
r245711:
    virtio_blk: Remove interrupt taskqueue
r245721:
    vtnet: Remove interrupt taskqueue
r245722:
    virtio_scsi: Remove interrupt taskqueue
r245747:
    vtnet: Remove taskqueue fields missed in r245721

MFC after:	1 month
2013-07-04 17:50:11 +00:00