77 Commits

Author SHA1 Message Date
br
9313cb1f2f Add Virtio MMIO bus driver.
Sponsored by:	DARPA, AFRL
2014-11-18 14:11:14 +00:00
bryanv
d20788b3ba Enable LRO by default when available on vtnet interfaces
The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic are not the
typical use case for VirtIO, and due to other issues, this often
requires checksum offloading to be disabled anyways.

PR:		185864
MFC after:	2 weeks
2014-11-09 20:04:12 +00:00
bryanv
74370c77ee Several minor changes to hopefully complete the VirtIO console driver
  - Support the KDB alt break sequence to enter the debugger,
    panic, reboot, etc. [1]
  - Provide emergency write feature description. Note that QEMU
    does not implement this feature.
  - Make the VTCON_FLAG_* defines sequential once again.
  - When the multiple port feature is not negotiated, query the
    rows and columns of the one console during the device attach
    when the size feature is negotiated.
  - Report failure to the device if hot plugging a port fails.
  - Acknowledge the console port event with an open event. This
    is required by the spec, but QEMU doesn't seem to care.

Submitted by:	Juniper [1]
MFC after:	1 month
2014-11-07 03:36:28 +00:00
bryanv
8b54a0b084 Create the tty device after the port is completely initialized
This fixes a race with a tty open before the host is ready.

MFC after:	1 month
2014-11-03 22:17:25 +00:00
bryanv
361372e423 Add support for the multiport feature and fix hot plug races
MFC after:	1 month
2014-11-03 16:57:01 +00:00
bryanv
8f4c0531c0 Add VirtIO console driver
Support for the multiport feature is mostly implemented, but currently
disabled due to some potential races in the hot plug code paths.

Requested by:	marcel
MFC after:	1 month
Relnotes:	yes
2014-10-23 04:47:32 +00:00
glebius
43eeb0cc88 - Provide if_get_counter() method for vtnet(4).
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
  and in vtnet_get_counter().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 19:15:40 +00:00
glebius
9b93b159b3 Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
luigi
3ab69a246b Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
luigi
bfa4a863c6 print additional debugging info in virtqueue_dump()
(not fundamental, but useful to debug performance issues on vtnet)

MFC after:	3 days
2014-08-16 13:13:17 +00:00
bryanv
ee6f08db44 Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after:	1 month
2014-07-10 05:36:04 +00:00
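The interrupt policy described in ee6f08db44 can be sketched roughly as follows. This is a simplified model and not the vtnet(4) code: the `txq_model` structure, its fields, and the function names are hypothetical, and the threshold is whatever the caller picks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a Tx queue; not the vtnet(4) structures. */
struct txq_model {
	uint16_t nfree;        /* free descriptors left in the virtqueue */
	uint16_t intr_thresh;  /* enable the interrupt below this many free */
	bool     intr_enabled; /* current state of the completion interrupt */
};

/* Keep the interrupt off unless free descriptors fall below the threshold. */
static bool
txq_update_intr(struct txq_model *txq)
{
	assert(txq != NULL);
	txq->intr_enabled = (txq->nfree < txq->intr_thresh);
	return (txq->intr_enabled);
}

/* Consume descriptors for one frame; fails if not enough are free. */
static bool
txq_enqueue(struct txq_model *txq, uint16_t ndesc)
{
	if (ndesc > txq->nfree)
		return (false);
	txq->nfree -= ndesc;
	txq_update_intr(txq);
	return (true);
}

/* Reclaim completed descriptors, as done before a subsequent transmit. */
static void
txq_complete(struct txq_model *txq, uint16_t ndesc)
{
	txq->nfree += ndesc;
	txq_update_intr(txq);
}
```

In this model the interrupt stays disabled while the queue has headroom, and completions are reclaimed by calling `txq_complete()` from the transmit path (or a timer) rather than from an interrupt handler.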
bryanv
eda154e3c4 Add accessor to get the number of free descriptors in the virtqueue
MFC after:	1 month
2014-07-10 05:26:01 +00:00
royger
67e7468d2b xen/virtio: fix balloon drivers to not mark pages as WIRED
Prevent the Xen and VirtIO balloon drivers from marking pages as
wired. This prevents them from increasing the system wired page count,
which can lead to mlock failing because of hitting the limit in
vm.max_wired.

In the Xen case make sure pages are zeroed before giving them back to
the hypervisor, or else we might be leaking data. Also remove the
balloon_{append/retrieve} and link pages directly into the
ballooned_pages queue using the plinks.q field in the page struct.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, bryanv
Approved by: gibbs

dev/virtio/balloon/virtio_balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.

dev/xen/balloon/balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.
 - Make sure pages are zeroed before giving them back to the
   hypervisor.
 - Remove the balloon_entry struct and the balloon_{append/retrieve}
   functions and use the page plinks.q entry to link the pages
   directly into the ballooned_pages queue.
2014-06-25 09:51:08 +00:00
attilio
2802c525ad - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter is today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
bryanv
e2a9bb78eb Force two byte alignment for all control message headers
The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. Note for consistency, specify the
alignment everywhere, even if the structure has the appropriate natural
alignment since it contains a uint16_t.

Obtained from:	DragonFlyBSD
MFC after:	1 week
2014-06-16 04:32:27 +00:00
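A small sketch of why the alignment in e2a9bb78eb matters, under the assumption that a scatter/gather entry cannot span a page boundary: a 2-byte header at an odd address could straddle two pages, while a 2-byte-aligned one cannot. The struct and helper names are hypothetical, and C11 `alignas` stands in for whatever alignment attribute the driver actually uses.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Two 1-byte fields forced to 2-byte alignment (hypothetical layout). */
struct ctrl_hdr_aligned {
	alignas(2) uint8_t class;
	uint8_t cmd;
};

/* True if the byte range [addr, addr + len) lies within a single page. */
static bool
fits_in_one_page(uintptr_t addr, size_t len, size_t page_size)
{
	assert(len > 0 && page_size > 0);
	return ((addr / page_size) == ((addr + len - 1) / page_size));
}
```

With 4096-byte pages, a 2-byte object at address 4095 crosses into the next page, whereas any even address keeps both bytes on the same page.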
bryanv
ae22af0ab4 Make the feature negotiation code easier to follow
MFC after:	1 week
2014-06-16 04:29:28 +00:00
bryanv
e5f8bdea96 Move the VIRTIO_RING_F_* defines out of virtqueue.h into virtio_config.h
These defines are applicable to userland too, but virtqueue.h contains
the kernel virtqueue interface, and is therefore not usable in userland.

Note that Linux places these defines in virtio_ring.h, but I don't want
the drivers including this header file to keep the VirtIO ring opaque to
everything but the virtqueue.

MFC after:	1 week
2014-06-16 04:25:04 +00:00
bryanv
a57033ead4 Remove kernel specific macro out of the VirtIO PCI header file
The eventual goal is to share this file with userland, so
remove the macro that is only specific for virtio_pci(4).
Instead, add the VIRTIO_PCI_CONFIG_OFF macro from Linux to
get the config size whether MSIX is enabled or not.

MFC after:	1 week
2014-06-16 04:16:31 +00:00
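For reference, the Linux macro named in a57033ead4 gives the offset at which the device-specific configuration begins in the legacy VirtIO PCI register layout; the two MSI-X vector registers add 4 bytes when MSI-X is enabled. The helper `devcfg_reg()` below is a hypothetical illustration and is not part of either header.

```c
#include <assert.h>
#include <stdint.h>

/* As defined in the Linux virtio_pci.h header for the legacy layout. */
#define VIRTIO_PCI_CONFIG_OFF(msix_enabled)	((msix_enabled) ? 24 : 20)

/* Hypothetical helper: register offset of a device-specific config field. */
static uint32_t
devcfg_reg(int msix_enabled, uint32_t field_off)
{
	return (VIRTIO_PCI_CONFIG_OFF(msix_enabled) + field_off);
}
```

A driver would add the field's offset within its own config structure to this base before reading or writing the register.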
bryanv
595f0d43b9 - Remove two write-only local variables
- Remove unused element in the vtnet_rxq structure

MFC after:	1 week
2014-06-16 04:12:33 +00:00
bryanv
03913cd45b Always append new bios to the tail of the queue, instead of sorting them
MFC after:	1 week
2014-06-10 03:29:15 +00:00
luigi
d6393048d3 make sure ifp->if_transmit returns 0 if a buffer is enqueued.
A similar fix should be applied to vmxnet, ixgbe, igb, i40e.
(some of them previously reported by Michael Tuexen)

Drivers using if_transmit are correct, and so are most of the
other drivers that reassign if_transmit.

Among other things, this bug causes panics when using netmap emulation
on top of generic drivers.

Approved by:	bryanv
MFC after:	3 days
2014-06-04 16:57:05 +00:00
bryanv
97f3f8250e Split the virtio.h header file into multiple files
Reorganize the previous contents of the file as it is in Linux. The
eventual goal is to install the header files and share them between
the kernel and bhyve.

MFC after:	1 week
2014-06-01 18:16:01 +00:00
bryanv
73ece4aff2 Wait for the callout to finish before unloading the module
MFC after:	3 days
2014-04-24 05:04:54 +00:00
glebius
b38edcd355 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data a fixed, machine-independent size. The
  data it holds (packet counters, etc.) is by no means MD, and it is a
  bug that on amd64 we've got 64-bit counters, while on i386 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bits for the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
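The arithmetic behind b38edcd355 is easy to check: 10 Gbit/s is 10,000,000,000 bits per second, which exceeds the largest 32-bit unsigned value, 4,294,967,295. The function below is a hypothetical illustration of the truncation and is not taken from the network stack.

```c
#include <assert.h>
#include <stdint.h>

/* 10 Gbit/s expressed in bits per second. */
#define BAUD_10G	UINT64_C(10000000000)

/* Hypothetical: what a 32-bit field would hold for a given baud rate. */
static uint32_t
baudrate_as_u32(uint64_t bps)
{
	/* Conversion to uint32_t keeps only the low 32 bits. */
	return ((uint32_t)bps);
}
```

Stored in 32 bits, the 10 Gbit/s rate would read back as 1,410,065,408 (10,000,000,000 modulo 2^32), which is why the field was widened to 64 bits.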
bryanv
0fb8b46977 Use m_defrag() instead of m_collapse() to compact a long mbuf chain
This should be an infrequent occurrence, so remove the per-queue
counters in favor of just global counters in the softc.
2014-02-02 05:20:46 +00:00
bryanv
e20207898f Do not place the sglist used for Rx/Tx on the stack
The sglist segment array has grown to a bit over 512 bytes (on
64-bit systems), which is more than should ideally be put on the
stack. Instead allocate an appropriately sized sglist and hang
it off each Rx/Tx queue structure.

Bump the maximum number of Tx segments to 64 to make it unlikely
we'll have to defragment an mbuf chain. Our previous count was
rounded up to this value since it is the next power of two, so
effective memory usage should not change.

Also only allocate the maximum number of Tx segments if TSO was
negotiated.
2014-02-02 05:15:36 +00:00
bryanv
4f94357c4c Check for a full virtqueue in the multiqueue transmit path
With most hosts, we'll negotiate indirect descriptors, so all we
need is one available descriptor to transmit a frame.
2014-01-25 19:58:53 +00:00
bryanv
2ff4469dc7 Avoid queue unlock followed by relock when the enable interrupt race is lost
This already happens infrequently, and the hold time is still bounded since
we defer to a taskqueue after a few tries.
2014-01-25 19:57:30 +00:00
bryanv
60948c05ae Move duplicated transmit start code into a single function
2014-01-25 19:55:42 +00:00
bryanv
25b1f83c53 Remove stray space
2014-01-25 18:34:57 +00:00
bryanv
d7761633d8 Also include the mbuf's csum_flags in an assert message
2014-01-25 07:35:09 +00:00
bryanv
96f4283ea8 Read and write the MAC address in the config space byte by byte
2014-01-25 07:13:47 +00:00
bryanv
9e10eebc1c Read each field of the configuration individually
In the forthcoming VirtIO spec, the device configuration is
always in little endian instead of guest endian. This is a
noop change for now.
2014-01-25 07:01:51 +00:00
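Reading each configuration field individually, as in 9e10eebc1c, makes it straightforward to decode fixed little-endian fields on any guest. A minimal sketch, assuming the raw config bytes are already in a buffer; the function names are hypothetical and are not the driver's accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 16-bit little-endian value from two bytes, on any host. */
static uint16_t
cfg_le16_decode(const uint8_t *p)
{
	return ((uint16_t)(p[0] | (p[1] << 8)));
}

/* Decode a 32-bit little-endian value from four bytes, on any host. */
static uint32_t
cfg_le32_decode(const uint8_t *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}
```

Because the result is assembled from individual bytes, it does not depend on the byte order of the guest CPU.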
bryanv
fc455b7918 Remove spaces before tabs in the function prototype list
2014-01-25 06:54:04 +00:00
bryanv
31dc7c36ca Add very simple virtio_random(4) driver to harvest entropy from host
Reviewed by:	markm (random bits only)
2014-01-18 06:14:38 +00:00
bryanv
794929e7d2 Add unmapped IO support to virtio_scsi(4)
2014-01-13 04:46:48 +00:00
bryanv
841c25608b Add unmapped IO support to virtio_blk(4)
2014-01-13 04:43:01 +00:00
bryanv
5710e98625 Remove incorrect bit shift when assigning the LUN request field
This caused duplicate targets appearing on Google Compute Engine
instances.

PR:		kern/185626
Submitted by:	Venkatesh Srinivas <venkateshs@google.com>
MFC after:	3 days
2014-01-12 17:40:47 +00:00
glebius
f469ae1d45 Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, include
all headers that are currently picked up only through implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
bryanv
181aa517b6 Do not hold the vtnet Rx queue lock when calling up into the stack
This matches other similar drivers and avoids various LOR warnings.

Approved by:	re (marius)
2013-10-05 18:07:24 +00:00
bryanv
2be33f6260 Complete any pending Tx frames before attempting the next transmit
Also complete pending frames in the watchdog function when the
EVENT_IDX feature was negotiated just in case the completion
interrupt was postponed.
2013-09-03 02:28:31 +00:00
bryanv
b59e843b9f Fix unintended compiler constant folding
Pointed out by:	dim@
2013-09-03 02:26:57 +00:00
eadler
ff37479c6f Fix build with gcc
Reported by:	Michael Butler <imb@protected-networks.net>
Reviewed by:	jilles
2013-09-01 20:22:52 +00:00
bryanv
c401159592 Import multiqueue VirtIO net driver from my user/bryanv/vtnetmq branch
This is a significant rewrite of much of the previous driver; lots of
misc. cleanup was also performed, and support for a few other minor
features was also added.
2013-09-01 04:33:47 +00:00
bryanv
4174a82301 Sync VirtIO net device header file from recent Linux
2013-09-01 04:23:54 +00:00
bryanv
a9e07a227e Add optional VirtIO device method for post-attach notifications
This is called after the parent device (i.e., virtio_pci) has
completed the device attachment/initialization.
2013-09-01 04:20:23 +00:00
bryanv
f175a1e7f9 Add support for postponing VirtIO virtqueue interrupts
Partial support for the EVENT_IDX feature was added a while ago,
but this commit adds an interface for the device driver to hint
how long (in terms of descriptors) the next interrupt should be
delayed.

The first user of this will be VirtIO net, to reduce its Tx
completion interrupts.
2013-09-01 04:16:43 +00:00
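The EVENT_IDX mechanism that f175a1e7f9 builds on can be sketched with the notification test given in the VirtIO specification. `vring_need_event()` follows the spec's definition; `event_idx_after()` is a hypothetical helper showing how a "delay by N descriptors" hint might map to an event index, and is not the interface added by this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * From the VirtIO specification: notify only if the index moved past
 * event_idx while advancing from old_idx to new_idx (modulo 2^16).
 */
static int
vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return ((uint16_t)(new_idx - event_idx - 1) <
	    (uint16_t)(new_idx - old_idx));
}

/* Hypothetical: event index that fires after ndesc (>= 1) more entries. */
static uint16_t
event_idx_after(uint16_t used_idx, uint16_t ndesc)
{
	assert(ndesc >= 1);
	return ((uint16_t)(used_idx + ndesc - 1));
}
```

With a used index of 10 and a hint of 3 descriptors, the event index becomes 12, so the test only succeeds once the used index advances from 12 to 13. The unsigned 16-bit arithmetic keeps the comparison correct across index wraparound.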
kib
4675fcfce0 Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue.  Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked.  See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members.  Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-10 17:36:42 +00:00
bryanv
07bf5c56bf Merge virtio_scsi change from projects/virtio
r252680:
    Fix SIM lock not owned panic

    The CAM locking requirements of registering an async
    callback has changed so the SIM lock must be held. Remove
    code that explicitly dropped the lock around the register.

    Also return CAM_SEL_TIMEOUT instead of CAM_TID_INVALID
    for bad targets to avoid a lot of console spam during bus
    scans.

MFC after:	1 month
2013-07-04 18:00:27 +00:00