Commit Graph

78 Commits

Author SHA1 Message Date
Steven Hartland
85c9dd9d89 Prevent overflow issues in timeout processing
Previously, any timeout value for which (timeout * hz) overflows a
signed integer gave weird results, since callout(9) routines convert
negative tick values to '1'. With unsigned integer overflow, the timeout
becomes significantly smaller than expected.

Switch from callout_reset, which requires conversion to int-based ticks,
to callout_reset_sbt to avoid this.
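A minimal userland sketch of the arithmetic, assuming a millisecond timeout converted with the old (timeout * hz) / 1000 expression. The SBT_1S and SBT_1MS definitions mirror FreeBSD's 32.32 fixed-point sbintime_t; the helper names are hypothetical, and in the kernel the resulting value would be handed to callout_reset_sbt(9) rather than returned.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Userland stand-ins modeled on FreeBSD's <sys/time.h>:
 * sbintime_t is a signed 32.32 fixed-point count of seconds. */
typedef int64_t sbintime_t;
#define SBT_1S	((sbintime_t)1 << 32)
#define SBT_1MS	(SBT_1S / 1000)

/*
 * Hypothetical check: nonzero if the intermediate product of the old
 * tick calculation, (timeout_ms * hz) / 1000, would not fit in an int.
 */
static int
tick_product_overflows(unsigned int timeout_ms, int hz)
{
	assert(hz > 0);
	return ((int64_t)timeout_ms * hz > INT_MAX);
}

/* Hypothetical helper: milliseconds straight to sbintime_t, no int ticks. */
static sbintime_t
timeout_ms_to_sbt(unsigned int timeout_ms)
{
	/* Even UINT_MAX * SBT_1MS (about 1.8e16) fits easily in int64_t. */
	return ((sbintime_t)timeout_ms * SBT_1MS);
}
```

With hz = 1000 the intermediate product already exceeds INT_MAX at a timeout of about 2147 seconds (roughly 36 minutes), while the sbintime_t value stays representable for any unsigned 32-bit millisecond count.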

Also fix isci to resolve the ccb timeout correctly.

This was based on the original work done by Eygene Ryabinkin
<rea@freebsd.org> on 5 Aug 2011, which used a macro to help avoid
the overflow.

Differential Revision:	https://reviews.freebsd.org/D1157
Reviewed by:	mav, davide
MFC after:	1 month
Sponsored by:	Multiplay
2014-11-21 21:01:24 +00:00
Ruslan Bukin
c141c5c6b6 Add Virtio MMIO bus driver.
Sponsored by:	DARPA, AFRL
2014-11-18 14:11:14 +00:00
Bryan Venteicher
9a4dabdc5a Enable LRO by default when available on vtnet interfaces
The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic are not the
typical use case for VirtIO, and due to other issues, they often
require checksum offloading to be disabled anyway.

PR:		185864
MFC after:	2 weeks
2014-11-09 20:04:12 +00:00
Bryan Venteicher
b84b3efdde Several minor changes to hopefully complete the VirtIO console driver
  - Support the KDB alt break sequence to enter the debugger,
    panic, reboot, etc. [1]
  - Provide emergency write feature description. Note that QEMU
    does not implement this feature.
  - Make the VTCON_FLAG_* defines sequential once again.
  - When the multiple port feature is not negotiated, query the
    rows and columns of the one console during the device attach
    when the size feature is negotiated.
  - Report failure to the device if hot plugging a port fails.
  - Acknowledge the console port event with an open event. This
    is required by the spec, but QEMU doesn't seem to care.

Submitted by:	Juniper [1]
MFC after:	1 month
2014-11-07 03:36:28 +00:00
Bryan Venteicher
46822c484c Create the tty device after the port is completely initialized
This fixes a race with a tty open before the host is ready.

MFC after:	1 month
2014-11-03 22:17:25 +00:00
Bryan Venteicher
04434c94dd Add support for the multiport feature and fix hot plug races
MFC after:	1 month
2014-11-03 16:57:01 +00:00
Bryan Venteicher
6f744ddee4 Add VirtIO console driver
Support for the multiport feature is mostly implemented, but currently
disabled due to some potential races in the hot plug code paths.

Requested by:	marcel
MFC after:	1 month
Relnotes:	yes
2014-10-23 04:47:32 +00:00
Gleb Smirnoff
84047b19df - Provide if_get_counter() method for vtnet(4).
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
  and in vtnet_get_counter().
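A sketch of on-demand aggregation, assuming hypothetical per-queue counter structures rather than vtnet(4)'s real ones: per-queue totals are summed only when a counter is requested, instead of being folded into the interface statistics on every tick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue statistics, updated only by the owning queue. */
struct rxq_stats_sketch {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

/* Hypothetical subset of counters the stack may ask for. */
enum counter_sketch { CNT_IPACKETS, CNT_IBYTES };

/* Sum the requested counter across all queues at query time. */
static uint64_t
get_counter_sketch(const struct rxq_stats_sketch *rxq, size_t nrxq,
    enum counter_sketch cnt)
{
	uint64_t total = 0;
	size_t i;

	assert(rxq != NULL || nrxq == 0);
	for (i = 0; i < nrxq; i++)
		total += (cnt == CNT_IPACKETS) ?
		    rxq[i].rx_packets : rxq[i].rx_bytes;
	return (total);
}
```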

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 19:15:40 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial.

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1, so we plan an MFC before the Aug. 22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Luigi Rizzo
a5b6123ea9 print additional debugging info in virtqueue_dump()
(not fundamental, but useful to debug performance issues on vtnet)

MFC after:	3 days
2014-08-16 13:13:17 +00:00
Bryan Venteicher
32487a8973 Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after:	1 month
2014-07-10 05:36:04 +00:00
Bryan Venteicher
4b59668f0e Add accessor to get the number of free descriptors in the virtqueue
MFC after:	1 month
2014-07-10 05:26:01 +00:00
Roger Pau Monné
68e58ea7ed xen/virtio: fix balloon drivers to not mark pages as WIRED
Prevent the Xen and VirtIO balloon drivers from marking pages as
wired. This prevents them from increasing the system wired page count,
which can lead to mlock failing because of hitting the limit in
vm.max_wired.

In the Xen case make sure pages are zeroed before giving them back to
the hypervisor, or else we might be leaking data. Also remove the
balloon_{append/retrieve} and link pages directly into the
ballooned_pages queue using the plinks.q field in the page struct.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, bryanv
Approved by: gibbs

dev/virtio/balloon/virtio_balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.

dev/xen/balloon/balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.
 - Make sure pages are zeroed before giving them back to the
   hypervisor.
 - Remove the balloon_entry struct and the balloon_{append/retrieve}
   functions and use the page plinks.q entry to link the pages
   directly into the ballooned_pages queue.
2014-06-25 09:51:08 +00:00
Attilio Rao
3ae10f7477 - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will,
however, be bumped only when the full cache of free pages is
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
Bryan Venteicher
bae486f5d7 Force two byte alignment for all control message headers
The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. Note: for consistency, specify the
alignment everywhere, even where the structure already has the appropriate
natural alignment because it contains a uint16_t.
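One way to see why the alignment matters: page sizes are even, so a 2-byte object at an even address can never straddle a page boundary and always maps to one physically contiguous range. The sketch uses a hypothetical ctrl_hdr_sketch type and the GCC/Clang aligned attribute (the kernel spells this __aligned(2)).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Two 1-byte fields, forced to 2-byte alignment. */
struct ctrl_hdr_sketch {
	uint8_t	class;
	uint8_t	cmd;
} __attribute__((aligned(2)));

/* Nonzero if the byte range [addr, addr + len) spans more than one page. */
static int
crosses_page(uintptr_t addr, size_t len, uintptr_t pagesz)
{
	assert(len > 0 && pagesz > 0);
	return (addr / pagesz != (addr + len - 1) / pagesz);
}
```

An unaligned header at offset 4095 of a 4096-byte page would need two SG entries; at any even offset it needs one.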

Obtained from:	DragonFlyBSD
MFC after:	1 week
2014-06-16 04:32:27 +00:00
Bryan Venteicher
fd5b395117 Make the feature negotiation code easier to follow
MFC after:	1 week
2014-06-16 04:29:28 +00:00
Bryan Venteicher
45543f0751 Move the VIRTIO_RING_F_* defines out of virtqueue.h into virtio_config.h
These defines are applicable to userland too, but virtqueue.h contains
the kernel virtqueue interface, and is therefore not usable in userland.

Note that Linux places these defines in virtio_ring.h, but I don't want
the drivers including this header file to keep the VirtIO ring opaque to
everything but the virtqueue.

MFC after:	1 week
2014-06-16 04:25:04 +00:00
Bryan Venteicher
e026de111e Remove kernel-specific macro from the VirtIO PCI header file
The eventual goal is to share this file with userland, so
remove the macro that is specific to virtio_pci(4).
Instead, add the VIRTIO_PCI_CONFIG_OFF macro from Linux to
get the config offset whether or not MSI-X is enabled.
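The Linux macro can be sketched as follows for the legacy VirtIO PCI layout: the device-specific configuration starts right after the 20-byte common header, or 4 bytes later when the two 16-bit MSI-X vector registers are present. The dev_cfg_offset() helper is hypothetical.

```c
#include <assert.h>

/* Legacy layout: device config follows the 20-byte common header, plus the
 * two 16-bit MSI-X vector registers (4 bytes) when MSI-X is enabled. */
#define VIRTIO_PCI_CONFIG_OFF(msix_enabled)	((msix_enabled) ? 24 : 20)

/* Hypothetical helper: I/O offset of a field in the device-specific config. */
static unsigned int
dev_cfg_offset(int msix_enabled, unsigned int field_off)
{
	assert(msix_enabled == 0 || msix_enabled == 1);
	return (VIRTIO_PCI_CONFIG_OFF(msix_enabled) + field_off);
}
```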

MFC after:	1 week
2014-06-16 04:16:31 +00:00
Bryan Venteicher
add526c613 - Remove two write-only local variables
- Remove unused element in the vtnet_rxq structure

MFC after:	1 week
2014-06-16 04:12:33 +00:00
Bryan Venteicher
49d5172b34 Always append new bios to the tail of the queue, instead of sorting them
MFC after:	1 week
2014-06-10 03:29:15 +00:00
Luigi Rizzo
c26e5fc2ed make sure ifp->if_transmit returns 0 if a buffer is enqueued.
A similar fix should be applied to vmxnet, ixgbe, igb, i40e.
(some of them previously reported by Michael Tuexen)

Drivers using if_transmit are correct, and so are most of the
other drivers that reassign if_transmit.

Among other things, this bug causes panics when using netmap emulation
on top of generic drivers.
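A hypothetical model of the rule in the subject line: once a frame has been placed on the driver's software queue, the transmit routine reports success even if the hardware ring is momentarily full, because a later start will drain the queue. The swq structure, hw_start() and xmit_sketch() are illustrative names, not the vtnet(4) code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SWQ_LEN	4	/* hypothetical software queue depth */

/* Hypothetical software transmit queue: a small ring of frame pointers. */
struct swq {
	void	*slots[SWQ_LEN];
	size_t	 head;
	size_t	 count;
};

static int
swq_enqueue(struct swq *q, void *m)
{
	assert(m != NULL);
	if (q->count == SWQ_LEN)
		return (ENOBUFS);
	q->slots[(q->head + q->count) % SWQ_LEN] = m;
	q->count++;
	return (0);
}

/* Hypothetical "hardware" start: drains the queue unless the ring is full. */
static int
hw_start(struct swq *q, int ring_full)
{
	if (ring_full)
		return (ENOBUFS);
	q->head = (q->head + q->count) % SWQ_LEN;
	q->count = 0;
	return (0);
}

static int
xmit_sketch(struct swq *q, void *m, int ring_full)
{
	int error;

	error = swq_enqueue(q, m);
	if (error != 0)
		return (error);		/* the frame was not queued */
	/*
	 * The frame is queued and a later start will send it, so failing to
	 * start right now must not be reported as a transmit error.
	 */
	(void)hw_start(q, ring_full);
	return (0);
}
```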

Approved by:	bryanv
MFC after:	3 days
2014-06-04 16:57:05 +00:00
Bryan Venteicher
9a73216696 Split the virtio.h header file into multiple files
Reorganize the previous contents of the file to match Linux. The
eventual goal is to install the header files and share them between
the kernel and bhyve.

MFC after:	1 week
2014-06-01 18:16:01 +00:00
Bryan Venteicher
afd5e40ee0 Wait for the callout to finish before unloading the module
MFC after:	3 days
2014-04-24 05:04:54 +00:00
Gleb Smirnoff
b245f96c44 Since 32-bit if_baudrate isn't enough to describe the baud rate of a 10 Gbit
interface, in r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data a fixed, machine-independent size. The
  notion of data (packet counters, etc.) is by no means MD, and it is a
  bug that on amd64 we've got 64-bit counters while on i386 they are 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bits to the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately, its
  8-bit size had effectively limited sizeof(struct if_data) to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of the fields, since they are counters.
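The overflow claim for 32-bit counters can be checked with a little arithmetic: a byte counter of width b wraps after 2^b bytes, i.e. after 8 * 2^b / R seconds at a link rate of R bits per second. A sketch, assuming a saturated link and a byte counter (packet counters wrap more slowly); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a byte counter of the given width wraps on a saturated link. */
static double
byte_counter_wrap_seconds(unsigned int counter_bits, double link_bits_per_sec)
{
	double range;	/* number of distinct counter values, 2^counter_bits */

	assert(counter_bits > 0 && counter_bits <= 64);
	assert(link_bits_per_sec > 0.0);
	if (counter_bits == 64)
		range = 18446744073709551616.0;	/* 2^64 does not fit in uint64_t */
	else
		range = (double)((uint64_t)1 << counter_bits);
	/* range bytes at 8 bits per byte, divided by the link rate */
	return (range * 8.0 / link_bits_per_sec);
}
```

At 10 Gbit/s a 32-bit byte counter wraps in about 3.4 seconds and at 40 Gbit/s in under a second, whereas a 64-bit counter would take decades even at 100 Gbit/s.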

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
Bryan Venteicher
54fb8142b6 Use m_defrag() instead of m_collapse() to compact a long mbuf chain
This should be an infrequent occurrence, so remove the per-queue
counters in favor of just global counters in the softc.
2014-02-02 05:20:46 +00:00
Bryan Venteicher
443c3d0bd1 Do not place the sglist used for Rx/Tx on the stack
The sglist segment array has grown to a bit over 512 bytes (on a
64-bit system), which is more than should ideally be put on the
stack. Instead allocate an appropriately sized sglist and hang
it off each Rx/Tx queue structure.

Bump the maximum number of Tx segments to 64 to make it unlikely
we'll have to defragment an mbuf chain. Our previous count was
rounded up to this value since it is the next power of two, so
effective memory usage should not change.

Also only allocate the maximum number of Tx segments if TSO was
negotiated.
2014-02-02 05:15:36 +00:00
Bryan Venteicher
9ef6342f9e Check for a full virtqueue in the multiqueue transmit path
With most hosts, we'll negotiate indirect descriptors, so all we
need is one available descriptor to transmit a frame.
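A sketch of the fullness test, assuming a hypothetical caller that knows how many scatter/gather segments a frame occupies: with indirect descriptors the segment list lives in a separate table that one ring descriptor points to, so a single free descriptor suffices.

```c
#include <assert.h>
#include <stdbool.h>

/* Ring descriptors a frame needs; nsegs counts every segment of the frame. */
static int
tx_descs_needed(int nsegs, bool indirect)
{
	assert(nsegs > 0);
	return (indirect ? 1 : nsegs);
}

/* Hypothetical check: is the virtqueue too full to take this frame? */
static bool
txq_full(int nfree, int nsegs, bool indirect)
{
	assert(nfree >= 0);
	return (nfree < tx_descs_needed(nsegs, indirect));
}
```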
2014-01-25 19:58:53 +00:00
Bryan Venteicher
dd6f83a00f Avoid queue unlock followed by relock when the enable interrupt race is lost
This already happens infrequently, and the hold time is still bounded since
we defer to a taskqueue after a few tries.
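A hypothetical model of the retry loop: after processing, the handler re-enables the interrupt; if more work arrived in the meantime (the race was lost), it loops again, and after a fixed number of attempts it reports that the rest should be deferred to a taskqueue. The rxq_model structure and the retry budget of 4 are assumptions, not values from the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define INTR_ENABLE_RETRIES	4	/* hypothetical retry budget */

struct rxq_model {
	int	pending;			/* work waiting to be processed */
	int	late[INTR_ENABLE_RETRIES + 1];	/* work arriving after each re-enable */
	int	attempt;			/* re-enable attempts so far */
	int	processed;			/* total work handled */
};

static void
rxq_process(struct rxq_model *q)
{
	q->processed += q->pending;
	q->pending = 0;
}

/* Re-enable the interrupt; returns true if new work slipped in (race lost). */
static bool
rxq_enable_intr_lost_race(struct rxq_model *q)
{
	int n;

	assert(q->attempt <= INTR_ENABLE_RETRIES);
	n = q->late[q->attempt++];
	q->pending += n;
	return (n > 0);
}

/* Returns true if the remaining work should be deferred to a taskqueue. */
static bool
rxq_intr_sketch(struct rxq_model *q)
{
	int tries;

	for (tries = 0; tries <= INTR_ENABLE_RETRIES; tries++) {
		rxq_process(q);
		if (!rxq_enable_intr_lost_race(q))
			return (false);
		/* Race lost: go around again without dropping the queue lock. */
	}
	return (true);
}
```

Locking itself is not modeled; the point is only that the loop never drops and reacquires the lock between attempts, and the bounded retry count keeps the hold time bounded.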
2014-01-25 19:57:30 +00:00
Bryan Venteicher
bddddcd566 Move duplicated transmit start code into a single function
2014-01-25 19:55:42 +00:00
Bryan Venteicher
5591e479fe Remove stray space
2014-01-25 18:34:57 +00:00
Bryan Venteicher
9471658415 Also include the mbuf's csum_flags in an assert message
2014-01-25 07:35:09 +00:00
Bryan Venteicher
1dbb21dcc9 Read and write the MAC address in the config space byte by byte
2014-01-25 07:13:47 +00:00
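A sketch of byte-by-byte access, assuming a hypothetical cfg_read_1() accessor over an in-memory copy of the configuration space: each read is a single byte, so the resulting address does not depend on how wider accesses would be ordered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_ADDR_LEN	6

/* Hypothetical stand-in for a 1-byte config-space read. */
static uint8_t
cfg_read_1(const uint8_t *cfg, size_t off)
{
	return (cfg[off]);
}

/* Copy the MAC address out of the config space one byte at a time. */
static void
read_mac_bytewise(const uint8_t *cfg, size_t mac_off, uint8_t mac[ETHER_ADDR_LEN])
{
	size_t i;

	assert(cfg != NULL && mac != NULL);
	for (i = 0; i < ETHER_ADDR_LEN; i++)
		mac[i] = cfg_read_1(cfg, mac_off + i);
}
```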
Bryan Venteicher
8c457c885e Read each field of the configuration individually
In the forthcoming VirtIO spec, the device configuration is
always little endian instead of guest endian. This is a
no-op change for now.
2014-01-25 07:01:51 +00:00
Bryan Venteicher
31ac03991b Remove spaces before tabs in the function prototype list
2014-01-25 06:54:04 +00:00
Bryan Venteicher
10c4018057 Add very simple virtio_random(4) driver to harvest entropy from host
Reviewed by:	markm (random bits only)
2014-01-18 06:14:38 +00:00
Bryan Venteicher
22525db507 Add unmapped IO support to virtio_scsi(4)
2014-01-13 04:46:48 +00:00
Bryan Venteicher
ee11ec3437 Add unmapped IO support to virtio_blk(4)
2014-01-13 04:43:01 +00:00
Bryan Venteicher
bf51187b26 Remove incorrect bit shift when assigning the LUN request field
This caused duplicate targets to appear on Google Compute Engine
instances.
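The commit does not show the faulty shift itself. For reference, a sketch of the request LUN layout as the VirtIO specification describes it: byte 0 is 1, byte 1 is the target, bytes 2 and 3 hold a single-level LUN using flat space addressing, and the remaining bytes are zero. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill the 8-byte virtio-scsi request LUN field for (target, lun_id). */
static void
vscsi_encode_lun(uint8_t lun[8], uint8_t target, uint16_t lun_id)
{
	assert(lun_id < 16384);	/* flat space addressing carries 14 bits */
	memset(lun, 0, 8);
	lun[0] = 1;
	lun[1] = target;				/* no shift applied */
	lun[2] = 0x40 | ((lun_id >> 8) & 0x3f);	/* 0x40 marks flat addressing */
	lun[3] = lun_id & 0xff;
}
```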

PR:		kern/185626
Submitted by:	Venkatesh Srinivas <venkateshs@google.com>
MFC after:	3 days
2014-01-12 17:40:47 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that are currently available only due to
pollution via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
76039bc84f r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, include
all headers that are currently included only through implicit pollution via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Bryan Venteicher
d797300b75 Do not hold the vtnet Rx queue lock when calling up into the stack
This matches other similar drivers and avoids various LOR warnings.

Approved by:	re (marius)
2013-10-05 18:07:24 +00:00
Bryan Venteicher
6e03f31982 Complete any pending Tx frames before attempting the next transmit
Also complete pending frames in the watchdog function when the
EVENT_IDX feature was negotiated just in case the completion
interrupt was postponed.
2013-09-03 02:28:31 +00:00
Bryan Venteicher
4142b1cbe5 Fix unintended compiler constant folding
Pointed out by:	dim@
2013-09-03 02:26:57 +00:00
Eitan Adler
72d9611d24 Fix build with gcc
Reported by:	Michael Butler <imb@protected-networks.net>
Reviewed by:	jilles
2013-09-01 20:22:52 +00:00
Bryan Venteicher
8f3600b108 Import multiqueue VirtIO net driver from my user/bryanv/vtnetmq branch
This is a significant rewrite of much of the previous driver; lots of
misc. cleanup was also performed, and support for a few other minor
features was also added.
2013-09-01 04:33:47 +00:00
Bryan Venteicher
cfc28a5bf7 Sync VirtIO net device header file from recent Linux
2013-09-01 04:23:54 +00:00
Bryan Venteicher
49a4385d69 Add optional VirtIO device method for post-attach notifications
This is called after the parent device (i.e., virtio_pci) has
completed the device attachment/initialization.
2013-09-01 04:20:23 +00:00
Bryan Venteicher
b619f40aec Add support for postponing VirtIO virtqueue interrupts
Partial support for the EVENT_IDX feature was added a while ago,
but this commit adds an interface for the device driver to hint
how long (in terms of descriptors) the next interrupt should be
delayed.

The first user of this will be VirtIO net, to reduce its Tx
completion interrupts.
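The EVENT_IDX decision can be sketched with the vring_need_event() test from the VirtIO specification: when the device advances the used index from old_idx to new_idx, it interrupts only if the driver's published event index lies in between (modulo 2^16). postponed_used_event() is a hypothetical helper showing one consistent way to request the interrupt after ndesc more completions; it is not the FreeBSD interface.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt needed iff event_idx lies in [old_idx, new_idx), modulo 2^16. */
static int
vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return ((uint16_t)(new_idx - event_idx - 1) <
	    (uint16_t)(new_idx - old_idx));
}

/*
 * Hypothetical: to be interrupted once the used index reaches
 * last_used_idx + ndesc, publish the index of the last entry before it.
 */
static uint16_t
postponed_used_event(uint16_t last_used_idx, uint16_t ndesc)
{
	assert(ndesc > 0);
	return ((uint16_t)(last_used_idx + ndesc - 1));
}
```

For example, with last_used_idx = 10 and ndesc = 4 the driver publishes 13: advancing the used index from 10 to 13 raises no interrupt, while reaching 14 does, and the same holds across the 16-bit wraparound.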
2013-09-01 04:16:43 +00:00
Konstantin Belousov
c325e866f4 Different consumers of struct vm_page abuse the pageq member to keep
additional information when the page is guaranteed not to belong to a
paging queue.  Usually, this results in a lot of type casts, which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked.  See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members.  Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().
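A much-simplified, hypothetical illustration of the typed-union approach: each consumer names the member it uses (a queue entry, a private list link, or a raw word) instead of type-punning the queue linkage. page_sketch and freelist_push() are not the real vm_page definitions, although later commits in this log refer to a plinks.q member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

struct page_sketch {
	union {
		TAILQ_ENTRY(page_sketch) q;	/* while on a paging queue */
		struct {
			struct page_sketch *next;	/* private free-list link */
		} s;
		uintptr_t cookie;		/* consumer-private word */
	} plinks;
};

/* Push a page (not on any paging queue) onto a private singly linked list. */
static struct page_sketch *
freelist_push(struct page_sketch *head, struct page_sketch *m)
{
	assert(m != NULL);
	m->plinks.s.next = head;	/* typed member, no casts needed */
	return (m);
}
```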

Requested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-10 17:36:42 +00:00