Commit Graph

110 Commits

Author SHA1 Message Date
Andriy Gapon
18ec6fbc3a virtio_pci: fix announcement of MSI-X interrupts for queues
Queues that do not need interrupts - for instance, output queues - do
not have a corresponding entry in vtpci_msix_vq_interrupts.
So, it was wrong to increment a pointer into that array when iterating
over such a queue.

I ran into this bug while trying to use virtio_console(4) that allocates
a lot of queues with every other being an output queue without an
interrupt handler (if MultiplePorts feature is negotiated).

MFC after:	2 weeks
2016-11-24 21:32:04 +00:00
Andriy Gapon
8be1b98cae virtio_console: correctly determine presense of payload and its length
MFC after:	2 weeks
2016-11-24 21:12:32 +00:00
Jakub Wojciech Klama
f7b1d7f419 Reserve space for control message payload (currently a port name).
Approved by:	trasz (mentor)
Sponsored by:	iXsystems, Inc.
2016-11-12 01:41:43 +00:00
Jakub Wojciech Klama
cb9813432a Create aliases for named virtio-console ports.
Make virtio_console(4) create `/dev/vtcon/<port_name>` alias pointing
to /dev/ttyVx.y upon receiving PORT_NAME (id = 7) event over the control
queue.

Approved by:	trasz
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D7182
2016-09-17 16:03:33 +00:00
Steven Hartland
4be723f63e Fix vtnet hang with max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS
Correctly limit npairs passed to vtnet_ctrl_mq_cmd. This ensures that
VQ_ALLOC_INFO_INIT is called with the correct value, preventing the system
from hanging when max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS.

Add new sysctl requested_vq_pairs which allow the user to configure
the requested number of virtqueue pairs. The actual value will still take
into account the system limits.

Also missing sysctls for the current tunables so their values can be seen.

PR:		207446
Reported by:	Andy Carrel
MFC after:	3 days
Relnotes:	Yes
Sponsored by:	Multiplay
2016-08-11 21:13:58 +00:00
Kristof Provost
3fcb1aaef1 vtnet: fix panic on unload
Since r276367 added the virtio_mmio support vtnet_modevent() gets called twice.
This resulted in a memory leak during load and a panic on unload.

Count the loads so we only initialise once (just like cxgbe(4)), and only clean
up in the final unload.

PR:		209428
Submitted by:	novel@FreeBSD.org
MFC after:	1 week
2016-05-14 06:07:15 +00:00
Pedro F. Giffuni
453130d9bf sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
Warner Losh
c55f57071a Create an API to reset a struct bio (g_reset_bio). This is mandatory
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
2016-02-17 17:16:02 +00:00
Marcelo Araujo
804fc8c859 Lower the compiler warning: unused-but-set-variable.
Approved by:		bapt (mentor)
Differential Revision:	D3556
2015-09-03 06:53:17 +00:00
Luigi Rizzo
0fdeab7bc5 add netmap dependency when compiled as a module 2015-07-10 07:13:14 +00:00
Ruslan Bukin
116b8d2b0b Add 'prewrite' method allowing us to run some platform-specific
code before each write happens, e.g. write-back caches.
This will help booting in Bluespec simulator of CHERI processor.
2015-07-03 14:13:16 +00:00
Mark Murray
d1b06863fb Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
Ruslan Bukin
eb4948aa4c Remove duplicate defines.
Sponsored by:	HEIF5
2015-06-18 10:33:04 +00:00
Kristof Provost
581e697036 Fix panic when adding vtnet interfaces to a bridge
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host).  if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promic() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR:		200210
Differential Revision:	https://reviews.freebsd.org/D2804
Reviewed by:	philip (mentor)
2015-06-13 19:39:21 +00:00
Bryan Venteicher
cab10cc1d1 Fix typo when deregistering the VLAN unconfig event handler
Submitted by:	Masao Uebayashi <uebayasi@tombiinc.com>
MFC after:	3 days
2015-06-13 16:13:31 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
John Baldwin
4dc78216f8 Don't free mbufs when stopping an interface in netmap mode.
Currently if you ifconfig down a vtnet interface while it is being used
via netmap, the kernel panics due to trying to treat the cookie values
in the virtio rings as mbufs to be freed. When netmap is enabled, these
cookie values are pointers to something else.

Note that other netmap-aware drivers don't seem to need this as they
store the mbuf pointers in the software rings that mirror the hardware
descriptor rings, and since netmap doesn't touch those, the software
state always has NULL mbuf pointers causing the loops to free mbufs to
not do anything. However, vtnet reuses the same state area for both
netmap and non-netmap mode, so it needs to explicitly avoid looking at
the rings and treating the cookie values as mbufs if netmap is
enabled.

Differential Revision:	https://reviews.freebsd.org/D2348
Reviewed by:	adrian, bryanv, luigi
MFC after:	1 week
Sponsored by:	Norse Corp, Inc.
2015-04-29 17:48:25 +00:00
Alexander Motin
18c5ed7145 Do not report stripe size if it is equal to sector size.
MFC after:	1 week
2015-04-18 19:37:37 +00:00
Alexander Motin
61cefb9bcf Hide virtio features negotiation messages under bootverbose.
Those messages are noisy, but useless for average user.

MFC after:	2 weeks
2015-03-15 21:00:10 +00:00
Alexander Motin
dc2e31d4db Size of opt_io_size field is 32 bit.
MFC after:	2 weeks
2015-03-05 10:29:46 +00:00
Alexander Motin
d8e32bb64e Reenable VIRTIO_BLK_F_TOPOLOGY feature.
MFC after:	2 weeks
2015-03-05 09:51:59 +00:00
Bryan Venteicher
6b0e9233e4 Rework vtblk dump handling of in flight requests
Previously, the driver resets the device and abandon the requests that
are caught in flight when the dump was initiated. This was problematic
if the system is resumed after the dump is completed.

While that is probably not the typical action, it is simple to rework
the driver to very likely have the device usable after the dump without
making it more likely for the dump to fail. The in flight requests are
simply queued for completion once the dump is finished.

Requested by:	markj
MFC after:	1 month
2015-01-27 05:34:46 +00:00
Bryan Venteicher
ab4c2818f2 Add softc flag for when the indirect descriptor feature was negotiated
MFC after:	2 weeks
2015-01-01 02:06:00 +00:00
Bryan Venteicher
5b32b2faaa Use the appropriate IPv4 or IPv6 TSO HW assist flag
MFC after:	2 weeks
2015-01-01 02:03:09 +00:00
Andrew Turner
7c5c5b97fe Set the page size in the virtio-mmio driver. Some backends, e.g QEMU, assume
a 1 byte page size until told otherwise.

Sponsored by:	The FreeBSD Foundation
2014-12-30 12:47:44 +00:00
Andrew Turner
e51f2e72db Attach vtnet to virtio_mmio. Qemu provides this as an option with AArch64.
Sponsored by:	The FreeBSD Foundation
2014-12-29 17:17:01 +00:00
Andrew Turner
82ba170c6e Allow virtio_mmio to attach to ofwbus. Qemu places these here on at least
the AArch64 virtual platform with the Linaro UEFI.

Sponsored by:	The FreeBSD Foundation
2014-12-29 11:02:18 +00:00
Ruslan Bukin
156b97fa1f Add virtio bus 'poll' method allowing us to inform backend we are
going to poll virtqueue.

Use on BERI soft-core to invalidate cpu caches.

Reviewed by:	bryanv
Sponsored by:	DARPA, AFRL
2014-12-12 11:19:10 +00:00
Ruslan Bukin
a8098016f1 o Add BERI Virtio Networking Frontend (if_vtbe)
o Move similar block/networking methods to common file
o Follow r275640 and correct MMIO registers width
o Pass value to MMIO platform_note method.

Sponsored by:	DARPA, AFRL
2014-12-09 16:39:21 +00:00
Andrew Turner
4dbff2397e Update the virtio driver to work on the ARM AArch64 Foundation Model.
There are two main parts to get it to work, 1) most of the register
accesses need to be word sized, other than the config register which
needs to be byte aligned, and 2) we don't need the platform driver
for this to work on the Foundation Model, allow it to be NULL.

Differential Revision:	https://reviews.freebsd.org/D1240
Reviewed by:	bryanv
Sponsored by:	The FreeBSD Foundation
2014-12-09 10:31:35 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Bryan Venteicher
abec64bc76 Cleanup and performance improvement of the virtio_blk driver
- Add support for GEOM direct completion. Depending on the benchmark,
    this tends to give a ~30% improvement w.r.t IOPs and BW.
  - Remove an invariants check in the strategy routine. This assertion
    is caught later on by an existing panic.
  - Rename and resort various related functions to make more sense.

MFC after:	1 month
2014-11-30 16:36:26 +00:00
Steven Hartland
85c9dd9d89 Prevent overflow issues in timeout processing
Previously, any timeout value for which (timeout * hz) will overflow the
signed integer, will give weird results, since callout(9) routines will
convert negative values of ticks to '1'. For unsigned integer overflow we
will get sufficiently smaller timeout values than expected.

Switch from callout_reset, which requires conversion to int based ticks
to callout_reset_sbt to avoid this.

Also correct isci to correctly resolve ccb timeout.

This was based on the original work done by Eygene Ryabinkin
<rea@freebsd.org> back in 5 Aug 2011 which used a macro to help avoid
the overlow.

Differential Revision:	https://reviews.freebsd.org/D1157
Reviewed by:	mav, davide
MFC after:	1 month
Sponsored by:	Multiplay
2014-11-21 21:01:24 +00:00
Ruslan Bukin
c141c5c6b6 Add Virtio MMIO bus driver.
Sponsored by:	DARPA, AFRL
2014-11-18 14:11:14 +00:00
Bryan Venteicher
9a4dabdc5a Enable LRO by default when available on vtnet interfaces
The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic is not the
typical use case for VirtIO, and due to other issues, this often
requires checksum offloading to be disabled anyways.

PR:		185864
MFC after:	2 weeks
2014-11-09 20:04:12 +00:00
Bryan Venteicher
b84b3efdde Several minor changes to hopefully complete the VirtIO console driver
- Support the KDB alt break sequence to enter the debugger,
    panic, reboot, etc. [1]
  - Provide emergency write feature description. Note that QEMU
    does not implement this feature.
  - Make the VTCON_FLAG_* defines sequential once again.
  - When the multiple port feature is not negotiated, query the
    rows and columns of the one console during the device attach
    when the size feature is negotiated.
  - Report failure to the device if hot plugging a port fails.
  - Acknowledge the console port event with an open event. This
    is required by the spec, but QEMU doesn't seem to care.

Submitted by:	Juniper [1]
MFC after:	1 month
2014-11-07 03:36:28 +00:00
Bryan Venteicher
46822c484c Create the tty device after the port is completely initialized
This fixes a race with a tty open before the host is the ready.

MFC after:	1 month
2014-11-03 22:17:25 +00:00
Bryan Venteicher
04434c94dd Add support for the multiport feature and fix hot plug races
MFC after:	1 month
2014-11-03 16:57:01 +00:00
Bryan Venteicher
6f744ddee4 Add VirtIO console driver
Support for the multiport feature is mostly implemented, but currently
disabled due to some potential races in the hot plug code paths.

Requested by:	marcel
MFC after:	1 month
Relnotes:	yes
2014-10-23 04:47:32 +00:00
Gleb Smirnoff
84047b19df - Provide if_get_counter() method for vtnet(4).
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
  and in vtnet_get_counter().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 19:15:40 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured gues-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Luigi Rizzo
a5b6123ea9 print additional debugging info in virtqueue_dump()
(not fundamental, but useful to debug performance issues on vtnet)

MFC after:	3 days
2014-08-16 13:13:17 +00:00
Bryan Venteicher
32487a8973 Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after:	1 month
2014-07-10 05:36:04 +00:00
Bryan Venteicher
4b59668f0e Add accessor to get the number of free descriptors in the virtqueue
MFC after:	1 month
2014-07-10 05:26:01 +00:00
Roger Pau Monné
68e58ea7ed xen/virtio: fix balloon drivers to not mark pages as WIRED
Prevent the Xen and VirtIO balloon drivers from marking pages as
wired. This prevents them from increasing the system wired page count,
which can lead to mlock failing because of hitting the limit in
vm.max_wired.

In the Xen case make sure pages are zeroed before giving them back to
the hypervisor, or else we might be leaking data. Also remove the
balloon_{append/retrieve} and link pages directly into the
ballooned_pages queue using the plinks.q field in the page struct.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, bryanv
Approved by: gibbs

dev/virtio/balloon/virtio_balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.

dev/xen/balloon/balloon.c:
 - Don't allocate pages with VM_ALLOC_WIRED.
 - Make sure pages are zeroed before giving them back to the
   hypervisor.
 - Remove the balloon_entry struct and the balloon_{append/retrieve}
   functions and use the page plinks.q entry to link the pages
   directly into the ballooned_pages queue.
2014-06-25 09:51:08 +00:00
Attilio Rao
3ae10f7477 - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
Bryan Venteicher
bae486f5d7 Force two byte alignment for all control message headers
The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. Note for consistency, specify the
alignment everywhere, even if the structure has the appropriate natural
alignment since it contains a uint16_t.

Obtained from:	DragonFlyBSD
MFC after:	1 week
2014-06-16 04:32:27 +00:00
Bryan Venteicher
fd5b395117 Make the feature negotiation code easier to follow
MFC after:	1 week
2014-06-16 04:29:28 +00:00
Bryan Venteicher
45543f0751 Move the VIRTIO_RING_F_* defines out of virtqueue.h into virtio_config.h
These defines are applicable to userland too, but virtqueue.h contains
the kernel virtqueue interface, and is therefore not usable in userland.

Note that Linux places these defines in virtio_ring.h, but I don't want
the drivers including this header file to keep the VirtIO ring opaque to
everything but the virtqueue.

MFC after:	1 week
2014-06-16 04:25:04 +00:00