66 Commits

Author SHA1 Message Date
Huawei Xie
39449e7429 vhost: remove concurrent enqueue
All other DPDK PMDs doesn't support concurrent receiving or sending
packets to the same queue. The upper application should deal with
this, normally through queue and core bindings.

Due to historical reason, vhost internally supports concurrent lockless
enqueuing packets to the same virtio queue through costly cmpset operation.
This patch removes this internal lockless implementation and should improve
performance a bit.

Luckily DPDK OVS doesn't rely on this behavior.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
0823c1cb0a vhost: workaround stale vring base
When DPDK app crashes (or quits, or gets killed), a restart of DPDK
app would get stale vring base from QEMU. That would break the kernel
virtio net completely, making it non-work any more, unless a driver
reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a much saner (and may not the most accurate) vring base
from used->idx. That would work because:

- there is a memory barrier between updating used ring entries and
  used->idx. So, even though we crashed at updating the used ring
  entries, it will not cause any issue, as the guest driver will not
  process those stale used entries, for used-idx is not updated yet.

- DPDK process vring in order, that means a crash may just lead some
  packet retransmission for Tx and drop for Rx.

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
e197bd630a vhost: make virtio header length per device
Virtio net header length is set per device, but not per queue. So, there
is no reason to store it in vhost_virtqueue struct, instead, we should
store it in virtio_net struct, to make one copy only.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:44:20 +02:00
Yuanhan Liu
f0fa04e35e vhost: remove virtio-net.h
It barely has anything useful there, just 2 functions prototype. Here
move them to vhost-net.h, and delete it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:43:01 +02:00
Yuanhan Liu
4ecf22e356 vhost: export device id as the interface to applications
With all the previous prepare works, we are just one step away from
the final ABI refactoring. That is, to change current API to let them
stick to vid instead of the old virtio_net dev.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:42:57 +02:00
Yuanhan Liu
a67f286a65 vhost: export queue free entries
The new API rte_vhost_avail_entries() is actually a rename of
rte_vring_available_entries(), with the "vring" to "vhost" name
change to keep the consistency of other vhost exported APIs.

This change could let us avoid the dependency of "virtio_net"
struct, to prepare for the ABI refactoring.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:02:58 +02:00
Yuanhan Liu
f6d1bd5365 vhost: export interface name
Introduce a new API rte_vhost_get_ifname() to export the ifname to
application.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:01:30 +02:00
Yuanhan Liu
4b4af666b9 vhost: export number of queues
Introduce a new API rte_vhost_get_queue_num() to export the number of
queues.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:01:30 +02:00
Yuanhan Liu
586e390013 vhost: export numa node
Introduce a new API rte_vhost_get_numa_node() to get the numa node
from which the virtio_net struct is allocated.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:01:30 +02:00
Yuanhan Liu
319d362e3b vhost: get device by device id only
get_device() just needs vid, so pass vid as the parameter only.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:01:30 +02:00
Yuanhan Liu
e2a1dd1275 vhost: rename device id variable
I failed to figure out what does "fh" mean here for a long while.
The only guess I could have had is "file handle". So, you get the
point that it's not well named.

I then figured it out that "fh" is derived from the fuse lib, and
my above guess is right. However, device_fh represents a virtio
net device ID. Therefore, here I rename it to vid (Virtio-net device
ID, or Vhost device ID; choose one you prefer) to make it easier for
understanding.

This name (vid) then will be considered to the only interface to
applications. That's another reason to do the rename: it's our
interface, make it more understandable.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:01:25 +02:00
Yuanhan Liu
c08a349006 vhost: declare device id as int
device_fh repsents the device id for a specific virtio net device.
Firstly, "int" would be big enough: we don't need 64 bit. Secondly,
this could let us avoid the ugly "%" PRIu64 ".." stuff.

And since ctx.fh is derived from device_fh, declare it as int, too.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 08:59:54 +02:00
Yuanhan Liu
550c9d27d1 vhost: set/reset device flags internally
It does not make sense to ask the application to set/unset the flag
VIRTIO_DEV_RUNNING (that used internal only) at new_device()/
destroy_device() callback.

Instead, it should be set after new_device() succeeds and reset before
destroy_device() is invoked inside vhost lib. This patch fixes it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 06:10:54 +02:00
Yuanhan Liu
092f1c2c77 vhost: declare backend with int type
It's an fd; so define it as "int", which could also save the unncessary
(int) case.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 06:10:54 +02:00
Daniel Mrzyglod
a5e20775a7 vhost: fix name not null terminated
Fix issue reported by Coverity.
Coverity ID 124556

If the buffer is treated as a null terminated string in later operations,
a buffer overflow or over-read may occur.

In vhost_set_ifname: The string buffer may not have a null terminator if
the source string's length is equal to the buffer size

Fixes: 54292e9520e0 ("vhost: support ifname for vhost-user")

Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-05-10 20:25:30 +02:00
Yuanhan Liu
71dc571efd vhost: fix error handling in destroy
Fix following coverity defect:

    291     void
    292     vhost_destroy_device(struct vhost_device_ctx ctx)
    293     {
    294             struct virtio_net *dev = get_device(ctx);
    295
    >>>     CID 124565:  Null pointer dereferences  (NULL_RETURNS)
    >>>     Dereferencing a null pointer "dev".

Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")

Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-04-06 12:27:57 +02:00
Tetsuya Mukawa
fb871d0a4d vhost: fix default value of kickfd and callfd
Currently, default values of kickfd and callfd are -1.
If the values are -1, current code guesses kickfd and callfd haven't
been initialized yet. Then vhost library will guess the virtqueue isn't
ready for processing.

But callfd and kickfd will be set as -1 when "--enable-kvm"
isn't specified in QEMU command line. It means we cannot treat -1 as
uninitialized state.

The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and
VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for
the default values of kickfd and callfd.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:20:29 +01:00
Yuanhan Liu
fd2fca6f6b vhost: fix queue pair reallocation
vq is allocated on pairs, hence we should do pair reallocation
at numa_realloc() as well, otherwise an error like following
occurs while do numa reallocation:

    VHOST_CONFIG: reallocate vq from 0 to 1 node
    PANIC in rte_free():
    Fatal error: Invalid memory

The reason we don't catch it is because numa_realloc() will
not take effect when RTE_LIBRTE_VHOST_NUMA is not enabled,
which is the default case.

Fixes: e049ca6d10e0 ("vhost-user: prepare multiple queue setup")

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
2016-03-11 16:49:20 +01:00
Yuanhan Liu
78ffdaff06 vhost: simplify numa reallocation
We could first check if we need realloc vq or not, if so,
reallocate it. We then do similar to vhost dev realloc.

This could get rid of the tons of repeated "if (realloc_dev)"
and "if (realloc_vq)" statements, therefore, makes code
a bit more readable.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-11 16:49:17 +01:00
Yuanhan Liu
45ca9c6f7b vhost: get rid of linked list for devices
While we use a single linked list to maintain all devices, we could
use a static array to achieve the same goal, just like what we did
to maintain the eth devices with rte_eth_devices array. This could
simplifies the code a bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-11 16:46:18 +01:00
Pavel Fedin
2f29ce885a vhost: check memory map before address translation
Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for
some reason. This causes NULL dereference in qva_to_vva().

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-21 11:17:48 +01:00
Rich Lane
a90ca1a12e vhost: remove device operations pointers
The vhost_net_device_ops indirection is unnecessary because there is only
one implementation of the vhost common code.
Removing it makes the code more readable.

Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-19 19:33:31 +01:00
Rich Lane
ca67ed289a vhost: fix leak of fds and mmaps
The common vhost code only supported a single mmap per device. vhost-user
worked around this by saving the address/length/fd of each mmap after the end
of the rte_virtio_memory struct. This only works if the vhost-user code frees
dev->mem, since the common code is unaware of the extra info. The
VHOST_USER_RESET_OWNER message is one situation where the common code frees
dev->mem and leaks the fds and mappings. This happens every time I shut down a
VM.

The new code calls back into the implementation (vhost-user or vhost-cuse) to
clean up these resources.

The vhost-cuse changes are only compile tested.

Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-19 16:13:32 +01:00
Yuanhan Liu
d293dac8f3 vhost: claim support of guest announce
It's actually a feature already enabled in Linux kernel (since v3.5).
What we need to do is simply to claim that we support such feature,
and nothing else.

With that, the guest will send an ARP message after live migration
to notify the switches about the new location of migrated VM.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
2016-02-19 15:47:20 +01:00
Yuanhan Liu
b171fad1ff vhost: log used vring changes
Introduce vhost_log_write() helper function to log the dirty pages we
touched. Page size is harded code to 4096 (VHOST_LOG_PAGE), and each
log is presented by 1 bit.

Therefore, vhost_log_write() simply finds the right bit for related
page we are gonna change, and set it to 1. dev->log_base denotes the
start of the dirty page bitmap.

Every time we update virtio used ring, we need to log it. And it's
been done by a new vhost_log_write() wrapper, vhost_log_used_vring().

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
2016-02-19 15:44:13 +01:00
Jijiang Liu
859b480d5a vhost: add guest offload setting
Add guest offload setting in vhost lib.

Virtio 1.0 spec (5.1.6.4 Processing of Incoming Packets) says:

    1. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
       VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so,
       the packet checksum at offset csum_offset from csum_start
       and any preceding checksums have been validated. The checksum
       on the packet is incomplete and csum_start and csum_offset
       indicate how to calculate it (see Packet Transmission point 1).

    2. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
       negotiated, then gso_type MAY be something other than
       VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the
       desired MSS (see Packet Transmission point 2).

In order to support these features, the following changes are added,

1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation.

2. Enqueue these offloads: convert some fields in mbuf to the fields in virtio_net_hdr.

There are more explanations for the implementation.

For VM2VM case, there is no need to do checksum, for we think the
  data should be reliable enough, and setting VIRTIO_NET_HDR_F_NEEDS_CSUM
  at RX side will let the TCP layer to bypass the checksum validation,
  so that the RX side could receive the packet in the end.

In terms of us-vhost, at vhost RX side, the offload information is
  inherited from mbuf, which is in turn inherited from TX side. If we
  can still get those info at RX side, it means the packet is from
  another VM at same host. So, it's safe to set the
  VIRTIO_NET_HDR_F_NEEDS_CSUM, to skip checksum validation.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-17 22:56:44 +01:00
Jijiang Liu
d0cf91303d vhost: add Tx offload capabilities
Add vhost TX offload (CSUM and TSO) support capabilities in vhost lib.

In order to support these features, and the following changes are added,

1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features
   negotiation.

2. Dequeue TX offload: convert the fileds in virtio_net_hdr to the
   related fileds in mbuf.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-17 22:56:44 +01:00
Huawei Xie
766ad08900 vhost: fix logically dead code
CID 107107 (#1 of 1): Logically dead code
Fixes: af4f2c5feb2e ("vhost: fix code style")

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2015-12-13 02:14:30 +01:00
Tetsuya Mukawa
e1f8f55571 vhost: fix guest descriptor closed on reset owner message
The patch fixes reset_owner message handling not to clear callfd,
because callfd will be valid while connection is established.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-24 21:34:11 +01:00
Yuanhan Liu
87308b5370 vhost: reset device properly
Currently, we reset all fields of a device to zero when reset
happens, which is wrong, since for some fields like device_fh,
ifname, and virt_qp_nb, they should be same and be kept after
reset until the device is removed. And this is what's the new
helper function reset_device() for.

And use rte_zmalloc() instead of rte_malloc, so that we could
avoid init_device(), which basically dose zero reset only so far.
Hence, init_device() is dropped in this patch.

This patch also removes a hack of using the offset a specific
field (which is virtqueue now) inside of `virtio_net' structure
to do reset, which could be broken easily if someone changed the
field order without caution.

Cc: Tetsuya Mukawa <mukawa@igel.co.jp>
Cc: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2015-11-12 12:39:08 +01:00
Rich Lane
d243ecf0c2 vhost: make destroy callback on reset owner message
QEMU sends VHOST_RESET_OWNER first when shutting down.
There was previously no way for the dataplane to know that the
virtio_net instance had become unusable and it would segfault
when trying to do RX/TX.

Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-12 12:39:08 +01:00
Marcel Apfelbaum
15e9ee6982 vhost: enable virtio 1.0
Make vhost-user virtio 1.0 compatible by adding it to the
supported features and keeping the header length
the same as for mergeable RX buffers.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-02 23:12:27 +01:00
Yuanhan Liu
71dfdbe66a vhost: fix build with kernel < 3.8
Fix build error:
  virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here
  rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here

Above two virtio-net MQ macros are introduced since kernel v3.8.
For older kernel, we should not reference them directly, hence,
this patch introduced two wrapper macros, with proper values
being set depending on we support MQ or not.

Fixes: b09b198bfb5c ("vhost-user: announce queue number in message")

Reported-by: Yongjie Gu <yongjiex.gu@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: David Marchand <david.marchand@6wind.com>
2015-10-30 15:55:05 +01:00
Yuanhan Liu
19d4d7ef2a vhost-user: enable multiple queue
By setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit, and
VIRTIO_NET_F_MQ feature bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:23:54 +01:00
Changchun Ouyang
77d20126b4 vhost-user: handle message to enable vring
This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:23:53 +01:00
Yuanhan Liu
e049ca6d10 vhost-user: prepare multiple queue setup
All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
first time for a specific queue pair.

This is a refactor work for enabling vhost-user multiple queue;
it should not break anything as it does no functional changes:
we don't support mq set, so there is only one mq at max.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:22:37 +01:00
Yuanhan Liu
381316f6a2 vhost-user: support protocol features
The two protocol features messages are introduced by qemu vhost
maintainer(Michael) for extendting vhost-user interface. Here is
an excerpta from the vhost-user spec:

    Any protocol extensions are gated by protocol feature bits,
    which allows full backwards compatibility on both master
    and slave.

The vhost-user multiple queue features will be treated as a vhost-user
extension, hence, we have to implement the two messages first.

VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any yet.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:22:27 +01:00
Jerome Jutteau
6c6373c763 vhost: fix missing device checks
virtio-net search for it's device in reset_owner.
The function don't check the return result of get_config_ll_entry.
Using get_config_ll_entry in reset_owner don't show any error when the
device is not found. This patch fix this by using get_device instead
instead of get_config_ll_entry.

In user_get_vring_base, get_device return is not checked and may cause
segfault when device is not found.

Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-10-21 12:21:18 +02:00
Jerome Jutteau
2c95f4de6a vhost: keep device identifier after reset owner
virtio-net clean and init device after a VHOST_USER_RESET_OWNER.
This reset device identifier to 0 and break ll_root listing logic.
This patch keep the old device identifier and re-write it on the cleaned
device.

Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-10-21 12:03:57 +02:00
Yuanhan Liu
9702b2b53f vhost: fix wrong usage of eventfd_t
According to eventfd man page:

    typedef uint64_t eventfd_t;

    int eventfd_read(int fd, eventfd_t *value);
    int eventfd_write(int fd, eventfd_t value);

eventfd_t is defined for the second arg(value), but not for fd.

Here I redefine those fd fields to `int' type, which also removes
the redundant (int) cast. And as the man page stated, we should
cast 1 to eventfd_t type for eventfd_write().

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:58:30 +02:00
Yuanhan Liu
2bb29a9fc1 vhost: fix typo
_det => _dev

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:53:28 +02:00
Huawei Xie
af295ad469 vhost: realloc device and queues to same numa node as vring desc
When we get the address of vring descriptor table in VHOST_SET_VRING_ADDR
message, will try to reallocate vhost device and virt queue to the same
numa node.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
2015-06-29 18:57:33 +02:00
Huawei Xie
4113e38100 vhost: use rte_malloc to allocate device and queues
use rte_malloc to allocate vhost device and queues

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
2015-06-29 18:57:33 +02:00
Krishna Murthy
f75f65abf3 vhost: enable live migration
When we migrate VM, without this feature, qemu will report error:
"migrate: Migration disabled: vhost lacks VHOST_F_LOG_ALL feature".

Signed-off-by: Krishna Murthy <krishna.j.murthy@intel.com>
2015-06-12 17:07:24 +02:00
Huawei Xie
4d16fff496 vhost: check file descriptor before closing
This avoids closing -1 in our case.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-03-09 12:46:46 +01:00
Huawei Xie
64ab971791 vhost: fix file descriptors naming
Previous vhost implementation wrongly name kickfd as callfd and callfd as kickfd.
It is functional correct, but causes confusion.
Exchange kickfd and callfd to avoid confusion.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-03-09 12:46:46 +01:00
Huawei Xie
54292e9520 vhost: support ifname for vhost-user
for vhost-cuse, ifname is the name of the tap device
for vhost-user, ifname is the name of the unix domain socket path

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:16 +01:00
Huawei Xie
8f972312b8 vhost: support vhost-user
In rte_vhost_driver_register(), vhost unix domain socket listener fd is created
and added to polled(based on select) fdset.

In rte_vhost_driver_session_start(), fds in the fdset are checked for
processing. If there is new connection from qemu, connection fd accepted is
added to polled fdset. The listener and connection fds in the fdset are
then both checked. When there is message on the connection fd, its
callback vserver_message_handler is called to process vhost-user messages.

To support identifying which virtio is from which guest VM, we could call
rte_vhost_driver_register with different socket path. Virtio devices from
same VM will connect to VM specific socket. The socket path information is
stored in the virtio_net structure.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:15 +01:00
Huawei Xie
9464a44160 vhost: implement cuse memory table
remove set_memory_table ops

vhost-cuse or vhost-user will both implement their own set_memory_region handler.

In current vhost-cuse implementation, guest numa memory isn't supported.
Assume that guest memory is backed by only one file.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
2015-02-24 01:38:14 +01:00
Huawei Xie
c2f60667bf vhost: move fd copying into cuse subdirectory
File descriptor is copied from qemu process into vhost process.
vhost-user doesn't need eventfd kernel module to copy fds between processes.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
2015-02-24 01:38:11 +01:00