Commit Graph

200 Commits

Author SHA1 Message Date
Xiaobo Chi
7836894a64 vhost: fix kernel module insertion
Problem:if I firstly insert my kmod_test.ko, then insert eventfd_link.ko,
error will happen with hint "Device or resource busy". This is because
the default minor device number, 0, has been occupied by my kmod_test.ko .

root@distro:~/test$ lsmod
Module                  Size  Used by
kmod_test                927  0
vboxsf                 35930  4
vboxguest             222130  1 vboxsf
microcode              10315  0
autofs4                25051  0
root@distro:~/test$ insmod ./eventfd_link.ko
insmod: ERROR: could not insert module ./eventfd_link.ko: Device or
resource busy

Explanation: For miscdevices, the major device_no is same, so the minor
device_no should be set to ditinguish different misc devices;  if not set
the minor, it may fail while insmod due to the default minor value, 0, has
been used by other miscdevice. MISC_DYNAMIC_MINOR means to let Linux
kernel dynamically assign one minor devide number while loading.

Signed-off-by: Xiaobo Chi <xiaobo.chi@nokia.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-24 21:34:11 +01:00
Victor Kaplansky
cd81ee7cc2 vhost: fix enabling vring per queue
The VHOST_USER_SET_VRING_ENABLE request was sent for each queue-pair.
However, it's changed to be sent per queue in the queue-pair by QEMU
commit dc3db6ad ("vhost-user: start/stop all rings"). The change
is reasonable, as we send all other request per queue, instead of
queue-pair.

Hence we should do proper changes to adapt to the QEMU change here.
Otherwise, a segfault will be triggered when last TX queue was enabled.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-24 21:34:11 +01:00
Jianfeng Tan
ec09c280b8 vhost: fix mmap not aligned with hugepage size
This patch fixes a bug under lower version linux kernel, mmap()
fails when length is not aligned with hugepage size. mmap()
without flag of MAP_ANONYMOUS, should be called with length
argument aligned with hugepagesz at older longterm version
Linux, like 2.6.32 and 3.2.72, or mmap() will fail with EINVAL.
This bug was fixed in Linux kernel by commit:
dab2d3dc45ae7343216635d981d43637e1cb7d45
To avoid failure, make sure in caller to keep length aligned.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-11-24 21:34:11 +01:00
Tetsuya Mukawa
e1f8f55571 vhost: fix guest descriptor closed on reset owner message
The patch fixes reset_owner message handling not to clear callfd,
because callfd will be valid while connection is established.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-24 21:34:11 +01:00
Yuanhan Liu
87308b5370 vhost: reset device properly
Currently, we reset all fields of a device to zero when reset
happens, which is wrong, since for some fields like device_fh,
ifname, and virt_qp_nb, they should be same and be kept after
reset until the device is removed. And this is what's the new
helper function reset_device() for.

And use rte_zmalloc() instead of rte_malloc, so that we could
avoid init_device(), which basically dose zero reset only so far.
Hence, init_device() is dropped in this patch.

This patch also removes a hack of using the offset a specific
field (which is virtqueue now) inside of `virtio_net' structure
to do reset, which could be broken easily if someone changed the
field order without caution.

Cc: Tetsuya Mukawa <mukawa@igel.co.jp>
Cc: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2015-11-12 12:39:08 +01:00
Rich Lane
d243ecf0c2 vhost: make destroy callback on reset owner message
QEMU sends VHOST_RESET_OWNER first when shutting down.
There was previously no way for the dataplane to know that the
virtio_net instance had become unusable and it would segfault
when trying to do RX/TX.

Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-12 12:39:08 +01:00
Marcel Apfelbaum
45c55d39c5 vhost: fix build with old kernels
Commit 15e9ee6982
uses the VIRTIO_F_VERSION_1 macro existing only in newer kernels.

Fixed it by manually defining it for older kernels.

Fixes: 15e9ee6982 ("vhost: enable virtio 1.0")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2015-11-03 12:33:04 +01:00
Marcel Apfelbaum
15e9ee6982 vhost: enable virtio 1.0
Make vhost-user virtio 1.0 compatible by adding it to the
supported features and keeping the header length
the same as for mergeable RX buffers.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-02 23:12:27 +01:00
Pavel Boldin
40e8552e32 vhost: use new eventfd copy
Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-30 20:06:51 +01:00
Pavel Boldin
c706f3b9c8 vhost: add new eventfd copy ioctl
Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-30 20:06:30 +01:00
Pavel Boldin
e658d99749 vhost: refactor eventfd copy handler
* Move ioctl `EVENTFD_COPY' code to a separate function
* Remove extra #includes
* Introduce function fget_from_files
* Fix ioctl return values

Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-30 20:03:51 +01:00
Yuanhan Liu
71dfdbe66a vhost: fix build with kernel < 3.8
Fix build error:
  virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here
  rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here

Above two virtio-net MQ macros are introduced since kernel v3.8.
For older kernel, we should not reference them directly, hence,
this patch introduced two wrapper macros, with proper values
being set depending on we support MQ or not.

Fixes: b09b198bfb ("vhost-user: announce queue number in message")

Reported-by: Yongjie Gu <yongjiex.gu@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: David Marchand <david.marchand@6wind.com>
2015-10-30 15:55:05 +01:00
Tetsuya Mukawa
07d37fbf5e vhost: fix crash with multiqueue enabled
The patch fixes wrong handling of virtqueue array index when
GET_VRING_BASE message comes.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2015-10-30 15:46:01 +01:00
Yuanhan Liu
19d4d7ef2a vhost-user: enable multiple queue
By setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit, and
VIRTIO_NET_F_MQ feature bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:23:54 +01:00
Changchun Ouyang
77d20126b4 vhost-user: handle message to enable vring
This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:23:53 +01:00
Changchun Ouyang
7c46842c9e vhost: use queue id instead of constant ring index
Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to a proper value for a specific queue
when we have multiple queue support enabled.

For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
so it should not break anything.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:23:49 +01:00
Yuanhan Liu
e049ca6d10 vhost-user: prepare multiple queue setup
All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
first time for a specific queue pair.

This is a refactor work for enabling vhost-user multiple queue;
it should not break anything as it does no functional changes:
we don't support mq set, so there is only one mq at max.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:22:37 +01:00
Yuanhan Liu
b09b198bfb vhost-user: announce queue number in message
Add VHOST_USER_GET_QUEUE_NUM message
to tell the frontend (qemu) how many queue pairs we support.

And it is initiated to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:22:32 +01:00
Yuanhan Liu
381316f6a2 vhost-user: support protocol features
The two protocol features messages are introduced by qemu vhost
maintainer(Michael) for extendting vhost-user interface. Here is
an excerpta from the vhost-user spec:

    Any protocol extensions are gated by protocol feature bits,
    which allows full backwards compatibility on both master
    and slave.

The vhost-user multiple queue features will be treated as a vhost-user
extension, hence, we have to implement the two messages first.

VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any yet.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:22:27 +01:00
Jerome Jutteau
6c6373c763 vhost: fix missing device checks
virtio-net search for it's device in reset_owner.
The function don't check the return result of get_config_ll_entry.
Using get_config_ll_entry in reset_owner don't show any error when the
device is not found. This patch fix this by using get_device instead
instead of get_config_ll_entry.

In user_get_vring_base, get_device return is not checked and may cause
segfault when device is not found.

Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-10-21 12:21:18 +02:00
Jerome Jutteau
2c95f4de6a vhost: keep device identifier after reset owner
virtio-net clean and init device after a VHOST_USER_RESET_OWNER.
This reset device identifier to 0 and break ll_root listing logic.
This patch keep the old device identifier and re-write it on the cleaned
device.

Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-10-21 12:03:57 +02:00
Huawei Xie
5dab2f1137 vhost: inject only one interrupt for a batch of packets
In merge-able RX path, vhost injects interrupts to guest for each packet.
This should degrade performance a lot.
This patch fixes this issue by injecting one interrupt for a batch of packets.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2015-09-30 01:19:19 +02:00
Yuanhan Liu
9702b2b53f vhost: fix wrong usage of eventfd_t
According to eventfd man page:

    typedef uint64_t eventfd_t;

    int eventfd_read(int fd, eventfd_t *value);
    int eventfd_write(int fd, eventfd_t value);

eventfd_t is defined for the second arg(value), but not for fd.

Here I redefine those fd fields to `int' type, which also removes
the redundant (int) cast. And as the man page stated, we should
cast 1 to eventfd_t type for eventfd_write().

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:58:30 +02:00
Yuanhan Liu
dbd897d0a1 vhost: get rid of duplicate code
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:55:17 +02:00
Yuanhan Liu
2bb29a9fc1 vhost: fix typo
_det => _dev

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:53:28 +02:00
Yuanhan Liu
8426af8e57 vhost: remove extra semicolon
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:50:46 +02:00
Aaron Conole
a17cc4f172 vhost: build eventfd_link module against specified kernel
The vHost eventlink driver is a kernel module that requires a kernel
source/build directory to build the ko. Convert the fixed kernel build
directory specifier to one which may be user specified on the command-line.

Signed-off-by: Aaron Conole <aconole@redhat.com>
2015-09-24 22:01:53 +02:00
Ouyang Changchun
d533647a87 vhost: fix qemu shutdown
This patch originates from the patch:
"Patch for Qemu wrapper for US-VHost to ensure Qemu process ends when
VM is shutdown", http://dpdk.org/ml/archives/dev/2014-June/003606.html

Also update the vhost sample guide doc.

Signed-off-by: Claire Murphy <claire.k.murphy@intel.com>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
2015-09-24 14:57:36 +02:00
Ouyang Changchun
43866bf71d doc: fix vhost sample parameter
This commit removes the dev-index, so update the doc for this change:

Fixes: 17b8320a3e ("vhost: remove index parameter")

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
2015-09-24 12:17:01 +02:00
Ouyang Changchun
1cbf787ef8 vhost: add log on socket bind failure
It adds more readable log info if a socket fails to bind to
local socket file name.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-07-17 14:53:26 +02:00
Huawei Xie
cca619e459 vhost: comment unwanted callback
add comment for potential unwanted callback on listenfds

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
2015-06-30 17:49:08 +02:00
Huawei Xie
292959c719 vhost: cleanup unix socket
rte_vhost_driver_unregister API will remove the listenfd from event list,
and then close it.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Peng Sun <peng.a.sun@intel.com>
2015-06-30 17:49:08 +02:00
Huawei Xie
3670686ab9 vhost: fix race for connection fd
In the event handler of connection fd, the connection fd could be possibly
closed. The event dispatch loop would then try to remove the fd from fdset.
Between these two actions, another thread might register a new listenfd
reusing the val of just closed fd, so we couldn't call fdset_del which would
wrongly clean up the new listenfd. A new function fdset_del_slot is provided
to cleanup the fd at the specified location.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
2015-06-30 17:49:07 +02:00
Huawei Xie
af295ad469 vhost: realloc device and queues to same numa node as vring desc
When we get the address of vring descriptor table in VHOST_SET_VRING_ADDR
message, will try to reallocate vhost device and virt queue to the same
numa node.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
2015-06-29 18:57:33 +02:00
Huawei Xie
4113e38100 vhost: use rte_malloc to allocate device and queues
use rte_malloc to allocate vhost device and queues

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
2015-06-29 18:57:33 +02:00
Cyril Chemparathy
82be8d5442 mbuf: use offset macro
This patch simply applies the transform previously committed in
scripts/cocci/mtod-offset.cocci.  No other modifications have been
made here.

Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2015-06-24 12:01:14 +02:00
Ouyang Changchun
8b636a50c2 doc: fix doxygen warnings in vhost API
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
2015-06-19 12:11:53 +02:00
Ouyang Changchun
5dc985ec1d vhost: remove unnecessary descriptor length updates
Remove these unnecessary vring descriptor length updating, vhost should
not change them.
virtio in front end should assign value to desc.len for both rx and tx.

Test report: http://dpdk.org/ml/archives/dev/2015-June/018610.html

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:56:24 +02:00
Ouyang Changchun
2927c37ca4 vhost: rework mergeable Rx
Extract codes into a function:
update_secure_len which is used to accumulate the buffer len in the
vring descriptors and to fill struct buf_vec.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:47:51 +02:00
Ouyang Changchun
46a8fafaa7 vhost: refine code style
Remove unnecessary new line.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:25:10 +02:00
Ouyang Changchun
f1a519ad98 vhost: fix enqueue/dequeue to handle chained vring descriptors
Vring enqueue need consider the 2 cases:
 1. use separate descriptors to contain virtio header and actual data,
    e.g. the first descriptor is for virtio header, and then followed
    by descriptors for actual data.
 2. virtio header and some data are put together in one descriptor,
    e.g. the first descriptor contain both virtio header and part of
    actual data, and then followed by more descriptors for rest of packet
    data, current DPDK based virtio-net pmd implementation is this case;

So does vring dequeue, it should not assume vring descriptor is chained
or not chained, it should use desc->flags to check whether it is chained
or not. This patch also fixes TX corrupt issue when vhost co-work with
virtio-net driver which uses one single vring descriptor (header and data
are in one descriptor) for virtio tx process on default.

Test report: http://dpdk.org/ml/archives/dev/2015-June/018610.html

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:18:40 +02:00
Krishna Murthy
f75f65abf3 vhost: enable live migration
When we migrate VM, without this feature, qemu will report error:
"migrate: Migration disabled: vhost lacks VHOST_F_LOG_ALL feature".

Signed-off-by: Krishna Murthy <krishna.j.murthy@intel.com>
2015-06-12 17:07:24 +02:00
Stephen Hemminger
a43a55472f lib: fix whitespace
More places with trailing whitespace, and empty blank lines

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-06-12 11:10:10 +02:00
Huawei Xie
159793ac86 vhost: fix virtio freeze due to missed interrupt
Update of used->idx and read of avail->flags could be reordered.
Memory fence should be used to ensure the order, otherwise guest could see
a stale used->idx value after it toggles the interrupt suppression flag.
After guest sets the interrupt suppression flag, it will check if there
is more buffer to process through used->idx. If it sees a stale value,
it will exit the processing while host won't send interrupt to guest.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Reviewed-by: Luke Gorrie <luke@snabb.co>
2015-05-13 12:16:47 +02:00
Bruce Richardson
ec10e8d24e vhost: remove inclusion of mbuf header
The virtio_net header file includes the mbuf header file, but it does not
need to do so as it only uses pointers to the struct rte_mbuf type, and
does not use any of the mbuf internals, nor any of the mbuf functions or
macros. Therefore the inclusion is unnecessary, and can be replaced by a
forward declaration of the mbuf type.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2015-05-11 15:36:37 +02:00
Pavel Boldin
38f4f01f8f vhost: fix file struct leakage
Due to increased `struct file's reference counter subsequent call
to `filp_close' does not free the `struct file'. Prepend `fput' call
to decrease the reference counter.

Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-03-26 22:33:41 +01:00
Haifeng Lin
5c561b0a23 vhost: fix index when mbuf allocation fails
When failed to malloc buffer from mempool we just update last_used_idx but
not used->idx so after many times vhost thought have handle all packets
but virtio_net thought vhost have not handle all packets and will not
update avail->idx.

Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-03-26 22:33:41 +01:00
Huawei Xie
857af048ac vhost: fix build
Fix the error "missing initializer" and "cast to pointer from integer of different size".

For the pointer to integer cast issue, need to investigate changing the typeof mapped_address.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2015-03-20 23:01:42 +01:00
Benoît Canet
45db8927a8 vhost: add hint on how to add or remove device to a data core
Let's make sure people will not forget to set and unset VIRTIO_DEV_RUNNING.

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-03-17 12:39:52 +01:00
Huawei Xie
28a1ccca41 vhost: add build option for vhost-user
Turn on CONFIG_RTE_LIBRTE_VHOST to enable vhost.
vhost-user is turned on by default. Turn off CONFIG_RTE_LIBRTE_VHOST_USER to
enable vhost-cuse implementation.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2015-03-17 00:46:01 +01:00
Huawei Xie
4d16fff496 vhost: check file descriptor before closing
This avoids closing -1 in our case.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-03-09 12:46:46 +01:00
Huawei Xie
64ab971791 vhost: fix file descriptors naming
Previous vhost implementation wrongly name kickfd as callfd and callfd as kickfd.
It is functional correct, but causes confusion.
Exchange kickfd and callfd to avoid confusion.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-03-09 12:46:46 +01:00
Huawei Xie
aee87dd706 vhost: use loop instead of goto
This patch reorder the code a bit to use loop instead of goto.
Besides, remove abudant check 'fd != -1'.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2015-03-09 12:46:46 +01:00
Huawei Xie
31ff0c6a45 vhost: combine select with sleep
combine sleep into select when there is no file descriptors to be monitored.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2015-03-09 12:46:46 +01:00
Huawei Xie
391b5f425c vhost: fix crash by removing device when requested
This patch fixes the segfault issue in the case vhost receives
new VHOST_SET_MEM_TABLE message without VHOST_VRING_GET_VRING_BASE
(which we uses as the stop message).

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tommy Long <thomas.long@intel.com>
2015-03-05 22:08:27 +01:00
Stefan Puiu
4db87f9739 lib: fix C++11 compilation
In C++11 concatenated string literals need to have a space in between.
Found with clang++-3.4, IIRC g++-4.8 also complains about this.

Sample error message:
error: invalid suffix on literal; C++11 requires a space between literal
and identifier [-Wreserved-user-defined-literal]

Signed-off-by: Stefan Puiu <stefan.puiu@gmail.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
2015-02-24 02:46:52 +01:00
Huawei Xie
dbfa62d63f vhost: support dynamically registering server
* support calling rte_vhost_driver_register after rte_vhost_driver_session_start
* add mutext to protect fdset from concurrent access
* add busy flag in fdentry. this flag is set before cb and cleared after cb is finished.

mutex lock scenario in vhost:

* event_dispatch(in rte_vhost_driver_session_start) runs in a separate thread, infinitely
processing vhost messages through cb(callback).
* event_dispatch acquires the lock, get the cb and its context, mark the busy flag,
and releases the mutex.
* vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add new fd into fdset.
* vserver_message_handler cb frees data context, marks remove flag to request to delete
connfd(connection fd) from fdset.
* after cb returns, event_dispatch
  1. clears busy flag.
  2. if there is remove request, call fdset_del, which acquires mutex, checks busy flag, and
removes connfd from fdset.
* rte_vhost_driver_unregister(not implemented) runs in another thread, acquires the mutex,
calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data context.

The above steps ensures fd data context isn't freed when cb is using.

VM(s) should have been shutdown before rte_vhost_driver_unregister.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:17 +01:00
Huawei Xie
54292e9520 vhost: support ifname for vhost-user
for vhost-cuse, ifname is the name of the tap device
for vhost-user, ifname is the name of the unix domain socket path

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:16 +01:00
Huawei Xie
8f972312b8 vhost: support vhost-user
In rte_vhost_driver_register(), vhost unix domain socket listener fd is created
and added to polled(based on select) fdset.

In rte_vhost_driver_session_start(), fds in the fdset are checked for
processing. If there is new connection from qemu, connection fd accepted is
added to polled fdset. The listener and connection fds in the fdset are
then both checked. When there is message on the connection fd, its
callback vserver_message_handler is called to process vhost-user messages.

To support identifying which virtio is from which guest VM, we could call
rte_vhost_driver_register with different socket path. Virtio devices from
same VM will connect to VM specific socket. The socket path information is
stored in the virtio_net structure.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:15 +01:00
Huawei Xie
fbf7e07ca1 vhost: add select based event driven processing
for more generic event driven processing, refer to:
	http://libevent.org/

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:14 +01:00
Huawei Xie
9464a44160 vhost: implement cuse memory table
remove set_memory_table ops

vhost-cuse or vhost-user will both implement their own set_memory_region handler.

In current vhost-cuse implementation, guest numa memory isn't supported.
Assume that guest memory is backed by only one file.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
2015-02-24 01:38:14 +01:00
Huawei Xie
c89d3e5afd vhost: make host memory mapping more generic
This functions accepts a virtual address and pid(qemu), and maps it into
current process(vhost)'s address space.

The memory behind the virtual address should be backed by a file,
and virtual address should be the starting address.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:13 +01:00
Huawei Xie
6ca9df2812 vhost: copy host memory mapping to a new cuse file
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:12 +01:00
Huawei Xie
c2f60667bf vhost: move fd copying into cuse subdirectory
File descriptor is copied from qemu process into vhost process.
vhost-user doesn't need eventfd kernel module to copy fds between processes.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
2015-02-24 01:38:11 +01:00
Huawei Xie
34f4c46dc4 vhost: rename header file
Rename vhost-net-cdev.h to vhost-net.h.
This file defines common operations provided by virtio-net(.c).

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:10 +01:00
Huawei Xie
6ee36ead58 vhost: move cuse related handling in a subdirectory
Create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse.

vhost-cuse driver will be divided into two parts: cuse driver specific message
handling(in cuse directory) and common message handling(in virtio-net.c).

vhost ioctl message is pre-processed in cuse and then sent to virtio-net
if is not terminated.

virtio-net.c provides common message handling for both vhost-cuse and vhost-user.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:09 +01:00
Huawei Xie
04d696037a vhost: enable virtio control channel Rx mode
VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ.
Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.

In virtnet_send_command:

	/* Caller should know better */
	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
		(out + in > VIRTNET_SEND_COMMAND_SG_MAX));

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:07 +01:00
Neil Horman
133b75923b mk: add library version extension
To differentiate libraries that break ABI, we add a library version number
suffix to the library, which must be incremented when a given libraries ABI is
broken.  This patch enforces that addition, sets the initial abi soname
extension to 1 for each library and creates a symlink to the base SONAME so that
the test applications will link properly.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2015-02-03 16:56:58 +01:00
Neil Horman
9d41beed24 lib: provide initial versioning
Add linker version script files to each DPDK library to put a stake in the
ground from which we can start cleaning up API's

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2015-02-03 16:56:58 +01:00
Ciara Loftus
f5a31522f0 vhost: add interface name for virtio
This patch fixes the issue whereby when using userspace vhost ports
in the context of vSwitching, the name provided to the hypervisor/QEMU
of the vhost tap device needs to be exposed in the library, in order
for the vSwitch to be able to direct packets to the correct device.
This patch introduces an 'ifname' member to the virtio-net structure
which is populated with the tap device name when QEMU is brought up
with a vhost device.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Anthony Fee <anthonyx.fee@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2014-12-18 22:52:38 +01:00
Ouyang Changchun
90924caf08 vhost: enable promiscuous and multicast
This is to enable user space vhost receiving and forwarding broadcast
and multicast packets:
Use new option in command line to enable promisc mode;
Enable 2 bits in VMDQ RX mode: ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-11-12 00:10:23 +01:00
Ouyang Changchun
4e3eff86cf vhost: fix mem path check
Commit aec8283d47 fixes the compilation issue, but it leads to
one runtime issue: early exit wrongly. In some case, 'path' is NULL, but
'resolved_path' has effective path, it should continue going ahead rather
than exit.
This is due to that qemu unlink the file after it maps the huge page file.
In this special case, it is ok to check the resolved path
when path is NULL if errno indicates "No such file or directory".

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2014-11-06 23:12:02 +01:00
Huawei Xie
af4f2c5feb vhost: fix code style
Fix alignment issues, lengthy lines, misordered type and other coding style issues.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-06 23:12:02 +01:00
Ouyang Changchun
aec8283d47 vhost: fix build without unused result
It fixes this compilation complain: "error: ignoring return value of 'realpath',
declared with attribute warn_unused_result [-Werror=unused-result]"

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
2014-10-30 00:19:41 +01:00
Thomas Monjalon
8933dae15c vhost: add in doc
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2014-10-13 19:39:38 +02:00
Huawei Xie
7c845c1fcd vhost: add makefile
vhost lib is turned off by default.
vhost lib is based on cuse, which requires fuse development package
to be installed.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: fix build dependencies]
2014-10-13 19:16:54 +02:00
Huawei Xie
22c668d494 vhost: comment identified issues
1) FIXME: concurrent calls to vhost set mem table from different guests
could cause mem_temp to be overrided.
2) TODO: cmpset cost quite some cpu cyles. Allow app to disable this
feature if there is no contention in real workload.
3) FIXME: fix scatter gather mbuf copy to vhost vring chained buffers.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
60ddca7654 vhost: coding style fixes
Fix serious coding style issues reported by checkpatch.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
38726fb1b6 vhost: static variable fixes
Add "static" for some variable definitions.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
da74110053 vhost: clean includes
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
1c01d52392 vhost: add debug print
Define PRINT_PACKET and LOG_DEBUG macros.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
a58f905514 vhost: add private context field
priv field could be used to store application specific context.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
3d671af711 vhost: supported features
VHOST_SUPPORTED_FEATURES is the feature mask that vhost lib supports.
VHOST_FEATURES is the feature mask vhost currently supports after some features are turned on/off.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split patch]
2014-10-13 19:16:54 +02:00
Huawei Xie
9eed6bfd2e vhost: allow to enable or disable features
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split patch]
2014-10-13 19:16:54 +02:00
Huawei Xie
7202b0a824 vhost: get available vring entries
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split patch]
2014-10-13 19:16:54 +02:00
Huawei Xie
28689ff04d vhost: rename ops registering function
Rename init_virtio_net as rte_vhost_callback_register API.
rte_vhost_callback_register register the callbacks called when a
vhost device is created and ready to be added to data processing core
or is de-actived by guest.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
e5c2ded503 vhost: expose register and start functions
Rename register_cuse_device as rte_vhost_driver_register API.
Rename start_session_loop as rte_vhost_driver_session_start API.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
f782576959 vhost: get internal ops when registering
vhost_net_device_ops is internal implementation in vhost lib.
register_cuse_device will be vhost driver register API.
There is no need for it to know the internal vhost ops.
Instead, that ops is retrieved in register_cuse_device
through get_virtio_net_callbacks.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
17b8320a3e vhost: remove index parameter
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
a4fff3bba8 vhost: enqueue/dequeue burst
rte_vhost_enqueue_burst copies host packets to guest.
rte_vhost_enqueue_burst will call virtio_dev_rx and virtio_dev_merge_rx
respectively depending on whether merge-able feature is negotiated or not
in the vhost device.

virtio_dev_merge_tx is renamed to rte_vhost_dequeue_burst.
rte_vhost_dequeue_burst gets to-be-sent packets from guest.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: merged patches]
2014-10-13 19:16:54 +02:00
Huawei Xie
830bea8bc4 vhost: add queue id parameter
queue_id parameter is added to Rx/Tx functions for multiple queue support
in future.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
fa325fa413 vhost: calculate mbuf size
As a lib, we have no idea the app defined mbuf size.
This patch will calculate mbuf size dynamically.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:53 +02:00
Huawei Xie
20f16ce646 vhost: return packets to upper layer
This patch makes virtio_dev_merge_tx return the received packets to app layer.
Previously virtio_tx_route was called to route these packets and then free them.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:53 +02:00
Huawei Xie
7f456f6d61 vhost: move address translation function
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split from a previous patch]
2014-10-13 19:16:04 +02:00
Huawei Xie
c19dc7db15 vhost: move internal structure
The structure virtio_net_config_ll is moved to virtio_net.c.
It is related to internal virtio device management,
so it should not be exposed to other files.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00
Huawei Xie
8a7576c3dc vhost: remove retry logic
It was used to wait some time and retry when there are not enough descriptors.
App could implement this policy easily if it needs.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00
Huawei Xie
0a739691fc vhost: remove zero copy memory region generation logic
Currently zero copy feature isn't generic as it couples closely with nic.
It isn't put in the vhost lib in this version.
gpa(guest physical address) to hpa(host physical address) mapping region
logic is removed.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00
Huawei Xie
68e6490476 vhost: remove switching related logics
The following logics will be moved to vhost example:
 1. mac learning, which is used to learn the mac address from the first
transmitted packet of guest and bind the vhost device to a queue in a
pool of VMDQ.
 2. VMDQ mac/vlan filter: Each pool the vhost device is bind to is
assigned a mac/vlan filter.
 3. num_devices is used to specify the maximum vhost devices the nic supports.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00
Huawei Xie
f14c0b4db8 vhost: remove useless code for Rx/Tx
Remove all other codes and only keep virtio_dev_rx, copy_from_mbuf_to_vring,
virtio_dev_merge_rx, virtio_dev_merge_tx.

Previous vhost merge-able feature introduces another version of tx function,
virtio_dev_merge_tx. Actually it is not related to merge-able feature but is
the fix for memcpy between mbuf and vring descriptors.
This lib will create the tx functions based on virtio_dev_merge_tx.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: do not remove code used or moved later]
2014-10-13 19:11:38 +02:00
Huawei Xie
5c7a80aec3 vhost: move from examples to dedicated library
Those files will be refactored in subsequent patches to form user space
vhost library.
Makefile and main.h are removed.
main.c is renamed to vhost_rxtx.c and will provide vring enqueue/dequeue API.
virtio-net.h is renamed to rte_virtio_net.h which is the API header file.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: remove from examples Makefile and merge file renaming]
2014-10-13 19:10:09 +02:00