numam-dpdk

Author	SHA1	Message	Date
Tetsuya Mukawa	fb871d0a4d	vhost: fix default value of kickfd and callfd Currently, default values of kickfd and callfd are -1. If the values are -1, current code guesses kickfd and callfd haven't been initialized yet. Then vhost library will guess the virtqueue isn't ready for processing. But callfd and kickfd will be set as -1 when "--enable-kvm" isn't specified in QEMU command line. It means we cannot treat -1 as uninitialized state. The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for the default values of kickfd and callfd. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-15 00:20:29 +01:00
Yuanhan Liu	bb66588304	vhost: broadcast RARP by injecting in receiving mbuf array Broadcast the RARP packet by injecting it into the receiving mbuf array at rte_vhost_dequeue_burst(). Commit `33226236a3` ("vhost: handle request to send RARP") iterates over all host interfaces and then broadcasts it on all of them. It did notify the switches about the new location of the migrated VM; however, the mac learning table in the target host is wrong (at least in my test with OVS): $ ovs-appctl fdb/show ovsbr0 port VLAN MAC Age 1 0 b6:3c:72:71:cd:4d 10 LOCAL 0 b6:3c:72:71:cd:4e 10 LOCAL 0 52:54:00:12:34:68 9 1 0 56:f6:64:2c:bc:c0 1 Where 52:54:00:12:34:68 is the mac of the VM. As you can see from the above, the port learned is "LOCAL", which is the "ovsbr0" port. That is reasonable, since we indeed send the pkt by the "ovsbr0" interface. The wrong mac table leads all the packets destined for the VM to go to "ovsbr0" in the end, so all packets are lost until the guest sends an ARP request (or reply) to refresh the mac learning table. Jianfeng then came up with a solution I had thought of at first but NAKed myself, out of concern that it had potential issues [0]. The solution is as the title states: broadcast the RARP packet by injecting it into the receiving mbuf arrays at rte_vhost_dequeue_burst(). The revival of that idea made me think twice; it then looked like a false concern to me. And I did a rough verification: it worked as expected. [0]: http://dpdk.org/ml/archives/dev/2016-February/033527.html Another note is that while preparing this version, I found that DPDK has some ARP related structures and macros defined. So, use them instead of the ones from standard header files here. Cc: Thibaut Collet <thibaut.collet@6wind.com> Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-29 16:55:30 +01:00
Rich Lane	a90ca1a12e	vhost: remove device operations pointers The vhost_net_device_ops indirection is unnecessary because there is only one implementation of the vhost common code. Removing it makes the code more readable. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 19:33:31 +01:00
Rich Lane	ca67ed289a	vhost: fix leak of fds and mmaps The common vhost code only supported a single mmap per device. vhost-user worked around this by saving the address/length/fd of each mmap after the end of the rte_virtio_memory struct. This only works if the vhost-user code frees dev->mem, since the common code is unaware of the extra info. The VHOST_USER_RESET_OWNER message is one situation where the common code frees dev->mem and leaks the fds and mappings. This happens every time I shut down a VM. The new code calls back into the implementation (vhost-user or vhost-cuse) to clean up these resources. The vhost-cuse changes are only compile tested. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 16:13:32 +01:00
Yuanhan Liu	d22929db97	vhost: remove duplicate header include unistd.h has been included twice; remove one. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 16:00:03 +01:00
Yuanhan Liu	d639996a74	vhost: enable log_shmfd protocol feature To claim that we support vhost-user live migration: the SET_LOG_BASE request will be sent only when this feature flag is set. Besides this flag, we actually need another feature flag set to make vhost-user live migration work: VHOST_F_LOG_ALL, which, however, was enabled a long time ago. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:53:38 +01:00
Yuanhan Liu	33226236a3	vhost: handle request to send RARP While in former patch we enabled GUEST_ANNOUNCE feature, so that the guest OS will broadcast a GARP message after migration to notify the switch about the new location of migrated VM, the thing is that GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel, VHOST_USER_SEND_RARP request comes to rescue. The payload of this new request is the mac address of the migrated VM, with that, we could construct a RARP message, and then broadcast it to host interfaces. That's how this patch works: - list all interfaces, with the help of SIOCGIFCONF ioctl command - construct an RARP message and broadcast it Cc: Thibaut Collet <thibaut.collet@6wind.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 15:49:02 +01:00
Yuanhan Liu	54f9e32305	vhost: handle dirty pages logging request VHOST_USER_SET_LOG_BASE request is used to tell the backend (dpdk vhost-user) where we should log dirty pages, and how big the log buffer is. This request introduces a new payload: typedef struct VhostUserLog { uint64_t mmap_size; uint64_t mmap_offset; } VhostUserLog; Also, a fd is delivered from QEMU by ancillary data. With those info given, an area of memory is mmaped, assigned to dev->log_base, for logging dirty pages. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:42:54 +01:00
Huawei Xie	16321f1caa	vhost: fix missed unlock CID 107113 (#1 of 1): Missing unlock (LOCK)5. missing_unlock: Returning without unlocking pfdset->fd_mutex. Fixes: `fbf7e07ca1` ("vhost: add select based event driven processing") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:13:39 +01:00
Huawei Xie	3b77b90e34	vhost: fix missed break in switch CID 107114 (#1 of 1): Missing break in switch Fixes: `8f972312b8` ("vhost: support vhost-user") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:12:22 +01:00
Huawei Xie	36b2449d86	vhost: fix out-of-bounds read CID 107126 (#1 OF 1): Out-of-bounds read Fixes: `8f972312b8` ("vhost: support vhost-user") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:11:19 +01:00
Victor Kaplansky	cd81ee7cc2	vhost: fix enabling vring per queue The VHOST_USER_SET_VRING_ENABLE request was sent for each queue-pair. However, it's changed to be sent per queue in the queue-pair by QEMU commit dc3db6ad ("vhost-user: start/stop all rings"). The change is reasonable, as we send all other request per queue, instead of queue-pair. Hence we should do proper changes to adapt to the QEMU change here. Otherwise, a segfault will be triggered when last TX queue was enabled. Signed-off-by: Victor Kaplansky <victork@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-24 21:34:11 +01:00
Jianfeng Tan	ec09c280b8	vhost: fix mmap not aligned with hugepage size This patch fixes a bug under lower version linux kernel, mmap() fails when length is not aligned with hugepage size. mmap() without flag of MAP_ANONYMOUS, should be called with length argument aligned with hugepagesz at older longterm version Linux, like 2.6.32 and 3.2.72, or mmap() will fail with EINVAL. This bug was fixed in Linux kernel by commit: dab2d3dc45ae7343216635d981d43637e1cb7d45 To avoid failure, make sure in caller to keep length aligned. Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-11-24 21:34:11 +01:00
Yuanhan Liu	71dfdbe66a	vhost: fix build with kernel < 3.8 Fix build error: virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here Above two virtio-net MQ macros are introduced since kernel v3.8. For older kernel, we should not reference them directly, hence, this patch introduced two wrapper macros, with proper values being set depending on we support MQ or not. Fixes: `b09b198bfb` ("vhost-user: announce queue number in message") Reported-by: Yongjie Gu <yongjiex.gu@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: David Marchand <david.marchand@6wind.com>	2015-10-30 15:55:05 +01:00
Tetsuya Mukawa	07d37fbf5e	vhost: fix crash with multiqueue enabled The patch fixes wrong handling of virtqueue array index when GET_VRING_BASE message comes. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2015-10-30 15:46:01 +01:00
Yuanhan Liu	19d4d7ef2a	vhost-user: enable multiple queue By setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit, and VIRTIO_NET_F_MQ feature bit. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:54 +01:00
Changchun Ouyang	77d20126b4	vhost-user: handle message to enable vring This message is used to enable/disable a specific vring queue pair. The first queue pair is enabled by default. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:53 +01:00
Yuanhan Liu	e049ca6d10	vhost-user: prepare multiple queue setup All queue pairs, including the default (the first) queue pair, are allocated dynamically, when a vring_call message is received first time for a specific queue pair. This is a refactor work for enabling vhost-user multiple queue; it should not break anything as it does no functional changes: we don't support mq set, so there is only one mq at max. This patch is based on Changchun's patch. Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:37 +01:00
Yuanhan Liu	b09b198bfb	vhost-user: announce queue number in message Add VHOST_USER_GET_QUEUE_NUM message to tell the frontend (qemu) how many queue pairs we support. And it is initiated to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:32 +01:00
Yuanhan Liu	381316f6a2	vhost-user: support protocol features The two protocol features messages were introduced by the qemu vhost maintainer (Michael) for extending the vhost-user interface. Here is an excerpt from the vhost-user spec: Any protocol extensions are gated by protocol feature bits, which allows full backwards compatibility on both master and slave. The vhost-user multiple queue features will be treated as a vhost-user extension, hence, we have to implement the two messages first. VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support any yet. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:27 +01:00
Jerome Jutteau	6c6373c763	vhost: fix missing device checks virtio-net searches for its device in reset_owner. The function doesn't check the return result of get_config_ll_entry. Using get_config_ll_entry in reset_owner doesn't show any error when the device is not found. This patch fixes this by using get_device instead of get_config_ll_entry. In user_get_vring_base, the return value of get_device is not checked, which may cause a segfault when the device is not found. Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-10-21 12:21:18 +02:00
Yuanhan Liu	9702b2b53f	vhost: fix wrong usage of eventfd_t According to eventfd man page: typedef uint64_t eventfd_t; int eventfd_read(int fd, eventfd_t *value); int eventfd_write(int fd, eventfd_t value); eventfd_t is defined for the second arg(value), but not for fd. Here I redefine those fd fields to `int' type, which also removes the redundant (int) cast. And as the man page stated, we should cast 1 to eventfd_t type for eventfd_write(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:58:30 +02:00
Yuanhan Liu	dbd897d0a1	vhost: get rid of duplicate code Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:55:17 +02:00
Ouyang Changchun	1cbf787ef8	vhost: add log on socket bind failure It adds more readable log info if a socket fails to bind to local socket file name. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-07-17 14:53:26 +02:00
Huawei Xie	cca619e459	vhost: comment unwanted callback add comment for potential unwanted callback on listenfds Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-30 17:49:08 +02:00
Huawei Xie	292959c719	vhost: cleanup unix socket rte_vhost_driver_unregister API will remove the listenfd from event list, and then close it. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Peng Sun <peng.a.sun@intel.com>	2015-06-30 17:49:08 +02:00
Huawei Xie	3670686ab9	vhost: fix race for connection fd In the event handler of connection fd, the connection fd could be possibly closed. The event dispatch loop would then try to remove the fd from fdset. Between these two actions, another thread might register a new listenfd reusing the val of just closed fd, so we couldn't call fdset_del which would wrongly clean up the new listenfd. A new function fdset_del_slot is provided to cleanup the fd at the specified location. Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-30 17:49:07 +02:00
Huawei Xie	857af048ac	vhost: fix build Fix the error "missing initializer" and "cast to pointer from integer of different size". For the pointer to integer cast issue, need to investigate changing the typeof mapped_address. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2015-03-20 23:01:42 +01:00
Huawei Xie	64ab971791	vhost: fix file descriptors naming The previous vhost implementation wrongly named kickfd as callfd and callfd as kickfd. It is functionally correct, but causes confusion. Exchange kickfd and callfd to avoid confusion. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-03-09 12:46:46 +01:00
Huawei Xie	aee87dd706	vhost: use loop instead of goto This patch reorders the code a bit to use a loop instead of goto. Besides, remove the redundant check 'fd != -1'. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2015-03-09 12:46:46 +01:00
Huawei Xie	31ff0c6a45	vhost: combine select with sleep combine sleep into select when there is no file descriptors to be monitored. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2015-03-09 12:46:46 +01:00
Huawei Xie	391b5f425c	vhost: fix crash by removing device when requested This patch fixes the segfault issue in the case where vhost receives a new VHOST_SET_MEM_TABLE message without VHOST_VRING_GET_VRING_BASE (which we use as the stop message). Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Tommy Long <thomas.long@intel.com>	2015-03-05 22:08:27 +01:00
Huawei Xie	dbfa62d63f	vhost: support dynamically registering server * support calling rte_vhost_driver_register after rte_vhost_driver_session_start * add mutext to protect fdset from concurrent access * add busy flag in fdentry. this flag is set before cb and cleared after cb is finished. mutex lock scenario in vhost: * event_dispatch(in rte_vhost_driver_session_start) runs in a separate thread, infinitely processing vhost messages through cb(callback). * event_dispatch acquires the lock, get the cb and its context, mark the busy flag, and releases the mutex. * vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add new fd into fdset. * vserver_message_handler cb frees data context, marks remove flag to request to delete connfd(connection fd) from fdset. * after cb returns, event_dispatch 1. clears busy flag. 2. if there is remove request, call fdset_del, which acquires mutex, checks busy flag, and removes connfd from fdset. * rte_vhost_driver_unregister(not implemented) runs in another thread, acquires the mutex, calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data context. The above steps ensures fd data context isn't freed when cb is using. VM(s) should have been shutdown before rte_vhost_driver_unregister. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:17 +01:00
Huawei Xie	54292e9520	vhost: support ifname for vhost-user for vhost-cuse, ifname is the name of the tap device for vhost-user, ifname is the name of the unix domain socket path Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:16 +01:00
Huawei Xie	8f972312b8	vhost: support vhost-user In rte_vhost_driver_register(), vhost unix domain socket listener fd is created and added to polled(based on select) fdset. In rte_vhost_driver_session_start(), fds in the fdset are checked for processing. If there is new connection from qemu, connection fd accepted is added to polled fdset. The listener and connection fds in the fdset are then both checked. When there is message on the connection fd, its callback vserver_message_handler is called to process vhost-user messages. To support identifying which virtio is from which guest VM, we could call rte_vhost_driver_register with different socket path. Virtio devices from same VM will connect to VM specific socket. The socket path information is stored in the virtio_net structure. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:15 +01:00
Huawei Xie	fbf7e07ca1	vhost: add select based event driven processing for more generic event driven processing, refer to: http://libevent.org/ Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:14 +01:00

36 Commits
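
The kickfd/callfd fix in commit `fb871d0a4d` can be sketched as follows. This is a minimal illustration, not the code from the patch: the two macro names and their values (-1 and -2) come from the commit message, while `struct sketch_vq`, `sketch_vq_init()` and `sketch_vq_is_ready()` are hypothetical names invented here. The point is that -1 is a legitimate value (QEMU sends it when "--enable-kvm" is not specified), so a separate sentinel is needed to mean "not set yet".

```c
#include <stdbool.h>
#include <stddef.h>

/* Names and values as described in the commit message of fb871d0a4d. */
#define VIRTIO_INVALID_EVENTFD        (-1)
#define VIRTIO_UNINITIALIZED_EVENTFD  (-2)

/* Hypothetical, trimmed-down virtqueue used only for this sketch. */
struct sketch_vq {
    void *desc;   /* descriptor ring address, NULL until set */
    int   kickfd; /* guest -> host notification fd */
    int   callfd; /* host -> guest notification fd */
};

/* Start with both fds marked as "never set", not as "invalid". */
static void sketch_vq_init(struct sketch_vq *vq)
{
    vq->desc   = NULL;
    vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
    vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
}

/*
 * A queue counts as ready once the ring address and both fds have been
 * delivered, even if the delivered fd value is VIRTIO_INVALID_EVENTFD (-1).
 */
static bool sketch_vq_is_ready(const struct sketch_vq *vq)
{
    return vq != NULL && vq->desc != NULL &&
           vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
           vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
}
```

With this split, a queue whose fds were explicitly set to -1 is still treated as ready, whereas a freshly initialized queue is not.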