numam-dpdk

Author	SHA1	Message	Date
Maxime Coquelin	c00bb88d35	vhost: add number of fds to vhost-user messages As soon as some ancillary data (fds) are received, it is copied without checking its length. This patch adds the number of fds received to the message, which is set in read_vhost_message(). This is preliminary work to support sending fds to Qemu. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-10-18 10:24:39 +02:00
Marvin Liu	22d2e78840	vhost: advertise support in-order feature If devices always use descriptors in the same order in which they have been made available, these devices can offer the VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows devices to notify the virtio driver of the use of a batch of buffers by only writing the used ring index. The vhost-user device supports this feature by default. If vhost dequeue zero copy is enabled, VIRTIO_F_IN_ORDER should be disabled, as vhost can’t assure that descriptors returned from the NIC are in order. Signed-off-by: Marvin Liu <yong.liu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-07-03 01:35:58 +02:00
Tonghao Zhang	52d874dc67	vhost: fix crash on closing in client mode When rte_vhost_driver_unregister destroys the vsocket, we should set it to NULL after freeing it, because in client mode the conn may be added to the reconnect thread while the vsocket is destroyed. In one case, if qemu creates a vhostuser port as a server with the same unix path, the reconnect thread will reconnect to it while the vsocket is destroyed. To fix this: 1. set vsocket to NULL after freeing it. 2. remove the reconnection from the reconnect thread in a suitable position. Cc: stable@dpdk.org Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-05-14 22:30:48 +01:00
Tonghao Zhang	8b4b949144	vhost: fix dead lock on closing in server mode When qemu closes the unix socket fd of the vhostuser as a server, and the vhostuser port is then immediately deleted on openvswitch, there will be a deadlock. Thread A (fdset event thread): 1. fdset_event_dispatch; 2. set the fd busy to 1; 3. vhost_user_read_cb; 4. vhost peer closed, remove the conn from vsocket->conn_list: lock vsocket->conn_mutex; 5. set the fd busy to 0. Thread B: 1. rte_vhost_driver_unregister; 2. lock vsocket->conn_mutex; 3. fdset_del waits for busy to change to 0. Fixes: `65388b43f5` ("vhost: fix fd leaks for vhost-user server mode") Cc: stable@dpdk.org Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-05-14 22:29:59 +01:00
Olivier Matz	6383d2642b	eal: set name when creating a control thread To avoid code duplication, add a parameter to rte_ctrl_thread_create() to specify the name of the thread. This requires adding a wrapper for the thread start routine in rte_thread_init(), which first waits until the thread is configured. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-25 00:51:31 +02:00
Olivier Matz	9e5afc72c9	eal: add function to create control threads Many parts of dpdk use their own management threads. Introduce a new wrapper for thread creation that will be extended in next commits to set the name and affinity. To be consistent with other DPDK APIs, the return value is negative in case of error, which was not the case for pthread_create(). Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-25 00:51:31 +02:00
Olivier Matz	dec7b1884a	use sizeof to avoid double use of a length define Only a cosmetic change: the *_LEN defines are already used when defining the buffer. Using sizeof() ensures that the length stays consistent, even if the definition is modified. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-25 00:51:31 +02:00
Zhihong Wang	07718b4f87	vhost: adapt library for selective datapath This patch adapts vhost lib for selective datapath by calling device ops at the corresponding stage. Signed-off-by: Zhihong Wang <zhihong.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-04-14 00:40:21 +02:00
Zhihong Wang	b4953225ce	vhost: add APIs for datapath configuration This patch adds APIs for datapath configuration. The did of the vhost-user socket can be set to identify the backend device, in this case each vhost-user socket can have only 1 connection. The did is set to -1 by default when the software datapath is used. Signed-off-by: Zhihong Wang <zhihong.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-04-14 00:40:21 +02:00
Tonghao Zhang	d64c43773a	vhost: add pipe event for optimizing negotiation When vhost-user connects to qemu successfully, dpdk calls vhost_user_add_connection to add the unix socket fd to poll. However, fdset_add only sets the socket fd in a fdentry, while poll may be sleeping at that point. In the general case, this is no problem. But if we use hot update for vhost-user, the network downtime of VMs can reach 750+ms. This patch adds a pipe event so that, once connections are established, dpdk rebuilds the poll immediately. With this patch, the downtime drops to 20~30ms. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-03-30 17:25:45 +02:00
Tonghao Zhang	9426ee2678	vhost: move stdbool include The vhost.h file uses the bool type but does not include the stdbool header file. If other C files include vhost.h directly, there will be a compile error. This patch will be used in the next patch. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-03-30 14:08:44 +02:00
Tonghao Zhang	ce5bd5fcae	vhost: add fdset-event thread name This patch adds the name for vhost fdset thread. It can help us to know whether the thread is running. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-03-30 14:08:44 +02:00
Tonghao Zhang	2db2d3220b	vhost: raise error on fdset-thread creation When 'rte_vhost_driver_start' is first called, the fdset_event_dispatch thread should be created successfully, because vhost uses it to poll socket events for vhost server or clients. Without it, for example, vhost will not get the connection event. This patch returns an error code directly when the creation is not successful. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-03-30 14:08:44 +02:00
Tiwei Bie	7a36967029	vhost: do not generate signal when sendmsg fails More precisely, do not generate a SIGPIPE signal if the peer has closed the connection. Otherwise, it will terminate the process by default. As a library, we should avoid terminating the application process when error happens and just need to return with an error. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-03-30 14:08:44 +02:00
Ilya Maximets	1cf62d9685	vhost: add note about sockets in server mode From time to time, someone sends patches about unlinking existing sockets when registering a vhost user in server mode. A recent example: http://dpdk.org/ml/archives/dev/2018-February/090025.html This problem has been discussed many times, and it was made clear that the library should not unlink files given by the application in order to avoid possible security problems, such as removing random files used by other programs. One of the first discussions: http://dpdk.org/ml/archives/dev/2015-December/030326.html To avoid such patches in the future, it was decided to add a comment that explains what is happening and tries to describe the reasoning. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-03-30 14:08:43 +02:00
Tomasz Kulasek	aa001111b0	vhost: check cmsg not null Fixes: `8f972312b8` ("vhost: support vhost-user") Cc: stable@dpdk.org Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-03-30 14:08:42 +02:00
Stefan Hajnoczi	2042cf7194	vhost: clear out unused SCM_RIGHTS file descriptors The number of file descriptors received is not stored by vhost_user.c. vhost_user_set_mem_table() assumes that memory.nregions matches the number of file descriptors received, but nothing guarantees this: for (i = 0; i < memory.nregions; i++) close(pmsg->fds[i]); Another questionable code snippet is: case VHOST_USER_SET_LOG_FD: close(msg.fds[0]); If not enough file descriptors were received then fds[] contains uninitialized data from the stack (see read_fd_message()). This might cause non-vhost file descriptors to be closed if the uninitialized data happens to match. Refactoring vhost_user.c to pass around and check the number of file descriptors everywhere would make the code more complex. It is simpler for read_fd_message() to set unused elements in fds[] to -1. This way close(-1) is called and no harm is done. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-03-30 14:08:42 +02:00
Stefan Hajnoczi	1c717af4c6	vhost: add flag for built-in virtio driver The librte_vhost API is used in two ways: 1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst(). 2. As a library for implementing vhost device backends. There is no distinction between the two at the API level or in the librte_vhost implementation. For example, device state is kept in "struct virtio_net" regardless of whether this is actually a net device backend or whether the built-in virtio_net.c driver is in use. The virtio_net.c driver should be a librte_vhost API client just like the vhost-scsi code and have no special access to vhost.h internals. Unfortunately, fixing this requires significant librte_vhost API changes. This patch takes a different approach: keep the librte_vhost API unchanged but track whether the built-in virtio_net.c driver is in use. See the next patch for a bug fix that requires knowledge of whether virtio_net.c is in use. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-02-05 15:11:07 +01:00
Olivier Matz	da51d2f6b8	vhost: fix error code check when creating thread On error, pthread_create() returns a positive number (errno). Fix the test on the return value. Fixes: `af14759181` ("vhost: introduce API to start a specific driver") Fixes: `e623e0c6d8` ("vhost: add reconnect ability") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com>	2018-01-16 18:47:49 +01:00
Tonghao Zhang	ae0b1de941	vhost: add reconnect thread name for client mode This patch adds the name for vhost-user reconnect thread. It can help us to know whether the thread is running. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-01-16 18:47:49 +01:00
Bruce Richardson	369991d997	lib: use SPDX tag for Intel copyright files Replace the BSD license header with the SPDX tag for files with only an Intel copyright on them. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-01-04 22:41:39 +01:00
Maxime Coquelin	002d6a7e55	vhost: add flag to enable IOMMU support Qemu versions from v2.7.0 to v2.9.0 have their reply-ack protocol feature implementation broken with multiqueue. The reply-ack protocol feature is optional except for the IOMMU feature. This patch introduces a new RTE_VHOST_USER_IOMMU_SUPPORT flag to enable the VIRTIO_F_IOMMU_PLATFORM virtio feature. By default, the IOMMU support is now disabled. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org> Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2017-11-07 14:19:11 +01:00
Dariusz Stojaczyk	efba12a78d	vhost: add user callbacks for socket open/close Added new callbacks to notify about socket connection status. As destroy_device is used for pausing virtqueue processing as well as for connection close, the user cannot distinguish between the two. Consider the following scenario: rte_vhost: received SET_VRING_BASE message, calling destroy_device() as usual. user: end-user asks to remove the device (together with socket file); OK, device is not in use, calling rte_vhost_driver_unregister() etc. That's NOT the behavior we want. Instead of changing new_device/destroy_device callbacks and breaking the ABI, a set of new functions new_connection/destroy_connection has been added. Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com>	2017-10-10 15:54:31 +02:00
Zhiyong Yang	78b2e3bae1	vhost: fix initialization Exception handling is executed in the normal path and it will cause vhost-user init failure. Fixes: `d6983a70e2` ("vhost: check return of pthread calls") Reported-by: Lei Yao <lei.a.yao@intel.com> Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-07-19 22:49:47 +03:00
Jens Freimann	2dfeebe265	vhost: check return of mutex initialization Check return value of pthread_mutex_init(). Also destroy mutex in case of other errors before returning. Signed-off-by: Jens Freimann <jfreimann@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-07-04 11:30:54 +02:00
Jens Freimann	d6983a70e2	vhost: check return of pthread calls Make sure we catch and log failed calls to pthread functions. Signed-off-by: Jens Freimann <jfreimann@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-07-04 11:30:47 +02:00
Jens Freimann	6846128798	vhost: add missing check in driver registration Add a check for strdup() return value and fail gracefully if we get a bad return code. Signed-off-by: Jens Freimann <jfreimann@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-07-04 11:11:01 +02:00
Daniel Verkamp	3cb502b310	vhost: clean up per-socket mutex vsocket->conn_mutex was allocated with pthread_mutex_init() but never freed with pthread_mutex_destroy(). This is a potential memory leak, depending on how pthread_mutex_t is implemented. Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-07-02 01:16:31 +02:00
Yuanhan Liu	7bd841b269	vhost: fix use after free A "return" is missing on error, which could lead to a "use after free" issue (about var "conn"). Coverity issue: 143476 Fixes: `65388b43f5` ("vhost: fix fd leaks for vhost-user server mode") Reported-by: John McNamara <john.mcnamara@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-19 10:49:06 +02:00
Yuanhan Liu	af14759181	vhost: introduce API to start a specific driver We used to use rte_vhost_driver_session_start() to trigger the vhost-user session. It takes no argument, thus it's a global trigger, and that can be problematic. The issue is that, currently, rte_vhost_driver_register(path, flags) actually tries to put the driver into the session loop (by fdset_add). However, a set of APIs is needed to set up a vhost-user driver properly: * rte_vhost_driver_register(path, flags); * rte_vhost_driver_set_features(path, features); * rte_vhost_driver_callback_register(path, vhost_device_ops); If a new vhost-user driver is registered after the trigger (think OVS-DPDK, which could add a port dynamically from cmdline), the current code will effectively start the session for the new driver just after the first API, rte_vhost_driver_register(), is invoked, leaving later calls with no effect at all. To handle the case properly, this patch introduces a new API, rte_vhost_driver_start(path), to trigger a specific vhost-user driver. To do that, rte_vhost_driver_register(path, flags) is simplified to create the socket only, letting rte_vhost_driver_start(path) actually put it into the session loop. Meanwhile, rte_vhost_driver_session_start is removed: we can hide the session thread internally (creating the thread if it has not been created). This also simplifies the application. NOTE: the API order in the prog guide is slightly adjusted to show the correct invocation order. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:42:44 +02:00
Yuanhan Liu	7c12903746	vhost: rename device ops struct Rename "virtio_net_device_ops" to "vhost_device_ops", so that it is not virtio-net specific. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:42:44 +02:00
Yuanhan Liu	93433b639d	vhost: make notify ops per vhost driver If an application supports both vhost-user net and vhost-user scsi, the callbacks should be different. Making notify ops per vhost driver allows the application to define a different set of callbacks for each driver. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:40:13 +02:00
Yuanhan Liu	0917f9d1f0	vhost: use new APIs to handle features Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:40:13 +02:00
Yuanhan Liu	5fbb3941da	vhost: introduce driver features related APIs Introduce few APIs to set/get/enable/disable driver features. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:40:13 +02:00
Yuanhan Liu	65388b43f5	vhost: fix fd leaks for vhost-user server mode A vhost-user server socket could have many connections, thus many connfd. However, we currently just use one single int var to store it, meaning it will get overwritten every time a new connection is created. While this is not as fatal an issue as it sounds (since the correct connfd is passed to the event loop thread by fdset_add), it may cause fd leaks if a user invokes rte_vhost_driver_unregister before shutting down all connections: it just closes the most recent connfd. A simple example that should be able to reproduce this leak issue is deleting the ovs vhost-user port while the connected VMs are still alive. (Note that it's suggested to use one socket for one VM, which again makes the issue less fatal than it sounds.) Since we already use a struct "vhost_user_connection" to track all info about one connection, it's obvious that we should put the connfd there. Then we could build a connection list inside the vhost_user_socket struct, to represent all connections belonging to that socket file. Fixes: `164fd39678` ("vhost: fix unregistering in client mode") Cc: stable@dpdk.org Cc: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-04-01 10:36:17 +02:00
Ilya Maximets	29b851e8de	vhost: change log levels in client mode Inability to connect to socket is a normal situation in client mode because, in common case, server isn't started yet. RTE_LOG_WARNING should be suitable for the case of some unusual errors. Message about reconnection is not an error at all. Fixes: `e623e0c6d8` ("vhost: add reconnect ability") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-04-01 08:58:54 +02:00
Yuanhan Liu	54524e0391	vhost: fix use after free Fix the coverity USE_AFTER_FREE issue. Coverity issue: 137884 Fixes: `a277c71598` ("vhost: refactor code structure") Reported-by: John McNamara <john.mcnamara@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-10-26 13:39:09 +02:00
Yuanhan Liu	9ba1e744ab	vhost: add a flag to enable dequeue zero copy Dequeue zero copy is disabled by default. Add a new flag ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` to explicitly enable it. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Qian Xu <qian.q.xu@intel.com>	2016-10-13 10:29:20 +02:00
Yuanhan Liu	a277c71598	vhost: refactor code structure The code structure is a bit messy now. For example, vhost-user message handling is spread across three different files: vhost-net-user.c, virtio-net.c and virtio-net-user.c, where vhost-net-user.c is the entry point that handles all those messages and then invokes the right method for a specific message. Some of these methods are stored in virtio-net.c, while others are stored in virtio-net-user.c. The truth is all of them should be in one file, vhost_user.c. So this patch refactors the source code structure, mainly by renaming files and moving code from one file to another file that is more suitable for storing it. Thus, no functional changes are made. After the refactor, the code structure becomes: - socket.c handles all vhost-user socket file related stuff, such as socket file creation for server mode and reconnection for client mode. - vhost.c mainly handles stuff like vhost device creation/destroy/reset. Most of the vhost API implementation is there, too. - vhost_user.c all stuff about vhost-user message handling goes there. - virtio_net.c all stuff about virtio-net should go there. It has the virtio net Rx/Tx implementation only so far: it's just a rename from vhost_rxtx.c Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2016-09-13 05:25:08 +02:00

39 Commits