Setting up the mapping from GPA (guest physical address) to HPA (host
physical address) can be very time consuming when the guest memory is
backed with small pages (4K). The bigger the guest memory, the longer
it takes. This can lead to a very long vhost-user negotiation.
Since the mapping is only needed in zero copy mode so far, we can
avoid this time consuming setup when zero copy is turned off (which is
the default case).
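A minimal sketch of the idea, with assumed names (the real code and flag
layout may differ):

    /* Skip the costly GPA -> HPA table setup unless dequeue zero copy
     * has been requested for this device. */
    if (dev->dequeue_zero_copy) {
        if (add_guest_pages(dev, region, alignment) < 0)
            return -1;
    }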
This is actually a workaround; a proper fix might be to start a new thread
and hide the big latency there.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If a malicious guest forges a dead loop desc chain (by letting desc->next
point to itself) and desc->len is zero, this could lead to a dead loop in
copy_mbuf_to_desc (the following simplified code shows the issue
clearly):
    while (mbuf_is_not_totally_consumed) {
        if (desc_avail == 0) {
            desc = &descs[desc->next];
            desc_avail = desc->len;
        }
        COPY(desc, mbuf, desc_avail);
    }
I have actually fixed the same issue before: commit a436f53ebf ("vhost:
avoid dead loop chain"); it fixes the dequeue path though, leaving the
enqueue path still vulnerable.
The fix is the same: add a var nr_desc to avoid the dead loop.
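A rough sketch of the bounded loop, reusing the simplified pseudo code
above (variable names are illustrative):

    uint16_t nr_desc = 1;

    while (mbuf_is_not_totally_consumed) {
        if (desc_avail == 0) {
            /* a valid chain can never be longer than the ring */
            if (unlikely(++nr_desc > vq->size))
                return -1;
            desc = &descs[desc->next];
            desc_avail = desc->len;
        }
        COPY(desc, mbuf, desc_avail);
    }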
Fixes: f1a519ad98 ("vhost: fix enqueue/dequeue to handle chained vring descriptors")
Cc: stable@dpdk.org
Reported-by: Xieming Katty <katty.xieming@huawei.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently select() is used to monitor file descriptors for vhost-user
ports. This limits the number of ports that can be created, since the
fd number is used as an index into the fd_set and we have seen fds > 1023.
This patch changes select() to poll(). This way we can keep a
packed (pollfd) array for the fds, i.e. handle as many fds as the size of
the array.
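A minimal sketch of the poll()-based dispatch loop (simplified; array and
handler names are illustrative):

    struct pollfd pfds[MAX_FDS];
    int i, num = 0;

    /* one slot per registered vhost-user fd */
    for (i = 0; i < num_registered_fds; i++) {
        pfds[num].fd = registered_fds[i];
        pfds[num].events = POLLIN;
        num++;
    }

    if (poll(pfds, num, 1000 /* ms */) > 0) {
        for (i = 0; i < num; i++)
            if (pfds[i].revents & (POLLIN | POLLERR | POLLHUP))
                handle_fd_event(pfds[i].fd);
    }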
Also see:
http://dpdk.org/ml/archives/dev/2016-April/037024.html
Reported-by: Patrik Andersson <patrik.r.andersson@ericsson.com>
Signed-off-by: Jan Wickbom <jan.wickbom@ericsson.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The REPLY_ACK feature provides a generic way for QEMU to ensure both
completion and success of a request.
As described in the vhost-user spec in the QEMU repository, QEMU sets
the VHOST_USER_NEED_REPLY flag (bit 3) when expecting a reply-ack from
the backend. When the flag is set, the backend must reply with 0 for
success or non-zero otherwise.
Currently, only the VHOST_USER_SET_MEM_TABLE request implements reply-ack,
in order to synchronize mapping updates.
This patch enables the REPLY_ACK feature generally, but only checks the
error code for VHOST_USER_SET_MEM_TABLE.
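A sketch of the backend side, based on the spec (the flag value comes from
the spec; the message field names are assumptions):

    #define VHOST_USER_NEED_REPLY (1 << 3)

    if (msg.flags & VHOST_USER_NEED_REPLY) {
        msg.payload.u64 = ret ? 1 : 0; /* 0 = success */
        msg.size = sizeof(msg.payload.u64);
        send_vhost_message(fd, &msg);
    }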
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
In function vhost_new_device(), the current code does not free 'dev'
in the "i == MAX_VHOST_DEVICE" branch. This leads to a
memory leak.
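A sketch of the fix (the log message is illustrative):

    if (i == MAX_VHOST_DEVICE) {
        RTE_LOG(ERR, VHOST_CONFIG,
            "Failed to find a free slot for the new device.\n");
        rte_free(dev);
        return -1;
    }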
Fixes: 45ca9c6f7b ("vhost: get rid of linked list for devices")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When reg_size < page_size, the read in rte_mem_virt2phy()
would not return, because host_user_addr is invalid.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
vhost-cuse is removed, update corresponding comments that are still
referencing it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Linux virtio-net kernel driver uses indirect descriptors when
mergeable buffers are not used.
This patch adds support for this case, fixing the use of indirect
descriptors with these guests.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Windows virtio-net driver uses indirect descriptors with
mergeable buffers.
This patch adds support for this case, fixing the use of indirect
descriptors with these guests.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
There is no need to retrieve the latest avail head every time we enqueue
a packet in the mergeable Rx path by
    avail_idx = *((volatile uint16_t *)&vq->avail->idx);
Instead, we could just retrieve it once at the beginning of the enqueue
path. This could diminish the cache penalty slightly, because the virtio
driver could be updating it while vhost is reading it (for each packet).
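Roughly (sketch only; the call shape and helper name are illustrative):

    /* read the avail index once per burst ... */
    avail_head = *((volatile uint16_t *)&vq->avail->idx);

    for (i = 0; i < count; i++) {
        /* ... and reserve descriptors against that snapshot */
        if (reserve_avail_buf_mergeable(dev, vq, pkt_len, buf_vec,
                                        &num_buffers, avail_head) < 0)
            break;
    }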
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The basic idea is to shadow the used ring updates: write them into a
local buffer first, and then flush them all to the virtio used vring
at once in the end.
And since we do avail ring reservation before enqueuing data, we know
which and how many descs will be used, which means we can update
the shadow used ring at reservation time. It also introduces another
slight advantage: we don't need to access desc->flags any more inside
copy_mbuf_to_desc_mergeable().
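A sketch of the two halves (field and helper names are assumptions;
ring wrap-around handling is omitted):

    static inline void
    update_shadow_used_ring(struct vhost_virtqueue *vq,
                            uint16_t desc_idx, uint16_t len)
    {
        uint16_t i = vq->shadow_used_idx++;

        vq->shadow_used_ring[i].id  = desc_idx;
        vq->shadow_used_ring[i].len = len;
    }

    /* after the whole burst is copied, flush in one go */
    rte_memcpy(&vq->used->ring[used_idx_start], vq->shadow_used_ring,
               vq->shadow_used_idx * sizeof(struct vring_used_elem));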
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
shadow_used_ring will be introduced later. From then on, the last avail
idx will not be updated together with the last used idx.
So, here we use last_avail_idx for avail ring reservation.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Let it return "num_buffers" we reserved, so that we could re-use it
with copy_mbuf_to_desc_mergeable() directly, instead of calculating
it again there.
Meanwhile, the return type of copy_mbuf_to_desc_mergeable is changed
to "int". -1 will be return on error.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch reorders the code to delay virtio header write to improve
cache access efficiency for cases where the mrg_rxbuf feature is turned
on. CPU pipeline stall cycles can be significantly reduced.
Virtio header writes and mbuf data copies are all remote store operations,
which take a long time to finish. It's a good idea to put them together
to remove bubbles in between, to let as many remote store instructions
as possible go into the store buffer at the same time to hide latency, and
to let the H/W prefetcher go to work as early as possible.
On a Haswell machine, about 100 cycles can be saved per packet by this
patch alone. Taking 64B packet traffic as an example, this means about a 60%
efficiency improvement for the enqueue operation.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
last_used_idx is a local variable; there is no need to decorate it
with "volatile".
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If offloading features are not negotiated, parsing the virtio header
is not needed.
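A sketch of the guard (which feature bits matter here is an assumption,
not taken from the patch):

    /* only parse the virtio-net header if the guest may actually
     * request checksum/TSO offloads */
    if (dev->features &
        ((1ULL << VIRTIO_NET_F_CSUM) |
         (1ULL << VIRTIO_NET_F_HOST_TSO4) |
         (1ULL << VIRTIO_NET_F_HOST_TSO6)))
        parse_virtio_net_hdr = 1;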
Micro-benchmark with testpmd shows that the gain is +4% with indirect
descriptors, +1% when using direct descriptors.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch fixes a Windows VM compatibility issue in the DPDK 16.07 vhost
code, which causes the guest to hang once any packets are enqueued when
mrg_rxbuf is turned on, by setting the right id and len in the used ring.
As defined in virtio spec 0.95 and 1.0, in each used ring element, id means
the index of the start of the used descriptor chain, and len means the total
length of the descriptor chain which was written to. In the 16.07 code,
however, the index of the last descriptor is assigned to id, and the length
of the last descriptor is assigned to len.
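In other words (illustrative assignment; semantics per the virtio spec):

    /* id: head index of the descriptor chain
     * len: total bytes written across the whole chain */
    vq->used->ring[used_idx].id  = chain_head_idx;
    vq->used->ring[used_idx].len = total_bytes_written;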
How to test?
1. Start testpmd in the host with a vhost port.
2. Start a Windows VM image with qemu and connect to the vhost port.
3. Start io forwarding with tx_first in host testpmd.
For 16.07 code, the Windows VM will hang once any packets are enqueued.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Dequeue zero copy is disabled by default. Here we add a new flag
``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` to explicitly enable it.
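Usage sketch (the socket path is arbitrary):

    rte_vhost_driver_register("/tmp/vhost-user.sock",
                              RTE_VHOST_USER_DEQUEUE_ZERO_COPY);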
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
The basic idea of dequeue zero copy is, instead of copying data from
the desc buf, we let the mbuf reference the desc buf addr directly.
Doing so, however, has one major issue: we can't update the used ring
at the end of rte_vhost_dequeue_burst. Because we don't do the copy
there, an update of the used ring would let the driver reclaim the
desc buf. As a result, DPDK might reference a stale memory region.
To update the used ring properly, this patch does several tricks:
- When an mbuf references a desc buf, its refcnt is increased by 1.
  This is to pin the mbuf, so that an mbuf free from DPDK
  won't actually free it; instead, the refcnt is just decreased by 1.
- We chain all those mbufs together (by tailq),
  and we check the chain every time at the entrance of
  rte_vhost_dequeue_burst, to see whether an mbuf is freed (when its
  refcnt equals 1). If that happens, it means we are the last user
  of this mbuf and we are safe to update the used ring.
- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
  right desc idx (see the sketch below).
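A sketch of the association structure (field names are assumptions):

    struct zcopy_mbuf {
        TAILQ_ENTRY(zcopy_mbuf) next;

        struct rte_mbuf *mbuf; /* mbuf referencing the desc buf */
        uint32_t desc_idx;     /* used ring entry to fill later */
        uint16_t in_use;
    };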
Dequeue zero copy is introduced for performance reasons, and some rough
tests show about a 50% performance boost for packet size 1500B. For small
packets (e.g. 64B), it actually slows things down a bit (by up to
15%). That is expected, because this patch introduces some extra work,
and that outweighs the benefit of saving a few bytes of copy.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
So far, we retrieve both the used ring and avail ring idx by the var
last_used_idx; it isn't a problem because the used ring is updated
immediately after those avail entries are consumed.
But that's not true when dequeue zero copy is enabled, as the used ring is
updated only when the mbuf is consumed. Thus, we need to use another var to
note the last avail ring idx we have consumed.
Therefore, last_avail_idx is introduced.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
So that we can convert a guest physical address to a host physical
address, which will be used in the later Tx zero copy implementation.
MAP_POPULATE is set while mmapping guest memory regions, to make
sure the page tables are set up and then rte_mem_virt2phy() can
yield proper physical addresses.
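Roughly (the mmap flags are as described; the surrounding code is
simplified):

    mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_POPULATE, fd, 0);
    if (mmap_addr == MAP_FAILED)
        return -1;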
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
For historical reasons (vhost-cuse comes before vhost-user), some
fields for maintaining the vhost-user memory mappings (such as the mmapped
address and size, with which we can then unmap on destroy) are kept in
the "orig_region_map" struct, a structure that is defined only in the
vhost-user source file.
The right way to go is to remove the structure and move all those fields
into the virtio_memory_region struct. But we simply couldn't do that before,
because it would break the ABI.
Now, thanks to the ABI refactoring, it is no longer a blocking issue.
And here it goes: this patch removes orig_region_map and
redefines virtio_memory_region to include all necessary info.
With that, we can simplify the guest/host address conversion a bit.
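An illustrative layout after the merge (field names are assumptions):

    struct virtio_memory_region {
        uint64_t guest_phys_addr;
        uint64_t guest_user_addr;
        uint64_t host_user_addr;
        uint64_t size;

        void    *mmap_addr;  /* kept so we can munmap on destroy */
        uint64_t mmap_size;
        int      fd;
    };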
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Indirect descriptors are usually supported by virtio-net devices,
allowing a larger number of requests to be dispatched.
When the virtio device sends a packet using indirect descriptors,
only one slot is used in the ring, even for large packets.
The main effect is to improve the 0% packet loss benchmark.
A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
(fwd io for host, macswap for VM) on the DUT, shows a +50% gain for
zero loss.
On the downside, a micro-benchmark using testpmd txonly in the VM and
rxonly on the host shows a loss between 1 and 4%. But depending on
the needs, the feature can be disabled at VM boot time by passing the
indirect_desc=off argument to the vhost-user device in QEMU.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
As new_device and destroy_device take an int instead of a
"struct virtio_net *", the comment about setting VIRTIO_DEV_RUNNING
doesn't make sense anymore. Moreover, if I've correctly understood the
code, the drivers take care of setting the flag before calling the
callbacks, so this comment is obsolete and has been removed.
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
No need to use a pointer to store/retrieve features.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Invoke get_device() at the beginning of vhost_user_msg_handler, so that
we only need to check the return value once. This saves tons of duplicated
get-and-check device code.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Some functions are prefixed with "user_", while others with "vhost_".
Make them all start with "vhost_user_" to unify the function names.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
For historical reasons (we used to have 2 vhost implementations), some
messages are handled in two calls: the vhost specific implementation
handles them first and then invokes the common one to do further handling.
Now that we have only one implementation, we can write one method for
each message. Hence, fold those common handlers into the corresponding
vhost-user handlers.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The code structure is a bit messy now. For example, vhost-user message
handling is spread across three different files:
    vhost-net-user.c  virtio-net.c  virtio-net-user.c
Where vhost-net-user.c is the entrance that handles all those messages
and then invokes the right method for a specific message. Some of them
are stored in virtio-net.c, while others are stored in virtio-net-user.c.
The truth is all of them should be in one file, vhost_user.c.
So this patch refactors the source code structure: mainly renaming
files and moving code from one file to another file that is more suitable
for storing it. Thus, no functional changes are made.
After the refactoring, the code structure becomes:
- socket.c       handles all vhost-user socket file related stuff, such
                 as socket file creation for server mode and reconnection
                 for client mode.
- vhost.c        mainly on stuff like vhost device creation/destroy/reset.
                 Most of the vhost API implementations are there, too.
- vhost_user.c   all stuff about vhost-user message handling goes there.
- virtio_net.c   all stuff about virtio-net should go there. It has the
                 virtio-net Rx/Tx implementations only so far: it's just a
                 rename from vhost_rxtx.c.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We now have only one vhost implementation; a sub source dir is no longer
needed. Remove it by moving the files to the upper directory.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Remove the vhost-cuse code, including the eventfd_link kernel module that
is for vhost-cuse only.
The lib/virt/qemu-wrap.py script is also removed, as it's mainly for
vhost-cuse usage.
As we have only one vhost implementation now, only one vhost config option
is needed. Thus, CONFIG_RTE_LIBRTE_VHOST_USER is removed.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
nr_desc is not an index but the number of descriptors,
so it can be equal to the virtqueue size.
Fixes: a436f53ebf ("vhost: avoid dead loop chain")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Currently, while calling 'rte_vhost_driver_unregister()', the connection
to QEMU is not closed. This leads to the inability to register the driver
again and reconnect to the same virtual machine.
This scenario is reproducible with OVS. While executing the following
command, the vhost port will be re-created ('rte_vhost_driver_register()'
will be executed, followed by 'rte_vhost_driver_unregister()'), the
network will be broken, and QEMU will possibly crash:
    ovs-vsctl set Interface vhost1 ofport_request=15
Fix this by closing all established connections on driver unregister and
removing pending connections from the reconnection list.
Fixes: 64ab701c3d ("vhost: add vhost-user client mode")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If something abnormal happens to QEMU, 'connect()' can block the calling
thread (e.g. the main thread of OVS) forever or for a really long time.
This can break the whole application or block the reconnection thread.
Example with OVS:
    ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
    (gdb) bt
    #0  connect () from /lib64/libpthread.so.0
    #1  vhost_user_create_client (vsocket=0xa816e0)
    #2  rte_vhost_driver_register
    #3  netdev_dpdk_vhost_user_construct
    #4  netdev_open (name=0xa664b0 "vhost1")
    [...]
    #11 main
Fix that by setting non-blocking mode on client sockets for the connection.
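A sketch of the change (error handling simplified; variable names are
illustrative):

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    ret = connect(fd, (struct sockaddr *)&un, sizeof(un));
    if (ret < 0 && errno != EISCONN && errno != EINPROGRESS)
        return -1; /* let the reconnect thread retry later */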
Fixes: 64ab701c3d ("vhost: add vhost-user client mode")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Protect against a DPDK crash when allocation of a listen fd >= 1023.
For events on fds > 1023, the current implementation will trigger
an abort due to access outside of the allocated bit mask.
Corrections include:
* Match the fdset_add() signature in fd_man.c to fd_man.h
* Handling of return codes from fdset_add()
* Addition of a check of the fd number in fdset_add_fd() (see the
  sketch after this list)
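Roughly, the added guard (the exact code may differ):

    /* reject fds that cannot be represented in an fd_set */
    if (fd >= FD_SETSIZE) /* typically 1024 */
        return -1;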
The rationale behind the suggested code change is that
fdset_event_dispatch() could attempt access outside of the FD_SET
bitmask if there is an event on a file descriptor that in turn
looks up a virtio file descriptor with a value > 1023.
Such an attempt will lead to an abort() and a restart of any
vswitch using DPDK.
A discussion topic exists in the ovs-discuss mailing list that can
provide a little more background:
http://openvswitch.org/pipermail/discuss/2016-February/020243.html
Fixes: 8f972312 ("vhost: support vhost-user")
Signed-off-by: Patrik Andersson <patrik.r.andersson@ericsson.com>
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
In the current implementation, vhost will crash with a segmentation fault
if a malicious or buggy virtio application breaks the addresses of
descriptors.
Before commit 0823c1cb0a ("vhost: workaround stale vring base"),
this crash was reproducible even with a normal DPDK application that tries
to change the number of virtqueues dynamically inside the VM.
Fix that by checking the addresses of descriptors before using them.
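A sketch of the check (the helper name may differ in the actual code):

    desc_addr = gpa_to_vva(dev, desc->addr);
    if (unlikely(!desc_addr))
        return -1; /* malformed descriptor; drop the packet */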
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The return value on error is changed from '-1' to '0', because the function
returns an unsigned value that means the number of used descriptors.
Also fix the update of 'last_used_idx' by using the actual number of
used descriptors.
Fixes: 623bc47054 ("vhost: do sanity check for ring descriptor length")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fix potential memory leak raised by Coverity.
>>> Variable "vsocket" going out of scope leaks the storage it
>>> points to.
Coverity issue: 127483
Fixes: e623e0c6d8 ("vhost: add reconnect ability")
Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Commit 550c9d27d1 ("vhost: set/reset device flags internally") moves
the VIRTIO_DEV_RUNNING set/reset to the vhost lib. But I missed one reset
on stop; this patch fixes it.
Fixes: 550c9d27d1 ("vhost: set/reset device flags internally")
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Some libraries were missing their dependency on eal, mbuf, mempool,
ring and kvargs.
It is revealed by the linker option "-z defs".
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The value returned from fstat is not checked for errors before being used.
This patch fixes the following Coverity issue:
    static uint64_t
    get_blk_size(int fd)
    {
        struct stat stat;
        fstat(fd, &stat);
        return (uint64_t)stat.st_blksize;
>>> CID 107103 (#1 of 1): Unchecked return value from library
(CHECKED_RETURN)
>>> check_return: Calling fstat(fd, &stat) without checking
return value.
>>> This library function may fail and return an error code.
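A sketch of the fix (the actual patch may differ slightly):

    static uint64_t
    get_blk_size(int fd)
    {
        struct stat stat;
        int ret;

        ret = fstat(fd, &stat);
        return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
    }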
Fixes: 8f972312b8 ("vhost: support vhost-user")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The return value of get_device() is not checked before being
dereferenced. Fix this problem by adding a check.
Coverity issue: 119262
Fixes: 77d20126b4 ("vhost-user: handle message to enable vring")
Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
All other DPDK PMDs don't support concurrent receiving or sending of
packets on the same queue. The upper application should deal with
this, normally through queue and core bindings.
For historical reasons, vhost internally supports concurrent lockless
enqueuing of packets to the same virtio queue through a costly cmpset
operation.
This patch removes this internal lockless implementation and should improve
performance a bit.
Luckily, DPDK OVS doesn't rely on this behavior.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The ifname[] field takes so much space that it separates some frequently
used fields into different cache lines, e.g. features and broadcast_rarp.
This patch moves all those fields that are accessed frequently in Rx/Tx
together (before the ifname[] field) to let them share one cache line.
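An illustrative reordering (not the exact struct definition; the chosen
fields and IF_NAME_SZ are assumptions):

    struct virtio_net {
        /* hot fields, accessed in the Rx/Tx paths, kept together */
        uint64_t features;
        uint16_t vhost_hlen;
        uint8_t  broadcast_rarp;

        /* cold, large field pushed after the hot ones */
        char ifname[IF_NAME_SZ];
    };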
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>