numam-dpdk

Author	SHA1	Message	Date
Victor Kaplansky	a368804699	vhost: protect active rings from async ring changes When performing live migration or memory hot-plugging, the changes to the device and vrings made by the message handler are done independently from vring usage by PMD threads. This causes, for example, segfaults during live migration with MQ enabled, but in general virtually any request sent by QEMU that changes the state of the device can cause problems. These patches fix all of the above issues by adding a spinlock to every vring and requiring the message handler to start operating only after ensuring that all PMD threads related to the device are out of the critical section accessing the vring data. Each vring has its own lock in order to not create contention between PMD threads of different vrings and to prevent performance degradation when scaling the number of queue pairs. See https://bugzilla.redhat.com/show_bug.cgi?id=1450680 Cc: stable@dpdk.org Signed-off-by: Victor Kaplansky <victork@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-01-21 15:51:52 +01:00
Junjie Chen	2651726def	vhost: do deep copy while reallocating queue When vhost reallocates dev and vq in the NUMA-enabled case, it doesn't perform a deep copy, which leads to 1) an invalid zmbuf list and 2) remote memory access. This patch re-initializes the zmbuf list and also does the deep copy. Signed-off-by: Junjie Chen <junjie.j.chen@intel.com> Reviewed-by: Zhiyong Yang <zhiyong.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-01-21 15:51:52 +01:00
Maxime Coquelin	e291093235	vhost: destroy unused virtqueues when multiqueue not negotiated QEMU sends VHOST_USER_SET_VRING_CALL requests for all queues declared in QEMU command line before the guest is started. It has the effect in DPDK vhost-user backend to allocate vrings for all queues declared by QEMU. If the first driver being used does not support multiqueue, the device never changes to VIRTIO_DEV_RUNNING state as only the first queue pair is initialized. One driver impacted by this bug is virtio-net's iPXE driver which does not support VIRTIO_NET_F_MQ feature. It is safe to destroy unused virtqueues in SET_FEATURES request handler, as it is ensured the device is not in running state at this stage, so virtqueues aren't being processed. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-01-16 18:47:49 +01:00
Maxime Coquelin	59fe5e17d9	vhost: propagate set features handling error Not propagating VHOST_USER_SET_FEATURES request handling error may result in unpredictable behavior, as host and guests features may no more be synchronized. This patch fixes this by reporting the error to the upper layer, which would result in the device being destroyed and the connection with the master to be closed. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-01-16 18:47:49 +01:00
Maxime Coquelin	07f8db29b8	vhost: prevent features to be changed while device is running As section 2.2 of the Virtio spec states about features negotiation: "During device initialization, the driver reads this and tells the device the subset that it accepts. The only way to renegotiate is to reset the device." This patch implements a check to prevent illegal features change while the device is running. One exception is the VHOST_F_LOG_ALL feature bit, which is enabled when live-migration is initiated. But this feature is not negotiated with the Virtio driver, but directly with the Vhost master. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-01-16 18:47:49 +01:00
Jianfeng Tan	cab278dee9	vhost: fix crash In a running VM, operations (like device attach/detach) will trigger QEMU to resend set_mem_table to the vhost-user backend. DPDK vhost-user handles this message crudely by unmapping all existing regions and mapping new ones. This might lead to a segfault if a PMD thread is just trying to touch those unmapped memory regions. But in most cases, except VM memory hotplug, QEMU still sends the set_mem_table message even when the memory regions are not changed, as QEMU vhost-user filters out those not backed by a file (fd > 0). To fix this case, we add a check in the handler to see if the memory regions are really changed; if not, we just keep the old memory regions. Fixes: 8f972312b8f4 ("vhost: support vhost-user") CC: stable@dpdk.org Reported-by: Yang Zhang <zy107165@alibaba-inc.com> Reported-by: Xin Long <longxin.xl@alibaba-inc.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2018-01-16 18:47:49 +01:00
Bruce Richardson	369991d997	lib: use SPDX tag for Intel copyright files Replace the BSD license header with the SPDX tag for files with only an Intel copyright on them. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-01-04 22:41:39 +01:00
Maxime Coquelin	6ea069651e	vhost: disable reply-ack feature if IOMMU disabled If the application has disabled VIRTIO_F_IOMMU_PLATFORM, disable VHOST_USER_PROTOCOL_F_REPLY_ACK protocol feature that is only mandatory with IOMMU for now. This is done to provide a way for the application to support multiqueue with old Qemu versions (v2.7.0 to v2.9.0) that have reply-ack feature broken. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org> Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2017-11-07 14:13:47 +01:00
Maxime Coquelin	5a4933e56b	vhost: postpone ring address translations at kick time only If multiple queue pairs are created but not all are used, the device is never started, as unused queues aren't enabled and their ring addresses aren't translated. The device is changed to running state when all ring addresses are translated. This patch fixes this by postponing ring address translation to kick time unconditionally, whether VHOST_USER_F_PROTOCOL_FEATURES is negotiated or not. Reported-by: Lei Yao <lei.a.yao@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-11-07 02:33:05 +01:00
Thomas Monjalon	62196f4e09	mem: rename address mapping function to IOVA The function rte_mem_virt2phy() is kept and used in functions which works only with physical addresses. For all other calls this function is replaced by rte_mem_virt2iova() which does a direct mapping (no conversion) in the VA case. Note: the new function rte_mem_virt2iova() function matches the behaviour implemented in rte_mem_virt2phy() by the commit 680f6c12600f ("mem: honor IOVA mode in virt2phy") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>	2017-11-06 22:24:19 +01:00
Maxime Coquelin	5cd690e4fd	vhost: fix vring addresses not translated Commit 3ea7052f4b1b ("vhost: postpone rings addresses translation") moves ring address translation to either vring kick or enable time, depending on whether protocol features are enabled or not. This is done to not interpret ring information as long as the vring is not fully initialized. The problem is that with old QEMU versions, like v2.5, the ring is enabled before addresses are sent, so addresses are never translated. This patch fixes the issue by doing the translation in VHOST_USER_SET_VRING_ADDR handling if the ring is already enabled. Fixes: 3ea7052f4b1b ("vhost: postpone rings addresses translation") Reported-by: Lei Yao <lei.a.yao@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-24 21:26:10 +02:00
Maxime Coquelin	86fe881c03	vhost: fetch ring address after NUMA reallocation In case of NUMA reallocation, the virtqueue struct is reallocated on another socket, meaning that its address changes. In translate_ring_addresses(), the addr pointer was not fetched again after the reallocation, so it pointed to freed memory. This patch fetches the addr pointer again after the reallocation. Reported-by: Lei Yao <lei.a.yao@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com>	2017-10-13 22:08:21 +02:00
Maxime Coquelin	b9c07b3141	vhost: fix IOTLB on NUMA realloc In case of NUMA reallocation, the virtqueue's iotlb list is broken: its head changes, but the first iotlb entry in the list still points to the previous head pointer. Also, in case of reallocation, we want the IOTLB cache mempool to be on the new socket. This patch performs a full re-init of the IOTLB cache when the mempool already exists, and calls the IOTLB cache init function in case the virtqueue is being reallocated on a new socket. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com>	2017-10-13 22:08:21 +02:00
Maxime Coquelin	3494ed045e	vhost: distinguish master and slave requests This patch adds an union in VhostUserMsg to distinguish between master and slave initiated requests, instead of casting slave requests as master request. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:54:31 +02:00
Maxime Coquelin	36031f80cc	vhost: invalidate vring in case of matching IOTLB invalidate As soon as a page used by a ring is invalidated, the access_ok flag is cleared, so that processing threads try to map them again. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	eefac9536a	vhost: postpone device creation until rings are mapped Translating the start addresses of the rings is not enough; we need to be sure the whole ring is made available by the guest. That depends on the size of the rings, which is not known on SET_VRING_ADDR reception. Furthermore, we need to be safe against vring page invalidations. This patch introduces a new access_ok flag per virtqueue, which is set when all the rings are mapped, and cleared as soon as a page used by a ring is invalidated. The invalidation part is implemented in a following patch. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	09927b5249	vhost: translate ring addresses when IOMMU enabled When IOMMU is enabled, the ring addresses set by the VHOST_USER_SET_VRING_ADDR requests are guest's IO virtual addresses, whereas Qemu virtual addresses when IOMMU is disabled. When enabled and the required translation is not in the IOTLB cache, an IOTLB miss request is sent, but being called by the vhost-user socket handling thread, the function does not wait for the requested IOTLB update. The function will be called again on the next IOTLB update message reception if matching the vring addresses. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	3ea7052f4b	vhost: postpone rings addresses translation This patch postpones ring address translations and checks, as addresses sent by the master should not be interpreted as long as the ring is not started and enabled[0]. When protocol features aren't negotiated, the ring is started in enabled state, so the address translations are postponed to vhost_user_set_vring_kick(). Otherwise, they are postponed to when the ring is enabled, in vhost_user_set_vring_enable(). [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	b0098b5e21	vhost: fix dereferencing invalid pointer after realloc numa_realloc() reallocates the virtio_net device structure and updates the vhost_devices[] table with the new pointer if the rings are allocated different NUMA node. Problem is that vhost_user_msg_handler() still dereferences old pointer afterward. This patch prevents this by fetching again the dev pointer in vhost_devices[] after messages have been handled. Fixes: af295ad4698c ("vhost: realloc device and queues to same numa node as vring desc") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	321203a54b	vhost: enable rings at the right time When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not enabled when started, but enabled through dedicated VHOST_USER_SET_VRING_ENABLE request. When not negotiated, the ring is started in enabled state, at VHOST_USER_SET_VRING_KICK request time. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	e95f34d380	vhost: handle IOTLB update and invalidate requests The vhost-user device IOTLB protocol extension introduces the VHOST_USER_IOTLB message type. The associated payload is the vhost_iotlb_msg struct defined in the kernel, which in this case can be either an IOTLB update or invalidate message. On IOTLB update, the virtqueues get notified of a new entry. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	01a4bb55f9	vhost: support IOTLB miss slave requests Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	275c3f9447	vhost: support slave requests channel Currently, only QEMU sends requests, the backend sends replies. In some cases, the backend may need to send requests to QEMU, like IOTLB miss events when IOMMU is supported. This patch introduces a new channel for such requests. QEMU sends a file descriptor of a new socket using VHOST_USER_SET_SLAVE_REQ_FD. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Maxime Coquelin	a0563bd2e3	vhost: prepare for slave requests send_vhost_message() is currently only used to send replies, so it modifies message flags to prepare the reply. With the upcoming channel for backend-initiated requests, this function can be used to send requests. This patch introduces a new send_vhost_reply() that does the message flags modifications, and makes send_vhost_message() generic. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:52:27 +02:00
Tiwei Bie	e5c494a7a2	vhost: batch small guest memory copies This patch adaptively batches the small guest memory copies. By batching the small copies, the efficiency of executing the memory LOAD instructions can be improved greatly, because the memory LOAD latency can be effectively hidden by the pipeline. We saw great performance boosts for small packets PVP test. This patch improves the performance for small packets, and has distinguished the packets by size. So although the performance for big packets doesn't change, it makes it relatively easy to do some special optimizations for the big packets too. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Zhihong Wang <zhihong.wang@intel.com> Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>	2017-10-10 15:48:53 +02:00
Dariusz Stojaczyk	058e2d294b	vhost: log error for badly negotiated features Since vhost_user_set_features failure is not handled in any way, a single error log has been added to at least to let the user know that something has gone wrong. Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-06-16 14:04:25 +02:00
Yuanhan Liu	ebd792b386	vhost: fix crash on NUMA The queue allocation was changed, from allocating one queue-pair at a time to one queue at a time. Most of the changes have been done, but just with one being missed: the size of copying the old queue is still based on queue-pair at numa_realloc(), which leads to overwritten issue. As a result, crash may happen. Fix it by specifying the right copy size. Also, the net queue macros are not used any more. Remove them. Fixes: ab4d7b9f1afc ("vhost: turn queue pair to vring") Cc: stable@dpdk.org Reported-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Jens Freimann <jfreiman@redhat.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com>	2017-06-16 14:04:25 +02:00
Daniel Verkamp	368c6625b6	vhost: access VhostUsrMsg via packed struct Accessing fields of a packed struct through unaligned pointers is undefined behavior. Instead of passing pointers to particular fields, a pointer to the root struct should be used. This patch does exactly that. Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-06-16 14:04:25 +02:00
Dariusz Stojaczyk	29c7c2fdaa	vhost: fix guest pages memory leak This patch fixes a memory leak. virtio_net::guest_pages is allocated in vhost_setup_mem_table(), reallocated in add_one_guest_page(), but never freed. Fixes: e246896178e6 ("vhost: get guest/host physical address mappings") Cc: stable@dpdk.org Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Reviewed-by: Jens Freimann <jfreiman@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-06-16 14:04:25 +02:00
Jens Freimann	4cee38a6fc	vhost: check allocation of guest pages When we try to allocate guest pages we need to check the return value of malloc(). Print an error message and return when it fails. Signed-off-by: Jens Freimann <jfreiman@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-06-16 14:04:25 +02:00
Yuanhan Liu	27052cd63f	vhost: do not destroy device on repeat mem table message It doesn't make any sense to invoke destroy_device() callback at while handling SET_MEM_TABLE message. From the vhost-user spec, it's the GET_VRING_BASE message indicates the end of a vhost device: the destroy_device() should be invoked from there (luckily, we already did that). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:42:44 +02:00
Yuanhan Liu	abd53c16b6	vhost: add features changed callback Features could be changed after the feature negotiation. For example, VHOST_F_LOG_ALL will be set/cleared at the start/end of live migration, respectively. Thus, we need a new callback to inform the application of such a change. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:42:44 +02:00
Yuanhan Liu	f53cf83980	vhost: drop the Rx and Tx queue macro They are virtio-net specific and should be defined inside the virtio-net driver. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:42:44 +02:00
Yuanhan Liu	c0674b1bc8	vhost: move the device ready check at proper place Currently, we check vq->desc, vq->kickfd and vq->callfd to know whether a virtio device is ready or not. However, we only do it when handling SET_VRING_KICK message, which could be wrong if a vhost-user frontend send SET_VRING_KICK first and SET_VRING_CALL later. To work for all possible vhost-user frontend implementations, we could move the ready check at the end of vhost-user message handler. Meanwhile, since we do the check more often than before, the "virtio not ready" message is dropped, to not flood the screen. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:42:44 +02:00
Yuanhan Liu	ab4d7b9f1a	vhost: turn queue pair to vring The queue pair is very virtio-net specific, other devices don't have such concept. To make it generic, we should log the number of vrings instead of the number of queue pairs. This patch just does a simple convert, a later patch would export the number of vrings to applications. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:42:44 +02:00
Yuanhan Liu	eb32247457	vhost: export guest memory regions Some vhost-user driver may need this info to setup its own page tables for GPA (guest physical addr) to HPA (host physical addr) translation. SPDK (Storage Performance Development Kit) is one example. Besides, by exporting this memory info, we could also export the gpa_to_vva() as an inline function, which helps for performance. Otherwise, it has to be referenced indirectly by a "vid". Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:40:13 +02:00
Yuanhan Liu	93433b639d	vhost: make notify ops per vhost driver Assume there is an application both support vhost-user net and vhost-user scsi, the callback should be different. Making notify ops per vhost driver allow application define different set of callbacks for different driver. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:40:13 +02:00
Yuanhan Liu	0917f9d1f0	vhost: use new APIs to handle features Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-04-01 10:40:13 +02:00
Maxime Coquelin	4c5d8459d2	vhost: add new ready status flag This patch adds a new status flag indicating the Virtio device is ready to operate. This is required to be able to call rte_vhost_mtu_get() in the .new_device() callback, as rte_vhost_mtu_get needs that the negotiation is done, but it is too early to rely on running status flag, which is set just after .new_device() returns. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-04-01 10:36:17 +02:00
Maxime Coquelin	23f1e756ca	vhost: support MTU protocol feature This patch implements the vhost-user MTU protocol feature support. When VIRTIO_NET_F_MTU is negotiated, QEMU notifies the vhost-user backend with the configured MTU if dedicated protocol feature is supported. The value can be used by the application to ensure consistency with value set by the user. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-04-01 10:36:17 +02:00
Yuanhan Liu	160cbc815b	vhost: remove a hack on queue allocation We used to allocate queues based on the index from the SET_VRING_CALL request: if the corresponding queue hasn't been allocated, allocate it. Though it's practically right (it's the first per-vring request we will get from QEMU for vhost-user negotiation), it's not technically right: it's not documented in the vhost-user spec that it will always be the first per-vring request. For example, SET_VRING_ADDR could also be the first per-vring request. Thus, queue allocation should not depend on SET_VRING_CALL. Instead, we could catch all the per-vring messages at the entrance of the request handler, and allocate a queue if it hasn't been allocated before. By that, we could remove a hack. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-04-01 10:36:06 +02:00
Emmanuel Roullit	68759bbe73	vhost: remove unneeded variable assignment Found with clang static analysis: lib/librte_vhost/vhost_user.c:996:3: warning: Value stored to 'ret' is never read ret = vhost_user_get_vring_base(dev, &msg.payload.state); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Emmanuel Roullit <emmanuel.roullit@gmail.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-01-30 13:47:20 +01:00
Yuanhan Liu	b8b992e93f	vhost: fix long stall of negotiation Setting up the mapping from GPA (guest physical address) to HPA (host physical address) could be very time consuming when the guest memory is backed with small pages (4K). The bigger the guest memory, the longer it takes. This could lead to a very long vhost-user negotiation. Since the mapping is only needed in zero copy mode so far, we could avoid such time-consuming setup when zero copy is turned off (which is the default case). It's actually a workaround; a proper fix might be to start a new thread and hide the big latency there. Fixes: e246896178e6 ("vhost: get guest/host physical address mappings") Cc: stable@dpdk.org Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2017-01-28 14:25:40 +01:00
Maxime Coquelin	73c8f9f69c	vhost: introduce reply ack feature REPLY_ACK features provide a generic way for QEMU to ensure both completion and success of a request. As described in vhost-user spec in QEMU repository, QEMU sets VHOST_USER_NEED_REPLY flag (bit 3) when expecting a reply_ack from the backend. Backend must reply with 0 for success or non-zero otherwise when flag is set. Currently, only VHOST_USER_SET_MEM_TABLE request implements reply_ack, in order to synchronize mapping updates. This patch enables REPLY_ACK feature generally, but only checks error code for VHOST_USER_SET_MEM_TABLE. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-01-17 09:20:18 +01:00
Haifeng Lin	8c33fc10f6	vhost: fix guest/host physical address mapping When reg_size < page_size the function read in rte_mem_virt2phy would not return, because host_user_addr is invalid. Fixes: e246896178e6 ("vhost: get guest/host physical address mappings") Cc: stable@dpdk.org Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2017-01-17 09:20:17 +01:00
Zhihong Wang	f689586bc0	vhost: shadow used ring update The basic idea is to shadow the used ring update: update them into a local buffer first, and then flush them all to the virtio used vring at once in the end. And since we do avail ring reservation before enqueuing data, we would know which and how many descs will be used. Which means we could update the shadow used ring at the reservation time. It also introduce another slight advantage: we don't need access the desc->flag any more inside copy_mbuf_to_desc_mergeable(). Signed-off-by: Zhihong Wang <zhihong.wang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2016-10-26 13:39:09 +02:00
Yuanhan Liu	b0a985d1f3	vhost: add dequeue zero copy The basic idea of dequeue zero copy is, instead of copying data from the desc buf, here we let the mbuf reference the desc buf addr directly. Doing so, however, has one major issue: we can't update the used ring at the end of rte_vhost_dequeue_burst. Because we don't do the copy here, an update of the used ring would let the driver reclaim the desc buf. As a result, DPDK might reference a stale memory region. To update the used ring properly, this patch does several tricks: - When an mbuf references a desc buf, refcnt is incremented by 1. This is to pin the mbuf, so that an mbuf free from DPDK won't actually free it; instead, refcnt is decremented by 1. - We chain all those mbufs together (by tailq), and we check the chain every time on the rte_vhost_dequeue_burst entrance to see if the mbuf is freed (when refcnt equals 1). If that happens, it means we are the last user of this mbuf and we are safe to update the used ring. - "struct zcopy_mbuf" is introduced, to associate an mbuf with the right desc idx. Dequeue zero copy is introduced for performance reasons, and some rough tests show about a 50% performance boost for packet size 1500B. For small packets (e.g. 64B), it actually slows things down a bit (by up to 15%). That is expected because this patch introduces some extra work, and it outweighs the benefit of saving a few bytes of copying. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Qian Xu <qian.q.xu@intel.com>	2016-10-12 09:45:14 +02:00
Yuanhan Liu	f6be82d725	vhost: introduce last available index for dequeue So far, we retrieve both the used ring and avail ring idx by the var last_used_idx; it won't be a problem because the used ring is updated immediately after those avail entries are consumed. But that's not true when dequeue zero copy is enabled, that used ring is updated only when the mbuf is consumed. Thus, we need use another var to note the last avail ring idx we have consumed. Therefore, last_avail_idx is introduced. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Qian Xu <qian.q.xu@intel.com>	2016-10-12 09:45:12 +02:00
Yuanhan Liu	e246896178	vhost: get guest/host physical address mappings So that we can convert a guest physical address to host physical address, which will be used in later Tx zero copy implementation. MAP_POPULATE is set while mmaping guest memory regions, to make sure the page tables are setup and then rte_mem_virt2phy() could yield proper physical address. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Qian Xu <qian.q.xu@intel.com>	2016-10-12 09:45:09 +02:00
Yuanhan Liu	552e8fd3d2	vhost: simplify memory regions handling For historical reasons (vhost-cuse came before vhost-user), some fields for maintaining the vhost-user memory mappings (such as the mmapped address and size, which we need to unmap on destroy) are kept in the "orig_region_map" struct, a structure that is defined only in the vhost-user source file. The right way to go is to remove the structure and move all those fields into the virtio_memory_region struct. But we simply couldn't do that before, because it would break the ABI. Now, thanks to the ABI refactoring, it's no longer a blocking issue. And here it goes: this patch removes orig_region_map and redefines virtio_memory_region to include all necessary info. With that, we can simplify the guest/host address conversion a bit. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Qian Xu <qian.q.xu@intel.com>	2016-10-12 09:44:56 +02:00
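
Several entries in this log (552e8fd3d2, eb32247457, e246896178) deal with translating guest addresses through a table of memory regions. As a rough illustration of that idea only, the sketch below walks a region table and converts a guest physical address into a host virtual address. The struct `guest_mem_region` and the function `gpa_to_hva()` are hypothetical names made up for this example; they are not the DPDK definitions, and the field layout is an assumption loosely based on the fields the commit messages mention.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region descriptor; NOT the DPDK struct. */
struct guest_mem_region {
    uint64_t guest_phys_addr; /* start of the region in guest physical space */
    uint64_t host_user_addr;  /* where that start is mapped in this process */
    uint64_t size;            /* length of the region in bytes */
};

/*
 * Translate a guest physical address (GPA) to a host virtual address.
 * Returns 0 when no region covers gpa (an assumption of this sketch:
 * 0 is never a valid mapping).
 */
static uint64_t
gpa_to_hva(const struct guest_mem_region *regions, size_t nregions,
           uint64_t gpa)
{
    for (size_t i = 0; i < nregions; i++) {
        const struct guest_mem_region *r = &regions[i];

        /* Compare the offset against size so that
         * guest_phys_addr + size cannot overflow. */
        if (gpa >= r->guest_phys_addr &&
            gpa - r->guest_phys_addr < r->size)
            return r->host_user_addr + (gpa - r->guest_phys_addr);
    }
    return 0;
}
```

A linear scan is enough here because the number of regions is typically small; a caller would check for the 0 return before dereferencing the result.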

55 Commits