numam-dpdk

Author	SHA1	Message	Date
Yuanhan Liu	a67f286a65	vhost: export queue free entries The new API rte_vhost_avail_entries() is actually a rename of rte_vring_available_entries(), changing "vring" to "vhost" for consistency with the other exported vhost APIs. This change also lets us avoid depending on the "virtio_net" struct, in preparation for the ABI refactoring. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:02:58 +02:00
Yuanhan Liu	f6d1bd5365	vhost: export interface name Introduce a new API rte_vhost_get_ifname() to export the ifname to the application. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	4b4af666b9	vhost: export number of queues Introduce a new API rte_vhost_get_queue_num() to export the number of queues. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	586e390013	vhost: export numa node Introduce a new API rte_vhost_get_numa_node() to get the NUMA node on which the virtio_net struct is allocated. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	adf97191b3	vhost: move cuse only struct to cuse vhost-cuse is now the last user of the vhost_device_ctx struct; move it there and rename it to "vhost_cuse_device_ctx" to make it clear that it's "cuse only". Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	319d362e3b	vhost: get device by device id only get_device() needs only the vid, so pass just the vid as its parameter. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	e2a1dd1275	vhost: rename device id variable For a long while I couldn't figure out what "fh" means here. My only guess was "file handle", which shows that it's not well named. I later found that "fh" is derived from the fuse lib, and my guess was right. However, device_fh represents a virtio-net device ID. Therefore, rename it to vid (virtio-net device ID, or vhost device ID; choose whichever you prefer) to make it easier to understand. This name (vid) will then be the only interface to applications, which is another reason for the rename: it's our interface, so make it understandable. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:25 +02:00
Yuanhan Liu	c08a349006	vhost: declare device id as int device_fh represents the device ID of a specific virtio-net device. First, "int" is big enough: we don't need 64 bits. Second, this lets us avoid the ugly "%" PRIu64 ".." stuff. And since ctx.fh is derived from device_fh, declare it as int too. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 08:59:54 +02:00
Yuanhan Liu	550c9d27d1	vhost: set/reset device flags internally It does not make sense to ask the application to set/unset the flag VIRTIO_DEV_RUNNING (which is used internally only) in the new_device()/destroy_device() callbacks. Instead, the vhost lib should set it after new_device() succeeds and reset it before invoking destroy_device(). This patch fixes it. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 06:10:54 +02:00
Yuanhan Liu	092f1c2c77	vhost: declare backend with int type It's an fd, so define it as "int", which also saves the unnecessary (int) cast. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 06:10:54 +02:00
Daniel Mrzyglod	a5e20775a7	vhost: fix name not null terminated Fix an issue reported by Coverity (Coverity ID 124556): if the buffer is treated as a null-terminated string in later operations, a buffer overflow or over-read may occur. In vhost_set_ifname(), the string buffer may lack a null terminator if the source string's length equals the buffer size. Fixes: `54292e9520` ("vhost: support ifname for vhost-user") Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-05-10 20:25:30 +02:00
Yuanhan Liu	71dc571efd	vhost: fix error handling in destroy Fix the following Coverity defect, reported for "struct virtio_net *dev = get_device(ctx);" in void vhost_destroy_device(struct vhost_device_ctx ctx): CID 124565: Null pointer dereferences (NULL_RETURNS). Dereferencing a null pointer "dev". Fixes: `45ca9c6f7b` ("vhost: get rid of linked list for devices") Reported-by: John McNamara <john.mcnamara@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-04-06 12:27:57 +02:00
Ilya Maximets	e994bcda55	vhost: use SMP barriers instead of compiler ones Since commit `4c02e453cc` ("eal: introduce SMP memory barriers"), virtio uses architecture-dependent SMP barriers. vhost should use them too. Fixes: `4c02e453cc` ("eal: introduce SMP memory barriers") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2016-03-31 17:09:23 +02:00
Yuanhan Liu	cce3ce3567	vhost: remove unnecessary return Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-25 19:53:00 +01:00
Yuanhan Liu	b3869ebebf	vhost: remove unnecessary memset when enqueueing Previously we had to reset the virtio net hdr in virtio_enqueue_offload(), because all mbufs shared a single virtio_hdr structure: struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0}; foreach (mbuf) { virtio_enqueue_offload(mbuf, &virtio_hdr.hdr); copy net hdr and mbuf to desc buf } However, after the vhost rxtx refactor, the code looks like: copy_mbuf_to_desc(mbuf) { struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0}; virtio_enqueue_offload(mbuf, &virtio_hdr.hdr); copy net hdr and mbuf to desc buf } foreach (mbuf) { copy_mbuf_to_desc(mbuf); } Since the header is now a zero-initialized local in each copy_mbuf_to_desc() call, the memset in virtio_enqueue_offload() is no longer necessary; remove it. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2016-03-17 21:53:06 +01:00
Tetsuya Mukawa	fb871d0a4d	vhost: fix default value of kickfd and callfd Currently, the default values of kickfd and callfd are -1, and the code treats -1 as meaning that kickfd and callfd haven't been initialized yet, so the vhost library concludes that the virtqueue isn't ready for processing. But callfd and kickfd are set to -1 when "--enable-kvm" isn't specified on the QEMU command line, so we cannot treat -1 as the uninitialized state. The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD as the default value of kickfd and callfd. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-15 00:20:29 +01:00
Yuanhan Liu	a436f53ebf	vhost: avoid dead loop chain If a malicious guest forges a looping descriptor chain, vhost could loop forever copying the desc buf to mbufs, exhausting all mbufs. Add a variable, nr_desc, that bounds the number of descriptors walked, to avoid such a case. Suggested-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-15 00:07:32 +01:00
Yuanhan Liu	c687b0b635	vhost: check for ring descriptors overflow A malicious guest may easily forge an illegal vring desc buf. To make our vhost robust, we need to make sure desc->next will not go beyond the vq->desc[] array. Suggested-by: Rich Lane <rich.lane@bigswitch.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-15 00:05:59 +01:00
Yuanhan Liu	623bc47054	vhost: do sanity check for ring descriptor length We need to make sure that desc->len is no smaller than the size of the virtio net header; otherwise, unexpected behaviour might happen because "desc_avail" would become a huge number (the unsigned subtraction wraps around) in the following code: desc_avail = desc->len - vq->vhost_hlen; On the dequeue code path, vhost will try to allocate enough mbufs to hold a desc buf of that size, which ends up consuming all mbufs and leaving no free mbuf available. Therefore, you might see the error message: Failed to allocate memory for mbuf. Also, on both the dequeue and enqueue code paths, while copying data from/to the desc buf, the huge "desc_avail" would result in accessing memory that does not belong to the desc buf, which could lead to memory access errors. A malicious guest could easily forge such a malformed vring desc buf. Restarting an interrupted DPDK application inside the guest also triggers this issue, as all huge pages are reset to 0 during DPDK re-init, leaving desc->len at 0. Therefore, this patch adds a sanity check for desc->len, to make vhost robust. Reported-by: Rich Lane <rich.lane@bigswitch.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-15 00:03:46 +01:00
Yuanhan Liu	c252bcf9ec	vhost: remove wrong unlikely prediction in Rx VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost, so adding unlikely() to the VIRTIO_NET_F_MRG_RXBUF detection doesn't make sense to me at all. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-14 23:59:47 +01:00
Yuanhan Liu	a98240621d	vhost: remove rte_memcpy from header copy First of all, rte_memcpy() is mostly useful for copying big packets by leveraging advanced hardware instructions like AVX. But for the virtio net hdr, which is 12 bytes at most, invoking rte_memcpy() does not bring any performance boost. And, to my surprise, rte_memcpy() is VERY large. Since rte_memcpy() is inlined, it increases the binary code size linearly every time we call it at a different place. Replacing the two rte_memcpy() calls with direct copies saves nearly 12K bytes of code size! Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-14 23:58:11 +01:00
Yuanhan Liu	932a00b85a	vhost: refactor mergeable Rx The current virtio_dev_merge_rx() implementation looks just like the old rte_vhost_dequeue_burst(): full of twisted logic, with the same code block repeated in many different places. However, the logic of virtio_dev_merge_rx() is quite similar to virtio_dev_rx(). The big difference is that the mergeable one may allocate more than one available entry to hold the data. Fetching all available entries into vec_buf at once makes the difference a bit bigger. The refactored code looks like below: while (mbuf_has_not_drained_totally || mbuf_has_next) { if (this_desc_has_no_room) { this_desc = fetch_next_from_vec_buf(); if (it is the last of a desc chain) update_used_ring(); } if (this_mbuf_has_drained_totally) mbuf = fetch_next_mbuf(); COPY(this_desc, this_mbuf); } This patch removes quite a few lines of code, making it much more readable. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-14 23:56:41 +01:00
Yuanhan Liu	282a94ba99	vhost: refactor Rx This is a simple refactor, as there isn't any twisted logic in the old code. Here I just split the code up and introduced two helper functions, reserve_avail_buf() and copy_mbuf_to_desc(), to make the code more readable. Also, it saves nearly 1K bytes of binary code size. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-14 23:55:06 +01:00
Yuanhan Liu	bc7f87a2c1	vhost: refactor dequeueing The current rte_vhost_dequeue_burst() implementation is a bit messy, with twisted logic and code repeated here and there. However, rte_vhost_dequeue_burst() actually does a simple job: copy the packet data from vring descs to mbufs. What's tricky here is: - desc bufs could be chained (by the desc->next field), so you need to fetch the next one when the current one is wholly drained. - One mbuf may not be big enough to hold all desc bufs, hence you need to chain mbufs as well, by the mbuf->next field. The simplified code looks like the following: while (this_desc_is_not_drained_totally || has_next_desc) { if (this_desc_has_drained_totally) { this_desc = next_desc(); } if (mbuf_has_no_room) { mbuf = allocate_a_new_mbuf(); } COPY(mbuf, desc); } Note that the old code does special handling to skip the virtio header. However, that can be done simply by adjusting the desc_avail and desc_offset vars: desc_avail = desc->len - vq->vhost_hlen; desc_offset = vq->vhost_hlen; This refactor makes the code much more readable (IMO), and it also reduces binary code size. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-14 23:49:36 +01:00
Panu Matilainen	0d822b8047	mk: fix vhost shared library dependencies Add DT_NEEDED entries for external library dependencies which are the most critical ones for sane operation. Clean up vhost_cuse CFLAGS/LDFLAGS confusion while at it. Signed-off-by: Panu Matilainen <pmatilai@redhat.com>	2016-03-13 20:27:26 +01:00
Yuanhan Liu	fd2fca6f6b	vhost: fix queue pair reallocation vqs are allocated in pairs, hence we should reallocate them in pairs in numa_realloc() as well; otherwise an error like the following occurs during NUMA reallocation: VHOST_CONFIG: reallocate vq from 0 to 1 node PANIC in rte_free(): Fatal error: Invalid memory We didn't catch it because numa_realloc() has no effect when RTE_LIBRTE_VHOST_NUMA is not enabled, which is the default case. Fixes: `e049ca6d10` ("vhost-user: prepare multiple queue setup") Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com>	2016-03-11 16:49:20 +01:00
Yuanhan Liu	78ffdaff06	vhost: simplify numa reallocation First check whether the vq needs reallocating; if so, reallocate it. Then do the same for the vhost dev. This gets rid of the many repeated "if (realloc_dev)" and "if (realloc_vq)" statements and therefore makes the code a bit more readable. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2016-03-11 16:49:17 +01:00
Yuanhan Liu	45ca9c6f7b	vhost: get rid of linked list for devices Instead of a singly linked list to maintain all devices, we can use a static array to achieve the same goal, just as we maintain the eth devices with the rte_eth_devices array. This simplifies the code a bit. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2016-03-11 16:46:18 +01:00
Yuanhan Liu	6fe390eed0	vhost: fix build with kernel < 3.5 VIRTIO_NET_F_GUEST_ANNOUNCE is a feature introduced in kernel v3.5. For older kernels (or more precisely, old distributions), we can simply define it manually to fix the "macro not defined" error. Fixes: `d293dac8f3` ("vhost: claim support of guest announce") Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-11 16:46:18 +01:00
Yuanhan Liu	bb66588304	vhost: broadcast RARP by injecting in receiving mbuf array Broadcast the RARP packet by injecting it into the receiving mbuf array at rte_vhost_dequeue_burst(). Commit `33226236a3` ("vhost: handle request to send RARP") iterates over all host interfaces and broadcasts the packet on each of them. That did notify the switches about the new location of the migrated VM; however, the MAC learning table on the target host is wrong (at least in my test with OVS): $ ovs-appctl fdb/show ovsbr0 (port VLAN MAC Age): 1 0 b6:3c:72:71:cd:4d 10; LOCAL 0 b6:3c:72:71:cd:4e 10; LOCAL 0 52:54:00:12:34:68 9; 1 0 56:f6:64:2c:bc:c0 1. Here 52:54:00:12:34:68 is the MAC of the VM. As you can see above, the learned port is "LOCAL", which is the "ovsbr0" port. That is reasonable, since we did send the packet through the "ovsbr0" interface. The wrong MAC table sends all packets destined for the VM to "ovsbr0", so they are all lost until the guest sends an ARP request (or reply) that refreshes the MAC learning table. Jianfeng then came up with a solution I had thought of at first but rejected myself, out of concern that it had potential issues [0]. The solution is what the title states: broadcast the RARP packet by injecting it into the receiving mbuf array at rte_vhost_dequeue_burst(). Having the idea brought up again made me think twice, and the concern then looked like a false one to me. I also did a rough verification: it worked as expected. [0]: http://dpdk.org/ml/archives/dev/2016-February/033527.html Also, while preparing this version, I found that DPDK defines some ARP-related structures and macros, so use them here instead of the ones from the standard header files. Cc: Thibaut Collet <thibaut.collet@6wind.com> Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-29 16:55:30 +01:00
Pavel Fedin	2f29ce885a	vhost: check memory map before address translation Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for some reason. This causes a NULL dereference in qva_to_vva(). Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-21 11:17:48 +01:00
Rich Lane	a90ca1a12e	vhost: remove device operations pointers The vhost_net_device_ops indirection is unnecessary because there is only one implementation of the vhost common code. Removing it makes the code more readable. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 19:33:31 +01:00
Rich Lane	ca67ed289a	vhost: fix leak of fds and mmaps The common vhost code only supported a single mmap per device. vhost-user worked around this by saving the address/length/fd of each mmap after the end of the rte_virtio_memory struct. This only works if the vhost-user code frees dev->mem, since the common code is unaware of the extra info. The VHOST_USER_RESET_OWNER message is one situation where the common code frees dev->mem and leaks the fds and mappings. This happens every time I shut down a VM. The new code calls back into the implementation (vhost-user or vhost-cuse) to clean up these resources. The vhost-cuse changes are only compile tested. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 16:13:32 +01:00
Yuanhan Liu	d22929db97	vhost: remove duplicate header include unistd.h has been included twice; remove one. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 16:00:03 +01:00
Yuanhan Liu	d639996a74	vhost: enable log_shmfd protocol feature Claim that we support vhost-user live migration: the SET_LOG_BASE request will be sent only when this feature flag is set. Besides this flag, another feature flag, VHOST_F_LOG_ALL, must be set to make vhost-user live migration work; that one, however, was enabled long ago. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:53:38 +01:00
Yuanhan Liu	33226236a3	vhost: handle request to send RARP The former patch enabled the GUEST_ANNOUNCE feature, so that the guest OS broadcasts a GARP message after migration to notify the switch about the new location of the migrated VM; however, GUEST_ANNOUNCE is supported only since kernel v3.5. For older kernels, the VHOST_USER_SEND_RARP request comes to the rescue. The payload of this new request is the MAC address of the migrated VM; with it, we can construct a RARP message and broadcast it to the host interfaces. That's how this patch works: - list all interfaces, with the help of the SIOCGIFCONF ioctl command - construct a RARP message and broadcast it Cc: Thibaut Collet <thibaut.collet@6wind.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 15:49:02 +01:00
Yuanhan Liu	d293dac8f3	vhost: claim support of guest announce It's actually a feature already enabled in the Linux kernel (since v3.5). All we need to do is claim that we support the feature, nothing else. With that, the guest will send an ARP message after live migration to notify the switches about the new location of the migrated VM. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:47:20 +01:00
Yuanhan Liu	699e3577e6	vhost: log vring desc buffer changes Every time we copy a buf to vring desc, we need to log it. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:46:46 +01:00
Yuanhan Liu	b171fad1ff	vhost: log used vring changes Introduce the vhost_log_write() helper function to log the dirty pages we touched. The page size is hard-coded to 4096 (VHOST_LOG_PAGE), and each page is represented by 1 bit in the log. Therefore, vhost_log_write() simply finds the bit for each page we are going to change and sets it to 1. dev->log_base denotes the start of the dirty page bitmap. Every time we update the virtio used ring, we need to log it; this is done by vhost_log_used_vring(), a new wrapper around vhost_log_write(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:44:13 +01:00
Yuanhan Liu	54f9e32305	vhost: handle dirty pages logging request The VHOST_USER_SET_LOG_BASE request tells the backend (DPDK vhost-user) where to log dirty pages and how big the log buffer is. This request introduces a new payload: typedef struct VhostUserLog { uint64_t mmap_size; uint64_t mmap_offset; } VhostUserLog; Also, an fd is delivered from QEMU as ancillary data. With that info, an area of memory is mmapped and assigned to dev->log_base for logging dirty pages. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:42:54 +01:00
Panu Matilainen	f1fe8388d5	vhost: fix build dependency Commit `d0cf91303d` added a dependency on librte_net headers to vhost but did not add it to the Makefile, which makes builds non-deterministic. Curiously, it is the non-parallel build that is consistently broken by this missing dependency (usually it's the other way around): building without -j(n) fails with: lib/librte_vhost/vhost_rxtx.c:41:20: fatal error: rte_ip.h: No such file or directory Fixes: `d0cf91303d` ("vhost: add Tx offload capabilities") Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-18 20:25:15 +01:00
Jijiang Liu	859b480d5a	vhost: add guest offload setting Add guest offload setting in vhost lib. Virtio 1.0 spec (5.1.6.4 Processing of Incoming Packets) says: 1. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so, the packet checksum at offset csum_offset from csum_start and any preceding checksums have been validated. The checksum on the packet is incomplete and csum_start and csum_offset indicate how to calculate it (see Packet Transmission point 1). 2. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were negotiated, then gso_type MAY be something other than VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the desired MSS (see Packet Transmission point 2). In order to support these features, the following changes are made: 1. Extend the 'VHOST_SUPPORTED_FEATURES' macro to add negotiation of the offload features. 2. Enqueue these offloads: convert some fields in the mbuf to the corresponding fields in virtio_net_hdr. Some further explanation of the implementation: for the VM2VM case, there is no need to compute the checksum, since we consider the data reliable enough, and setting VIRTIO_NET_HDR_F_NEEDS_CSUM on the RX side lets the TCP layer bypass checksum validation, so that the RX side can receive the packet in the end. In terms of us-vhost, on the vhost RX side the offload information is inherited from the mbuf, which in turn inherits it from the TX side. If we can still get that info on the RX side, the packet is from another VM on the same host, so it's safe to set VIRTIO_NET_HDR_F_NEEDS_CSUM to skip checksum validation. Signed-off-by: Jijiang Liu <jijiang.liu@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-17 22:56:44 +01:00
Jijiang Liu	d0cf91303d	vhost: add Tx offload capabilities Add vhost TX offload (CSUM and TSO) support capabilities in vhost lib. In order to support these features, the following changes are made: 1. Extend the 'VHOST_SUPPORTED_FEATURES' macro to add negotiation of the offload features. 2. Dequeue TX offload: convert the fields in virtio_net_hdr to the related fields in mbuf. Signed-off-by: Jijiang Liu <jijiang.liu@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-17 22:56:44 +01:00
Yuanhan Liu	0c83f820db	vhost: note the ABI changes Note the ABI changes and update the ABI version to 2. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-12-14 01:33:14 +01:00
Huawei Xie	766ad08900	vhost: fix logically dead code CID 107107 (#1 of 1): Logically dead code Fixes: `af4f2c5feb` ("vhost: fix code style") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:14:30 +01:00
Huawei Xie	16321f1caa	vhost: fix missed unlock CID 107113 (#1 of 1): Missing unlock (LOCK) 5. missing_unlock: Returning without unlocking pfdset->fd_mutex. Fixes: `fbf7e07ca1` ("vhost: add select based event driven processing") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:13:39 +01:00
Huawei Xie	3b77b90e34	vhost: fix missed break in switch CID 107114 (#1 of 1): Missing break in switch Fixes: `8f972312b8` ("vhost: support vhost-user") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:12:22 +01:00
Huawei Xie	36b2449d86	vhost: fix out-of-bounds read CID 107126 (#1 OF 1): Out-of-bounds read Fixes: `8f972312b8` ("vhost: support vhost-user") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:11:19 +01:00
Stephen Hemminger	98b5ecbf76	vhost: do not stall if guest is slow When the guest is booting (or any other time the guest is busy), the small receive ring (256) can become full. If this happens, the vhost library should just return normally. Its current behavior of logging creates massive log spew/overflow, which could even act as a DoS attack against the host. Reported-by: Nathan Law <nlaw@brocade.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-12-09 22:02:33 +01:00
Yuanhan Liu	abba423c1b	vhost: reserve some space in structures So that we will not break the ABI when future extensions add a few more fields. Struct vhost_virtqueue reserves 16 qwords (the upcoming vhost live migration support will consume at least 3 of them), and struct virtio_net reserves a bit more, 64 qwords, as there is only one instance per virtio NIC. Note that neither reservation is placed at the end of its struct, but instead before the last field, since the last field in each of the two structs takes a lot of space; putting the reservation after it would push the reserved fields into another cache line. (We might need to fix them in the future, btw.) Suggested-by: Panu Matilainen <pmatilai@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-12-08 03:00:42 +01:00
Xiaobo Chi	7836894a64	vhost: fix kernel module insertion Problem: if I first insert my kmod_test.ko and then insert eventfd_link.ko, an error occurs with the hint "Device or resource busy". This is because the default minor device number, 0, is already occupied by my kmod_test.ko. root@distro:~/test$ lsmod (Module Size Used by): kmod_test 927 0; vboxsf 35930 4; vboxguest 222130 1 vboxsf; microcode 10315 0; autofs4 25051 0. root@distro:~/test$ insmod ./eventfd_link.ko insmod: ERROR: could not insert module ./eventfd_link.ko: Device or resource busy Explanation: all misc devices share the same major device number, so the minor device number should be set to distinguish them; if the minor is not set, insmod may fail because the default minor value, 0, is already used by another misc device. MISC_DYNAMIC_MINOR lets the Linux kernel dynamically assign a minor device number at load time. Signed-off-by: Xiaobo Chi <xiaobo.chi@nokia.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-24 21:34:11 +01:00
Victor Kaplansky	cd81ee7cc2	vhost: fix enabling vring per queue The VHOST_USER_SET_VRING_ENABLE request used to be sent for each queue-pair. However, QEMU commit dc3db6ad ("vhost-user: start/stop all rings") changed it to be sent per queue in the queue-pair. The change is reasonable, as all other requests are sent per queue instead of per queue-pair. Hence we should adapt to the QEMU change here; otherwise, a segfault is triggered when the last TX queue is enabled. Signed-off-by: Victor Kaplansky <victork@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-24 21:34:11 +01:00
Jianfeng Tan	ec09c280b8	vhost: fix mmap not aligned with hugepage size This patch fixes a bug on older Linux kernels, where mmap() fails when the length is not aligned to the hugepage size. On older longterm Linux versions such as 2.6.32 and 3.2.72, mmap() without the MAP_ANONYMOUS flag must be called with a length aligned to hugepagesz, or it fails with EINVAL. This bug was fixed in the Linux kernel by commit dab2d3dc45ae7343216635d981d43637e1cb7d45. To avoid the failure, make sure the caller keeps the length aligned. Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-11-24 21:34:11 +01:00
Tetsuya Mukawa	e1f8f55571	vhost: fix guest descriptor closed on reset owner message The patch fixes reset_owner message handling so that it no longer clears callfd, because callfd remains valid while the connection is established. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-24 21:34:11 +01:00
Yuanhan Liu	87308b5370	vhost: reset device properly Currently, we reset all fields of a device to zero when a reset happens, which is wrong, since some fields, like device_fh, ifname, and virt_qp_nb, should stay the same after a reset until the device is removed. That is what the new helper function reset_device() is for. Also use rte_zmalloc() instead of rte_malloc(), so that we can avoid init_device(), which so far basically does only a zero reset; hence, init_device() is dropped in this patch. This patch also removes a hack of using the offset of a specific field (currently virtqueue) inside the `virtio_net' structure to do the reset, which could easily break if someone changed the field order without caution. Cc: Tetsuya Mukawa <mukawa@igel.co.jp> Cc: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2015-11-12 12:39:08 +01:00
Rich Lane	d243ecf0c2	vhost: make destroy callback on reset owner message QEMU sends VHOST_RESET_OWNER first when shutting down. There was previously no way for the dataplane to know that the virtio_net instance had become unusable and it would segfault when trying to do RX/TX. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-12 12:39:08 +01:00
Marcel Apfelbaum	45c55d39c5	vhost: fix build with old kernels Commit `15e9ee6982` uses the VIRTIO_F_VERSION_1 macro, which exists only in newer kernels. Fix it by manually defining it for older kernels. Fixes: `15e9ee6982` ("vhost: enable virtio 1.0") Reported-by: Qian Xu <qian.q.xu@intel.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>	2015-11-03 12:33:04 +01:00
Marcel Apfelbaum	15e9ee6982	vhost: enable virtio 1.0 Make vhost-user virtio 1.0 compatible by adding it to the supported features and keeping the header length the same as for mergeable RX buffers. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-02 23:12:27 +01:00
Pavel Boldin	40e8552e32	vhost: use new eventfd copy Signed-off-by: Pavel Boldin <pboldin@mirantis.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-30 20:06:51 +01:00
Pavel Boldin	c706f3b9c8	vhost: add new eventfd copy ioctl Signed-off-by: Pavel Boldin <pboldin@mirantis.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-30 20:06:30 +01:00
Pavel Boldin	e658d99749	vhost: refactor eventfd copy handler * Move ioctl `EVENTFD_COPY' code to a separate function * Remove extra #includes * Introduce function fget_from_files * Fix ioctl return values Signed-off-by: Pavel Boldin <pboldin@mirantis.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-30 20:03:51 +01:00
Yuanhan Liu	71dfdbe66a	vhost: fix build with kernel < 3.8 Fix build error: virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here The above two virtio-net MQ macros were introduced in kernel v3.8. For older kernels, we should not reference them directly; hence, this patch introduces two wrapper macros, with proper values set depending on whether we support MQ or not. Fixes: `b09b198bfb` ("vhost-user: announce queue number in message") Reported-by: Yongjie Gu <yongjiex.gu@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: David Marchand <david.marchand@6wind.com>	2015-10-30 15:55:05 +01:00
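A minimal sketch of the wrapper-macro approach described in this fix, assuming the fallback is selected only by whether `<linux/virtio_net.h>` defines `VIRTIO_NET_F_MQ`. The `HYP_*` macros and `hyp_*` functions are hypothetical names, not the ones in the actual patch. Because the sketch does not include the kernel header, it compiles the pre-3.8 fallback branch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Kernels >= 3.8 define VIRTIO_NET_F_MQ and VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX
 * in <linux/virtio_net.h>; older headers lack both, so wrap them instead of
 * referencing them directly. (HYP_* names are hypothetical.)
 */
#ifdef VIRTIO_NET_F_MQ
#define HYP_MAX_QUEUE_PAIRS  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX
#define HYP_MQ_FEATURE_MASK  (1ULL << VIRTIO_NET_F_MQ)
#else
#define HYP_MAX_QUEUE_PAIRS  1     /* only the default queue pair */
#define HYP_MQ_FEATURE_MASK  0ULL  /* do not advertise MQ */
#endif

/* Feature word offered to the guest: base features plus MQ if available. */
static uint64_t hyp_supported_features(uint64_t base)
{
	return base | HYP_MQ_FEATURE_MASK;
}

/* Queue-pair count to announce, e.g. in reply to GET_QUEUE_NUM. */
static unsigned int hyp_max_queue_pairs(void)
{
	unsigned int n = HYP_MAX_QUEUE_PAIRS;

	assert(n >= 1); /* the default queue pair always exists */
	return n;
}
```

With the fallback branch, `hyp_max_queue_pairs()` returns 1 and `hyp_supported_features()` leaves the base feature word unchanged, so callers never need their own `#ifdef`.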
Tetsuya Mukawa	07d37fbf5e	vhost: fix crash with multiqueue enabled The patch fixes wrong handling of virtqueue array index when GET_VRING_BASE message comes. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2015-10-30 15:46:01 +01:00
Yuanhan Liu	19d4d7ef2a	vhost-user: enable multiple queue By setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit, and VIRTIO_NET_F_MQ feature bit. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:54 +01:00
Changchun Ouyang	77d20126b4	vhost-user: handle message to enable vring This message is used to enable/disable a specific vring queue pair. The first queue pair is enabled by default. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:53 +01:00
Changchun Ouyang	7c46842c9e	vhost: use queue id instead of constant ring index Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id instead, which will be set to a proper value for a specific queue when we have multiple queue support enabled. For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ, so it should not break anything. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:49 +01:00
Yuanhan Liu	e049ca6d10	vhost-user: prepare multiple queue setup All queue pairs, including the default (the first) queue pair, are allocated dynamically, when a vring_call message is received first time for a specific queue pair. This is a refactor work for enabling vhost-user multiple queue; it should not break anything as it does no functional changes: we don't support mq set, so there is only one mq at max. This patch is based on Changchun's patch. Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:37 +01:00
Yuanhan Liu	b09b198bfb	vhost-user: announce queue number in message Add VHOST_USER_GET_QUEUE_NUM message to tell the frontend (qemu) how many queue pairs we support. And it is initialized to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:32 +01:00
Yuanhan Liu	381316f6a2	vhost-user: support protocol features The two protocol feature messages were introduced by the qemu vhost maintainer (Michael) for extending the vhost-user interface. Here is an excerpt from the vhost-user spec: Any protocol extensions are gated by protocol feature bits, which allows full backwards compatibility on both master and slave. The vhost-user multiple queue feature will be treated as a vhost-user extension; hence, we have to implement the two messages first. VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support any yet. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:27 +01:00
Jerome Jutteau	6c6373c763	vhost: fix missing device checks virtio-net searches for its device in reset_owner, but the function doesn't check the return value of get_config_ll_entry, so reset_owner shows no error when the device is not found. This patch fixes this by using get_device instead of get_config_ll_entry. In user_get_vring_base, the return value of get_device is not checked, which may cause a segfault when the device is not found. Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-10-21 12:21:18 +02:00
Jerome Jutteau	2c95f4de6a	vhost: keep device identifier after reset owner virtio-net cleans and re-initializes the device after a VHOST_USER_RESET_OWNER. This resets the device identifier to 0 and breaks the ll_root listing logic. This patch keeps the old device identifier and rewrites it on the cleaned device. Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-10-21 12:03:57 +02:00
Huawei Xie	5dab2f1137	vhost: inject only one interrupt for a batch of packets In the mergeable RX path, vhost injects an interrupt into the guest for each packet. This can degrade performance a lot. This patch fixes this issue by injecting one interrupt for a batch of packets. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2015-09-30 01:19:19 +02:00
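As a rough illustration of the gating rule quoted in this commit, the check below assumes the vhost-user specification's bit numbers: `VHOST_USER_F_PROTOCOL_FEATURES` is bit 30 of the ordinary virtio feature word, and `VHOST_USER_PROTOCOL_F_MQ` is bit 0 of the protocol feature word. The helper `hyp_protocol_feature_enabled` and the macro `HYP_BACKEND_PROTOCOL_FEATURES` are hypothetical; the backend mask of 0 mirrors this commit, where no protocol features are supported yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers from the vhost-user specification. */
#define VHOST_USER_F_PROTOCOL_FEATURES 30 /* in the virtio feature word */
#define VHOST_USER_PROTOCOL_F_MQ        0 /* in the protocol feature word */

/* As of this commit the backend supports no protocol extensions. */
#define HYP_BACKEND_PROTOCOL_FEATURES 0ULL

/*
 * An extension is usable only if VHOST_USER_F_PROTOCOL_FEATURES was
 * negotiated in the ordinary feature word AND both sides set the
 * extension's bit in the protocol feature word. (Hypothetical helper.)
 */
static bool hyp_protocol_feature_enabled(uint64_t acked_features,
					 uint64_t master_protocol,
					 uint64_t backend_protocol,
					 unsigned int bit)
{
	assert(bit < 64);
	if (!(acked_features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)))
		return false;
	return (master_protocol & backend_protocol & (1ULL << bit)) != 0;
}
```

Because an extension needs agreement at both levels, an old master that never negotiates bit 30 simply never sees any extension, which is what keeps the scheme backwards compatible.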
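A minimal sketch of the batching idea, assuming a heavily simplified queue struct and Linux `eventfd(2)` as the guest-notification channel (the `callfd` role). `hyp_vq` and `hyp_enqueue_burst` are hypothetical names. `VRING_AVAIL_F_NO_INTERRUPT` is the virtio spec's interrupt-suppression flag.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1 /* value from the virtio spec */

/* Hypothetical, heavily simplified view of one RX virtqueue. */
struct hyp_vq {
	uint16_t avail_flags; /* written by the guest */
	uint16_t used_idx;    /* written by vhost */
	int callfd;           /* eventfd used to interrupt the guest */
};

/*
 * Enqueue a burst of `count` packets, then interrupt the guest at most
 * once for the whole burst instead of once per packet.
 * Returns the number of interrupts injected (0 or 1).
 */
static int hyp_enqueue_burst(struct hyp_vq *vq, unsigned int count)
{
	assert(vq->callfd >= 0);

	for (unsigned int i = 0; i < count; i++)
		vq->used_idx++; /* stand-in for copying one packet */

	/* Real code must also order the used_idx update before this read. */
	if (count > 0 && !(vq->avail_flags & VRING_AVAIL_F_NO_INTERRUPT))
		return eventfd_write(vq->callfd, (eventfd_t)1) == 0 ? 1 : 0;
	return 0;
}
```

A burst of 32 packets therefore costs one eventfd write instead of 32, and none at all while the guest has interrupts suppressed.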
Yuanhan Liu	9702b2b53f	vhost: fix wrong usage of eventfd_t According to eventfd man page: typedef uint64_t eventfd_t; int eventfd_read(int fd, eventfd_t *value); int eventfd_write(int fd, eventfd_t value); eventfd_t is defined for the second arg(value), but not for fd. Here I redefine those fd fields to `int' type, which also removes the redundant (int) cast. And as the man page stated, we should cast 1 to eventfd_t type for eventfd_write(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:58:30 +02:00
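A short sketch of the corrected typing, assuming glibc's `<sys/eventfd.h>`: the fd fields are plain `int`, and `eventfd_t` appears only as the counter value passed to `eventfd_write()` and filled in by `eventfd_read()`. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <sys/eventfd.h>

/* File descriptors are plain ints; eventfd_t (uint64_t) is only the counter. */
struct hyp_vring_fds {
	int kickfd;
	int callfd;
};

/* Signal once; cast the value to eventfd_t as the man page recommends. */
static int hyp_notify(const struct hyp_vring_fds *fds)
{
	assert(fds->callfd >= 0);
	return eventfd_write(fds->callfd, (eventfd_t)1);
}

/* Read and reset the counter; returns the accumulated count, or 0 on error. */
static eventfd_t hyp_drain(int fd)
{
	eventfd_t value = 0;

	if (eventfd_read(fd, &value) != 0)
		return 0;
	return value;
}
```

In the default (non-semaphore) mode the kernel accumulates writes, so two `hyp_notify()` calls followed by one `hyp_drain()` yield 2.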
Yuanhan Liu	dbd897d0a1	vhost: get rid of duplicate code Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:55:17 +02:00
Yuanhan Liu	2bb29a9fc1	vhost: fix typo _det => _dev Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:53:28 +02:00
Yuanhan Liu	8426af8e57	vhost: remove extra semicolon Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:50:46 +02:00
Aaron Conole	a17cc4f172	vhost: build eventfd_link module against specified kernel The vHost eventlink driver is a kernel module that requires a kernel source/build directory to build the ko. Convert the fixed kernel build directory specifier to one which may be user specified on the command-line. Signed-off-by: Aaron Conole <aconole@redhat.com>	2015-09-24 22:01:53 +02:00
Ouyang Changchun	d533647a87	vhost: fix qemu shutdown This patch originates from the patch: "Patch for Qemu wrapper for US-VHost to ensure Qemu process ends when VM is shutdown", http://dpdk.org/ml/archives/dev/2014-June/003606.html Also update the vhost sample guide doc. Signed-off-by: Claire Murphy <claire.k.murphy@intel.com> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>	2015-09-24 14:57:36 +02:00
Ouyang Changchun	43866bf71d	doc: fix vhost sample parameter This commit removes the dev-index, so update the doc for this change: Fixes: `17b8320a3e` ("vhost: remove index parameter") Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>	2015-09-24 12:17:01 +02:00
Ouyang Changchun	1cbf787ef8	vhost: add log on socket bind failure It adds more readable log info if a socket fails to bind to local socket file name. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-07-17 14:53:26 +02:00
Huawei Xie	cca619e459	vhost: comment unwanted callback add comment for potential unwanted callback on listenfds Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-30 17:49:08 +02:00
Huawei Xie	292959c719	vhost: cleanup unix socket rte_vhost_driver_unregister API will remove the listenfd from event list, and then close it. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Peng Sun <peng.a.sun@intel.com>	2015-06-30 17:49:08 +02:00
Huawei Xie	3670686ab9	vhost: fix race for connection fd In the event handler of a connection fd, the connection fd could possibly be closed. The event dispatch loop would then try to remove the fd from the fdset. Between these two actions, another thread might register a new listenfd reusing the value of the just-closed fd, so we couldn't call fdset_del, which would wrongly clean up the new listenfd. A new function fdset_del_slot is provided to clean up the fd at the specified location. Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-30 17:49:07 +02:00
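To make the race concrete, here is a simplified, single-threaded sketch of an fdset. All names are hypothetical; DPDK's real fdset also stores callbacks and is lock-protected. It assumes new registrations take the first free slot, which is how a reused fd number can land in a different slot than the stale entry.

```c
#include <assert.h>

#define HYP_MAX_FDS 8

/* Hypothetical fdset: one fd per slot, -1 marks a free slot. */
struct hyp_fdset {
	int fd[HYP_MAX_FDS];
	int owner[HYP_MAX_FDS]; /* tag identifying who registered the fd */
};

static void hyp_fdset_init(struct hyp_fdset *s)
{
	for (int i = 0; i < HYP_MAX_FDS; i++) {
		s->fd[i] = -1;
		s->owner[i] = 0;
	}
}

/* Register fd in the first free slot; returns the slot, or -1 if full. */
static int hyp_fdset_add(struct hyp_fdset *s, int fd, int owner)
{
	for (int i = 0; i < HYP_MAX_FDS; i++) {
		if (s->fd[i] == -1) {
			s->fd[i] = fd;
			s->owner[i] = owner;
			return i;
		}
	}
	return -1;
}

/* Delete by value: clears the first slot holding fd. Unsafe once the
 * closed fd's number has been reused by another registration. */
static void hyp_fdset_del(struct hyp_fdset *s, int fd)
{
	for (int i = 0; i < HYP_MAX_FDS; i++) {
		if (s->fd[i] == fd) {
			s->fd[i] = -1;
			s->owner[i] = 0;
			return;
		}
	}
}

/* Delete by slot: clears exactly the entry the dispatcher was serving. */
static void hyp_fdset_del_slot(struct hyp_fdset *s, int slot)
{
	assert(slot >= 0 && slot < HYP_MAX_FDS);
	s->fd[slot] = -1;
	s->owner[slot] = 0;
}
```

Suppose a connection fd in slot 1 is closed and another thread registers a new listenfd that gets the same number into free slot 0. In that case, `hyp_fdset_del(s, fd)` clears slot 0 (the new listenfd) and leaves the stale slot 1 behind. `hyp_fdset_del_slot(s, 1)` removes the right entry.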
Huawei Xie	af295ad469	vhost: realloc device and queues to same numa node as vring desc When we get the address of vring descriptor table in VHOST_SET_VRING_ADDR message, will try to reallocate vhost device and virt queue to the same numa node. Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-29 18:57:33 +02:00
Huawei Xie	4113e38100	vhost: use rte_malloc to allocate device and queues use rte_malloc to allocate vhost device and queues Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-29 18:57:33 +02:00
Cyril Chemparathy	82be8d5442	mbuf: use offset macro This patch simply applies the transform previously committed in scripts/cocci/mtod-offset.cocci. No other modifications have been made here. Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2015-06-24 12:01:14 +02:00
Ouyang Changchun	8b636a50c2	doc: fix doxygen warnings in vhost API Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>	2015-06-19 12:11:53 +02:00
Ouyang Changchun	5dc985ec1d	vhost: remove unnecessary descriptor length updates Remove these unnecessary vring descriptor length updates; vhost should not change them. The virtio front end should assign desc.len for both rx and tx. Test report: http://dpdk.org/ml/archives/dev/2015-June/018610.html Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-06-17 16:56:24 +02:00
Ouyang Changchun	2927c37ca4	vhost: rework mergeable Rx Extract code into a function, update_secure_len, which accumulates the buffer length in the vring descriptors and fills struct buf_vec. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-06-17 16:47:51 +02:00
Ouyang Changchun	46a8fafaa7	vhost: refine code style Remove unnecessary new line. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-06-17 16:25:10 +02:00
Ouyang Changchun	f1a519ad98	vhost: fix enqueue/dequeue to handle chained vring descriptors Vring enqueue needs to consider 2 cases: 1. separate descriptors hold the virtio header and the actual data, e.g. the first descriptor is for the virtio header and is followed by descriptors for the actual data. 2. the virtio header and some data are put together in one descriptor, e.g. the first descriptor contains both the virtio header and part of the actual data, followed by more descriptors for the rest of the packet data; the current DPDK-based virtio-net PMD implementation is this case. The same applies to vring dequeue: it should not assume whether the vring descriptors are chained or not; it should use desc->flags to check. This patch also fixes a TX corruption issue when vhost works with a virtio-net driver that by default uses one single vring descriptor (header and data in one descriptor) for virtio TX. Test report: http://dpdk.org/ml/archives/dev/2015-June/018610.html Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-06-17 16:18:40 +02:00
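A minimal sketch of walking a descriptor chain without assuming its layout, following `VRING_DESC_F_NEXT` (value 1 in the virtio spec) and skipping a virtio-net header whether it sits alone in the first descriptor (case 1) or shares it with data (case 2). The struct omits the guest address field, and `hyp_chain_payload_len` is a hypothetical helper, not vhost's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT 1 /* value from the virtio spec */

/* Simplified split-ring descriptor (guest address omitted). */
struct hyp_desc {
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Return the payload bytes in the chain starting at `head`, excluding a
 * virtio-net header of `hdr_len` bytes. The header may fill a descriptor
 * by itself or share one with data; only desc->flags decides whether the
 * chain continues.
 */
static uint32_t hyp_chain_payload_len(const struct hyp_desc *table,
				      uint16_t table_size, uint16_t head,
				      uint32_t hdr_len)
{
	uint32_t total = 0, skip = hdr_len;
	uint16_t idx = head, hops = 0;

	for (;;) {
		/* Guard against out-of-range or looping chains. */
		assert(idx < table_size);
		assert(hops < table_size);
		hops++;

		uint32_t len = table[idx].len;
		uint32_t hdr_part = len < skip ? len : skip;

		skip -= hdr_part;
		total += len - hdr_part;
		if (!(table[idx].flags & VRING_DESC_F_NEXT))
			break;
		idx = table[idx].next;
	}
	return total;
}
```

The same loop handles a 12-byte header alone in descriptor 0 followed by data descriptors, and a single descriptor carrying header plus data, which is the layout that caused the TX corruption.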
Krishna Murthy	f75f65abf3	vhost: enable live migration When we migrate VM, without this feature, qemu will report error: "migrate: Migration disabled: vhost lacks VHOST_F_LOG_ALL feature". Signed-off-by: Krishna Murthy <krishna.j.murthy@intel.com>	2015-06-12 17:07:24 +02:00
Stephen Hemminger	a43a55472f	lib: fix whitespace More places with trailing whitespace, and empty blank lines Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2015-06-12 11:10:10 +02:00
Huawei Xie	159793ac86	vhost: fix virtio freeze due to missed interrupt Update of used->idx and read of avail->flags could be reordered. Memory fence should be used to ensure the order, otherwise guest could see a stale used->idx value after it toggles the interrupt suppression flag. After guest sets the interrupt suppression flag, it will check if there is more buffer to process through used->idx. If it sees a stale value, it will exit the processing while host won't send interrupt to guest. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Reviewed-by: Luke Gorrie <luke@snabb.co>	2015-05-13 12:16:47 +02:00
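A minimal sketch of the required ordering, using C11 atomics instead of DPDK's own barrier macros (DPDK's `rte_mb()` is a full barrier of the same kind). The struct and function names are hypothetical. The store to `used->idx` followed by the load of `avail->flags` is a store-then-load pair, which CPUs such as x86 may reorder. A sequentially consistent fence between them prevents that reordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1 /* value from the virtio spec */

/* Hypothetical shared ring fields (guest and host access them concurrently). */
struct hyp_shared {
	_Atomic uint16_t used_idx;    /* written by the host */
	_Atomic uint16_t avail_flags; /* written by the guest */
};

/*
 * Publish `n` new used entries, then decide whether to interrupt the guest.
 * Without the fence, the flags load could be satisfied before the used_idx
 * store is visible: the guest may then see a stale used_idx after enabling
 * interrupts while the host sees the old flags and skips the interrupt.
 */
static bool hyp_publish_and_check(struct hyp_shared *r, uint16_t n)
{
	assert(n != 0); /* callers only publish non-empty bursts */

	uint16_t idx = atomic_load_explicit(&r->used_idx, memory_order_relaxed);
	atomic_store_explicit(&r->used_idx, (uint16_t)(idx + n),
			      memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst); /* store -> load ordering */

	uint16_t flags = atomic_load_explicit(&r->avail_flags,
					      memory_order_relaxed);
	return !(flags & VRING_AVAIL_F_NO_INTERRUPT);
}
```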
Bruce Richardson	ec10e8d24e	vhost: remove inclusion of mbuf header The virtio_net header file includes the mbuf header file, but it does not need to do so as it only uses pointers to the struct rte_mbuf type, and does not use any of the mbuf internals, nor any of the mbuf functions or macros. Therefore the inclusion is unnecessary, and can be replaced by a forward declaration of the mbuf type. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2015-05-11 15:36:37 +02:00
Pavel Boldin	38f4f01f8f	vhost: fix file struct leakage Due to increased `struct file's reference counter subsequent call to `filp_close' does not free the `struct file'. Prepend `fput' call to decrease the reference counter. Signed-off-by: Pavel Boldin <pboldin@mirantis.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-03-26 22:33:41 +01:00
Haifeng Lin	5c561b0a23	vhost: fix index when mbuf allocation fails When allocating a buffer from the mempool failed, we only updated last_used_idx but not used->idx, so after many such failures vhost thought it had handled all packets, while virtio_net thought vhost had not, and would not update avail->idx. Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-03-26 22:33:41 +01:00
Huawei Xie	857af048ac	vhost: fix build Fix the errors "missing initializer" and "cast to pointer from integer of different size". For the pointer to integer cast issue, we need to investigate changing the type of mapped_address. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2015-03-20 23:01:42 +01:00
Benoît Canet	45db8927a8	vhost: add hint on how to add or remove device to a data core Let's make sure people will not forget to set and unset VIRTIO_DEV_RUNNING. Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-03-17 12:39:52 +01:00
Huawei Xie	28a1ccca41	vhost: add build option for vhost-user Turn on CONFIG_RTE_LIBRTE_VHOST to enable vhost. vhost-user is turned on by default. Turn off CONFIG_RTE_LIBRTE_VHOST_USER to enable vhost-cuse implementation. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2015-03-17 00:46:01 +01:00


200 Commits