numam-dpdk

Author	SHA1	Message	Date
Rich Lane	a90ca1a12e	vhost: remove device operations pointers The vhost_net_device_ops indirection is unnecessary because there is only one implementation of the vhost common code. Removing it makes the code more readable. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 19:33:31 +01:00
Rich Lane	ca67ed289a	vhost: fix leak of fds and mmaps The common vhost code only supported a single mmap per device. vhost-user worked around this by saving the address/length/fd of each mmap after the end of the rte_virtio_memory struct. This only works if the vhost-user code frees dev->mem, since the common code is unaware of the extra info. The VHOST_USER_RESET_OWNER message is one situation where the common code frees dev->mem and leaks the fds and mappings. This happens every time I shut down a VM. The new code calls back into the implementation (vhost-user or vhost-cuse) to clean up these resources. The vhost-cuse changes are only compile tested. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-19 16:13:32 +01:00
Yuanhan Liu	d293dac8f3	vhost: claim support of guest announce It's actually a feature already enabled in Linux kernel (since v3.5). What we need to do is simply to claim that we support such feature, and nothing else. With that, the guest will send an ARP message after live migration to notify the switches about the new location of migrated VM. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:47:20 +01:00
Yuanhan Liu	b171fad1ff	vhost: log used vring changes Introduce the vhost_log_write() helper function to log the dirty pages we touched. Page size is hard-coded to 4096 (VHOST_LOG_PAGE), and each page is represented by 1 bit. Therefore, vhost_log_write() simply finds the right bit for the page we are about to change and sets it to 1. dev->log_base denotes the start of the dirty page bitmap. Every time we update the virtio used ring, we need to log it; this is done by a new vhost_log_write() wrapper, vhost_log_used_vring(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:44:13 +01:00
Jijiang Liu	859b480d5a	vhost: add guest offload setting Add guest offload setting in vhost lib. Virtio 1.0 spec (5.1.6.4 Processing of Incoming Packets) says: 1. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so, the packet checksum at offset csum_offset from csum_start and any preceding checksums have been validated. The checksum on the packet is incomplete and csum_start and csum_offset indicate how to calculate it (see Packet Transmission point 1). 2. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were negotiated, then gso_type MAY be something other than VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the desired MSS (see Packet Transmission point 2). In order to support these features, the following changes are added, 1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation. 2. Enqueue these offloads: convert some fields in mbuf to the fields in virtio_net_hdr. There are more explanations for the implementation. For VM2VM case, there is no need to do checksum, for we think the data should be reliable enough, and setting VIRTIO_NET_HDR_F_NEEDS_CSUM at RX side will let the TCP layer to bypass the checksum validation, so that the RX side could receive the packet in the end. In terms of us-vhost, at vhost RX side, the offload information is inherited from mbuf, which is in turn inherited from TX side. If we can still get those info at RX side, it means the packet is from another VM at same host. So, it's safe to set the VIRTIO_NET_HDR_F_NEEDS_CSUM, to skip checksum validation. Signed-off-by: Jijiang Liu <jijiang.liu@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-17 22:56:44 +01:00
Jijiang Liu	d0cf91303d	vhost: add Tx offload capabilities Add vhost TX offload (CSUM and TSO) support capabilities in vhost lib. In order to support these features, the following changes are added: 1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation. 2. Dequeue TX offload: convert the fields in virtio_net_hdr to the related fields in mbuf. Signed-off-by: Jijiang Liu <jijiang.liu@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-17 22:56:44 +01:00
Huawei Xie	766ad08900	vhost: fix logically dead code CID 107107 (#1 of 1): Logically dead code Fixes: af4f2c5feb2e ("vhost: fix code style") Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2015-12-13 02:14:30 +01:00
Tetsuya Mukawa	e1f8f55571	vhost: fix guest descriptor closed on reset owner message The patch fixes reset_owner message handling not to clear callfd, because callfd will be valid while connection is established. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-24 21:34:11 +01:00
Yuanhan Liu	87308b5370	vhost: reset device properly Currently, we reset all fields of a device to zero when reset happens, which is wrong, since some fields, like device_fh, ifname, and virt_qp_nb, should be kept unchanged after reset until the device is removed. This is what the new helper function reset_device() is for. Also use rte_zmalloc() instead of rte_malloc, so that we can avoid init_device(), which so far basically only does a zero reset. Hence, init_device() is dropped in this patch. This patch also removes a hack of using the offset of a specific field (which is virtqueue now) inside the `virtio_net' structure to do the reset, which could break easily if someone changed the field order without caution. Cc: Tetsuya Mukawa <mukawa@igel.co.jp> Cc: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2015-11-12 12:39:08 +01:00
Rich Lane	d243ecf0c2	vhost: make destroy callback on reset owner message QEMU sends VHOST_RESET_OWNER first when shutting down. There was previously no way for the dataplane to know that the virtio_net instance had become unusable and it would segfault when trying to do RX/TX. Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-12 12:39:08 +01:00
Marcel Apfelbaum	15e9ee6982	vhost: enable virtio 1.0 Make vhost-user virtio 1.0 compatible by adding it to the supported features and keeping the header length the same as for mergeable RX buffers. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-11-02 23:12:27 +01:00
Yuanhan Liu	71dfdbe66a	vhost: fix build with kernel < 3.8 Fix build error: virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here Above two virtio-net MQ macros are introduced since kernel v3.8. For older kernel, we should not reference them directly, hence, this patch introduced two wrapper macros, with proper values being set depending on we support MQ or not. Fixes: b09b198bfb5c ("vhost-user: announce queue number in message") Reported-by: Yongjie Gu <yongjiex.gu@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: David Marchand <david.marchand@6wind.com>	2015-10-30 15:55:05 +01:00
Yuanhan Liu	19d4d7ef2a	vhost-user: enable multiple queue By setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit, and VIRTIO_NET_F_MQ feature bit. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:54 +01:00
Changchun Ouyang	77d20126b4	vhost-user: handle message to enable vring This message is used to enable/disable a specific vring queue pair. The first queue pair is enabled by default. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:53 +01:00
Yuanhan Liu	e049ca6d10	vhost-user: prepare multiple queue setup All queue pairs, including the default (the first) queue pair, are allocated dynamically, when a vring_call message is received first time for a specific queue pair. This is a refactor work for enabling vhost-user multiple queue; it should not break anything as it does no functional changes: we don't support mq set, so there is only one mq at max. This patch is based on Changchun's patch. Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:37 +01:00
Yuanhan Liu	381316f6a2	vhost-user: support protocol features The two protocol features messages were introduced by the qemu vhost maintainer (Michael) for extending the vhost-user interface. Here is an excerpt from the vhost-user spec: Any protocol extensions are gated by protocol feature bits, which allows full backwards compatibility on both master and slave. The vhost-user multiple queue features will be treated as a vhost-user extension, hence, we have to implement the two messages first. VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support any yet. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:27 +01:00
Jerome Jutteau	6c6373c763	vhost: fix missing device checks virtio-net searches for its device in reset_owner. The function doesn't check the return value of get_config_ll_entry, so using get_config_ll_entry in reset_owner doesn't show any error when the device is not found. This patch fixes this by using get_device instead of get_config_ll_entry. In user_get_vring_base, the return value of get_device is not checked, which may cause a segfault when the device is not found. Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-10-21 12:21:18 +02:00
Jerome Jutteau	2c95f4de6a	vhost: keep device identifier after reset owner virtio-net cleans and re-initializes the device after a VHOST_USER_RESET_OWNER. This resets the device identifier to 0 and breaks the ll_root listing logic. This patch keeps the old device identifier and rewrites it on the cleaned device. Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-10-21 12:03:57 +02:00
Yuanhan Liu	9702b2b53f	vhost: fix wrong usage of eventfd_t According to eventfd man page: typedef uint64_t eventfd_t; int eventfd_read(int fd, eventfd_t *value); int eventfd_write(int fd, eventfd_t value); eventfd_t is defined for the second arg(value), but not for fd. Here I redefine those fd fields to `int' type, which also removes the redundant (int) cast. And as the man page stated, we should cast 1 to eventfd_t type for eventfd_write(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:58:30 +02:00
Yuanhan Liu	2bb29a9fc1	vhost: fix typo _det => _dev Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:53:28 +02:00
Huawei Xie	af295ad469	vhost: realloc device and queues to same numa node as vring desc When we get the address of vring descriptor table in VHOST_SET_VRING_ADDR message, will try to reallocate vhost device and virt queue to the same numa node. Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-29 18:57:33 +02:00
Huawei Xie	4113e38100	vhost: use rte_malloc to allocate device and queues use rte_malloc to allocate vhost device and queues Signed-off-by: Huawei Xie <huawei.xie@intel.com>	2015-06-29 18:57:33 +02:00
Krishna Murthy	f75f65abf3	vhost: enable live migration When we migrate VM, without this feature, qemu will report error: "migrate: Migration disabled: vhost lacks VHOST_F_LOG_ALL feature". Signed-off-by: Krishna Murthy <krishna.j.murthy@intel.com>	2015-06-12 17:07:24 +02:00
Huawei Xie	4d16fff496	vhost: check file descriptor before closing This avoids closing -1 in our case. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-03-09 12:46:46 +01:00
Huawei Xie	64ab971791	vhost: fix file descriptors naming The previous vhost implementation wrongly named kickfd as callfd and callfd as kickfd. It is functionally correct, but causes confusion. Exchange kickfd and callfd to avoid confusion. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-03-09 12:46:46 +01:00
Huawei Xie	54292e9520	vhost: support ifname for vhost-user for vhost-cuse, ifname is the name of the tap device for vhost-user, ifname is the name of the unix domain socket path Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:16 +01:00
Huawei Xie	8f972312b8	vhost: support vhost-user In rte_vhost_driver_register(), vhost unix domain socket listener fd is created and added to polled(based on select) fdset. In rte_vhost_driver_session_start(), fds in the fdset are checked for processing. If there is new connection from qemu, connection fd accepted is added to polled fdset. The listener and connection fds in the fdset are then both checked. When there is message on the connection fd, its callback vserver_message_handler is called to process vhost-user messages. To support identifying which virtio is from which guest VM, we could call rte_vhost_driver_register with different socket path. Virtio devices from same VM will connect to VM specific socket. The socket path information is stored in the virtio_net structure. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:15 +01:00
Huawei Xie	9464a44160	vhost: implement cuse memory table remove set_memory_table ops vhost-cuse or vhost-user will both implement their own set_memory_region handler. In current vhost-cuse implementation, guest numa memory isn't supported. Assume that guest memory is backed by only one file. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>	2015-02-24 01:38:14 +01:00
Huawei Xie	c2f60667bf	vhost: move fd copying into cuse subdirectory File descriptor is copied from qemu process into vhost process. vhost-user doesn't need eventfd kernel module to copy fds between processes. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>	2015-02-24 01:38:11 +01:00
Huawei Xie	34f4c46dc4	vhost: rename header file Rename vhost-net-cdev.h to vhost-net.h. This file defines common operations provided by virtio-net(.c). Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:10 +01:00
Huawei Xie	04d696037a	vhost: enable virtio control channel Rx mode VIRTIO_NET_F_CTRL_RX is dependent on VIRTIO_NET_F_CTRL_VQ. Observed that the virtio-net driver in the guest would crash with only CTRL_RX enabled. In virtnet_send_command: /* Caller should know better */ BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) || (out + in > VIRTNET_SEND_COMMAND_SG_MAX)); Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:07 +01:00
Ciara Loftus	f5a31522f0	vhost: add interface name for virtio This patch fixes the issue whereby when using userspace vhost ports in the context of vSwitching, the name provided to the hypervisor/QEMU of the vhost tap device needs to be exposed in the library, in order for the vSwitch to be able to direct packets to the correct device. This patch introduces an 'ifname' member to the virtio-net structure which is populated with the tap device name when QEMU is brought up with a vhost device. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Anthony Fee <anthonyx.fee@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2014-12-18 22:52:38 +01:00
Ouyang Changchun	90924caf08	vhost: enable promiscuous and multicast This is to enable user space vhost receiving and forwarding broadcast and multicast packets: Use new option in command line to enable promisc mode; Enable 2 bits in VMDQ RX mode: ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-11-12 00:10:23 +01:00
Ouyang Changchun	4e3eff86cf	vhost: fix mem path check Commit aec8283d47 fixes the compilation issue, but it leads to one runtime issue: a wrong early exit. In some cases, 'path' is NULL but 'resolved_path' holds a valid path; the code should continue rather than exit. This happens because qemu unlinks the file after it maps the huge page file. In this special case, it is ok to check the resolved path when path is NULL if errno indicates "No such file or directory". Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2014-11-06 23:12:02 +01:00
Huawei Xie	af4f2c5feb	vhost: fix code style Fix alignment issues, lengthy lines, misordered type and other coding style issues. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2014-11-06 23:12:02 +01:00
Ouyang Changchun	aec8283d47	vhost: fix build without unused result It fixes this compilation complain: "error: ignoring return value of 'realpath', declared with attribute warn_unused_result [-Werror=unused-result]" Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Tested-by: Jingguo Fu <jingguox.fu@intel.com>	2014-10-30 00:19:41 +01:00
Huawei Xie	60ddca7654	vhost: coding style fixes Fix serious coding style issues reported by checkpatch. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	da74110053	vhost: clean includes Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	3d671af711	vhost: supported features VHOST_SUPPORTED_FEATURES is the feature mask that vhost lib supports. VHOST_FEATURES is the feature mask vhost currently supports after some features are turned on/off. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: split patch]	2014-10-13 19:16:54 +02:00
Huawei Xie	9eed6bfd2e	vhost: allow to enable or disable features Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: split patch]	2014-10-13 19:16:54 +02:00
Huawei Xie	28689ff04d	vhost: rename ops registering function Rename init_virtio_net to the rte_vhost_callback_register API. rte_vhost_callback_register registers the callbacks called when a vhost device is created and ready to be added to a data processing core, or is deactivated by the guest. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	c19dc7db15	vhost: move internal structure The structure virtio_net_config_ll is moved to virtio_net.c. It is related to internal virtio device management, so it should not be exposed to other files. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:13:10 +02:00
Huawei Xie	0a739691fc	vhost: remove zero copy memory region generation logic Currently zero copy feature isn't generic as it couples closely with nic. It isn't put in the vhost lib in this version. gpa(guest physical address) to hpa(host physical address) mapping region logic is removed. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:13:10 +02:00
Huawei Xie	68e6490476	vhost: remove switching related logics The following logics will be moved to vhost example: 1. mac learning, which is used to learn the mac address from the first transmitted packet of guest and bind the vhost device to a queue in a pool of VMDQ. 2. VMDQ mac/vlan filter: Each pool the vhost device is bind to is assigned a mac/vlan filter. 3. num_devices is used to specify the maximum vhost devices the nic supports. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:13:10 +02:00
Huawei Xie	5c7a80aec3	vhost: move from examples to dedicated library Those files will be refactored in subsequent patches to form user space vhost library. Makefile and main.h are removed. main.c is renamed to vhost_rxtx.c and will provide vring enqueue/dequeue API. virtio-net.h is renamed to rte_virtio_net.h which is the API header file. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: remove from examples Makefile and merge file renaming]	2014-10-13 19:10:09 +02:00
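
The dirty-page logging described in commit b171fad1ff (one bit per 4096-byte page, with the bitmap starting at a log base pointer) can be illustrated with a small sketch. This is not the code from that commit: the names `log_page` and `log_write` and the constant `LOG_PAGE_SIZE` are hypothetical, and the bit order within each byte is an assumption made for illustration.

```c
#include <stdint.h>

/* Same page size the commit message quotes for VHOST_LOG_PAGE. */
#define LOG_PAGE_SIZE 4096u

/* Hypothetical helper: set the bit that stands for one page.
 * Assumes bit (page % 8) of byte (page / 8), counting from the
 * bitmap start (the role dev->log_base plays in the commit). */
static void log_page(uint8_t *log_base, uint64_t page)
{
    log_base[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Hypothetical helper: mark every page overlapped by the byte
 * range [addr, addr + len) as dirty. The caller must provide a
 * bitmap large enough to cover the highest page touched. */
static void log_write(uint8_t *log_base, uint64_t addr, uint64_t len)
{
    if (len == 0)
        return;

    uint64_t first = addr / LOG_PAGE_SIZE;
    uint64_t last = (addr + len - 1) / LOG_PAGE_SIZE;

    for (uint64_t page = first; page <= last; page++)
        log_page(log_base, page);
}
```

A write that straddles a page boundary marks both pages, which is why the last page is computed from the final byte of the range rather than from its start.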

45 Commits
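
Commits 3d671af711 and 9eed6bfd2e distinguish between the feature mask the library supports and the mask currently in effect after features are turned on or off. A minimal sketch of that idea follows. The function names, the `EX_` macros, and the chosen bit positions are all hypothetical and only illustrate the masking logic, not the library's actual API.

```c
#include <stdint.h>

/* Illustrative feature bits; positions are assumptions for this sketch. */
#define EX_F_MRG_RXBUF (1ULL << 15)
#define EX_F_CTRL_VQ   (1ULL << 17)
#define EX_F_LOG_ALL   (1ULL << 26)

/* Everything this sketch is able to offer (the "supported" mask). */
#define EX_SUPPORTED_FEATURES (EX_F_MRG_RXBUF | EX_F_CTRL_VQ | EX_F_LOG_ALL)

/* What is currently offered; starts out as the full supported set. */
static uint64_t current_features = EX_SUPPORTED_FEATURES;

/* Hypothetical: turn features on, rejecting any unsupported bit. */
static int feature_enable(uint64_t mask)
{
    if (mask & ~EX_SUPPORTED_FEATURES)
        return -1;
    current_features |= mask;
    return 0;
}

/* Hypothetical: turn features off, rejecting any unsupported bit. */
static int feature_disable(uint64_t mask)
{
    if (mask & ~EX_SUPPORTED_FEATURES)
        return -1;
    current_features &= ~mask;
    return 0;
}
```

Keeping the two masks separate means a caller can never enable a bit the library cannot honor, while still being free to withhold supported features from a guest.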