numam-dpdk

Author	SHA1	Message	Date
Yuanhan Liu	004b8ca8b5	vhost: reserve few more space for future extension "virtio_net_device_ops" is the only left open struct that an application can access, therefore, it's the only place that might introduce potential ABI break in future for extension. So, do some reservation for it. 5 should be pretty enough, considering that we have barely touched it for a long while. Another reason to choose 5 is for cache alignment: 5 makes the struct 64 bytes for 64 bit machine. With this, it's confidence to say that we might be able to be free from the ABI violation forever. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:43:01 +02:00
Yuanhan Liu	db69be54b6	vhost: hide internal code We are now safe to move all those internal structs/macros/functions to vhost-net.h, to hide them from external access. This patch also breaks long lines and removes some redundant comments. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:43:01 +02:00
Yuanhan Liu	4ecf22e356	vhost: export device id as the interface to applications With all the previous prepare works, we are just one step away from the final ABI refactoring. That is, to change current API to let them stick to vid instead of the old virtio_net dev. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:42:57 +02:00
Yuanhan Liu	a67f286a65	vhost: export queue free entries The new API rte_vhost_avail_entries() is actually a rename of rte_vring_available_entries(), with the "vring" to "vhost" name change to keep the consistency of other vhost exported APIs. This change could let us avoid the dependency of "virtio_net" struct, to prepare for the ABI refactoring. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:02:58 +02:00
Yuanhan Liu	f6d1bd5365	vhost: export interface name Introduce a new API rte_vhost_get_ifname() to export the ifname to application. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	4b4af666b9	vhost: export number of queues Introduce a new API rte_vhost_get_queue_num() to export the number of queues. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	586e390013	vhost: export numa node Introduce a new API rte_vhost_get_numa_node() to get the numa node from which the virtio_net struct is allocated. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:30 +02:00
Yuanhan Liu	e2a1dd1275	vhost: rename device id variable I failed to figure out what does "fh" mean here for a long while. The only guess I could have had is "file handle". So, you get the point that it's not well named. I then figured it out that "fh" is derived from the fuse lib, and my above guess is right. However, device_fh represents a virtio net device ID. Therefore, here I rename it to vid (Virtio-net device ID, or Vhost device ID; choose one you prefer) to make it easier for understanding. This name (vid) then will be considered to the only interface to applications. That's another reason to do the rename: it's our interface, make it more understandable. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 09:01:25 +02:00
Yuanhan Liu	c08a349006	vhost: declare device id as int device_fh represents the device id for a specific virtio net device. Firstly, "int" would be big enough: we don't need 64 bit. Secondly, this could let us avoid the ugly "%" PRIu64 ".." stuff. And since ctx.fh is derived from device_fh, declare it as int, too. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 08:59:54 +02:00
Yuanhan Liu	092f1c2c77	vhost: declare backend with int type It's an fd; so define it as "int", which could also save the unnecessary (int) cast. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>	2016-06-22 06:10:54 +02:00
Tetsuya Mukawa	fb871d0a4d	vhost: fix default value of kickfd and callfd Currently, default values of kickfd and callfd are -1. If the values are -1, current code guesses kickfd and callfd haven't been initialized yet. Then vhost library will guess the virtqueue isn't ready for processing. But callfd and kickfd will be set as -1 when "--enable-kvm" isn't specified in QEMU command line. It means we cannot treat -1 as uninitialized state. The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for the default values of kickfd and callfd. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-15 00:20:29 +01:00
Yuanhan Liu	6fe390eed0	vhost: fix build with kernel < 3.5 VIRTIO_NET_F_GUEST_ANNOUNCE is a new feature introduced since kernel v3.5. For older kernels (or more precisely, old distributions), we could simply define it manually, to fix the "macro not defined" error. Fixes: d293dac8f30e ("vhost: claim support of guest announce") Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-03-11 16:46:18 +01:00
Yuanhan Liu	bb66588304	vhost: broadcast RARP by injecting in receiving mbuf array Broadcast RARP packet by injecting it to receiving mbuf array at rte_vhost_dequeue_burst(). Commit 33226236a35e ("vhost: handle request to send RARP") iterates all host interfaces and then broadcast it by all of them. It did notify the switches about the new location of the migrated VM, however, the mac learning table in the target host is wrong (at least in my test with OVS): $ ovs-appctl fdb/show ovsbr0 port VLAN MAC Age 1 0 b6:3c:72:71:cd:4d 10 LOCAL 0 b6:3c:72:71:cd:4e 10 LOCAL 0 52:54:00:12:34:68 9 1 0 56:f6:64:2c:bc:c0 1 Where 52:54:00:12:34:68 is the mac of the VM. As you can see from the above, the port learned is "LOCAL", which is the "ovsbr0" port. That is reasonable, since we indeed send the pkt by the "ovsbr0" interface. The wrong mac table leads all the packets for the VM to go to "ovsbr0" in the end, which ends up with all packets being lost, until the guest sends an ARP request (or reply) to refresh the mac learning table. Jianfeng then came up with a solution I have thought of firstly but NAKed by myself, concerning it has potential issues [0]. The solution is as title stated: broadcast the RARP packet by injecting it to the receiving mbuf arrays at rte_vhost_dequeue_burst(). The re-bring of that idea made me think it twice; it looked like a false concern to me then. And I had done a rough verification: it worked as expected. [0]: http://dpdk.org/ml/archives/dev/2016-February/033527.html Another note is that while preparing this version, I found that DPDK has some ARP related structures and macros defined. So, use them instead of the one from standard header files here. Cc: Thibaut Collet <thibaut.collet@6wind.com> Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2016-02-29 16:55:30 +01:00
Yuanhan Liu	b171fad1ff	vhost: log used vring changes Introduce vhost_log_write() helper function to log the dirty pages we touched. Page size is hard-coded to 4096 (VHOST_LOG_PAGE), and each log entry is represented by 1 bit. Therefore, vhost_log_write() simply finds the right bit for the page we are going to change, and sets it to 1. dev->log_base denotes the start of the dirty page bitmap. Every time we update virtio used ring, we need to log it. And it's been done by a new vhost_log_write() wrapper, vhost_log_used_vring(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:44:13 +01:00
Yuanhan Liu	54f9e32305	vhost: handle dirty pages logging request VHOST_USER_SET_LOG_BASE request is used to tell the backend (dpdk vhost-user) where we should log dirty pages, and how big the log buffer is. This request introduces a new payload: typedef struct VhostUserLog { uint64_t mmap_size; uint64_t mmap_offset; } VhostUserLog; Also, a fd is delivered from QEMU by ancillary data. With those info given, an area of memory is mmaped, assigned to dev->log_base, for logging dirty pages. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Tested-by: Pavel Fedin <p.fedin@samsung.com>	2016-02-19 15:42:54 +01:00
Yuanhan Liu	abba423c1b	vhost: reserve some space in structures So that we will not break ABI in future extension by adding few more fields. Struct vhost_virtqueue is reserved with 16 qwords (the later vhost-live migration support would at least consume 3 of them), and struct virtio_net is reserved with a bit more, 64 qwords, as there is only one instance for a virtio nic instance. Note that both reservation are not placed at the end of the struct, but instead before the last field, since both the last field at the two struct take a lot spaces. Putting the reservation after it would divide those reserved fields to another cacheline. (we might need fix them in future, btw) Suggested-by: Panu Matilainen <pmatilai@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2015-12-08 03:00:42 +01:00
Marcel Apfelbaum	45c55d39c5	vhost: fix build with old kernels Commit 15e9ee6982a4822ce395fd597dd500a61ceafa7c uses the VIRTIO_F_VERSION_1 macro existing only in newer kernels. Fixed it by manually defining it for older kernels. Fixes: 15e9ee6982a4 ("vhost: enable virtio 1.0") Reported-by: Qian Xu <qian.q.xu@intel.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>	2015-11-03 12:33:04 +01:00
Yuanhan Liu	71dfdbe66a	vhost: fix build with kernel < 3.8 Fix build error: virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here Above two virtio-net MQ macros are introduced since kernel v3.8. For older kernel, we should not reference them directly, hence, this patch introduced two wrapper macros, with proper values being set depending on we support MQ or not. Fixes: b09b198bfb5c ("vhost-user: announce queue number in message") Reported-by: Yongjie Gu <yongjiex.gu@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: David Marchand <david.marchand@6wind.com>	2015-10-30 15:55:05 +01:00
Changchun Ouyang	77d20126b4	vhost-user: handle message to enable vring This message is used to enable/disable a specific vring queue pair. The first queue pair is enabled by default. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:23:53 +01:00
Yuanhan Liu	e049ca6d10	vhost-user: prepare multiple queue setup All queue pairs, including the default (the first) queue pair, are allocated dynamically, when a vring_call message is received first time for a specific queue pair. This is a refactor work for enabling vhost-user multiple queue; it should not break anything as it does no functional changes: we don't support mq set, so there is only one mq at max. This patch is based on Changchun's patch. Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:37 +01:00
Yuanhan Liu	381316f6a2	vhost-user: support protocol features The two protocol features messages were introduced by the qemu vhost maintainer (Michael) for extending the vhost-user interface. Here is an excerpt from the vhost-user spec: Any protocol extensions are gated by protocol feature bits, which allows full backwards compatibility on both master and slave. The vhost-user multiple queue features will be treated as a vhost-user extension, hence, we have to implement the two messages first. VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support any yet. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-10-26 21:22:27 +01:00
Yuanhan Liu	9702b2b53f	vhost: fix wrong usage of eventfd_t According to eventfd man page: typedef uint64_t eventfd_t; int eventfd_read(int fd, eventfd_t *value); int eventfd_write(int fd, eventfd_t value); eventfd_t is defined for the second arg(value), but not for fd. Here I redefine those fd fields to `int' type, which also removes the redundant (int) cast. And as the man page stated, we should cast 1 to eventfd_t type for eventfd_write(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-09-25 14:58:30 +02:00
Huawei Xie	292959c719	vhost: cleanup unix socket rte_vhost_driver_unregister API will remove the listenfd from event list, and then close it. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Peng Sun <peng.a.sun@intel.com>	2015-06-30 17:49:08 +02:00
Ouyang Changchun	8b636a50c2	doc: fix doxygen warnings in vhost API Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>	2015-06-19 12:11:53 +02:00
Bruce Richardson	ec10e8d24e	vhost: remove inclusion of mbuf header The virtio_net header file includes the mbuf header file, but it does not need to do so as it only uses pointers to the struct rte_mbuf type, and does not use any of the mbuf internals, nor any of the mbuf functions or macros. Therefore the inclusion is unnecessary, and can be replaced by a forward declaration of the mbuf type. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2015-05-11 15:36:37 +02:00
Benoît Canet	45db8927a8	vhost: add hint on how to add or remove device to a data core Let's make sure people will not forget to set and unset VIRTIO_DEV_RUNNING. Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2015-03-17 12:39:52 +01:00
Huawei Xie	64ab971791	vhost: fix file descriptors naming The previous vhost implementation wrongly named kickfd as callfd and callfd as kickfd. It is functionally correct, but causes confusion. Exchange kickfd and callfd to avoid confusion. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-03-09 12:46:46 +01:00
Huawei Xie	54292e9520	vhost: support ifname for vhost-user for vhost-cuse, ifname is the name of the tap device for vhost-user, ifname is the name of the unix domain socket path Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:16 +01:00
Huawei Xie	8f972312b8	vhost: support vhost-user In rte_vhost_driver_register(), vhost unix domain socket listener fd is created and added to polled(based on select) fdset. In rte_vhost_driver_session_start(), fds in the fdset are checked for processing. If there is new connection from qemu, connection fd accepted is added to polled fdset. The listener and connection fds in the fdset are then both checked. When there is message on the connection fd, its callback vserver_message_handler is called to process vhost-user messages. To support identifying which virtio is from which guest VM, we could call rte_vhost_driver_register with different socket path. Virtio devices from same VM will connect to VM specific socket. The socket path information is stored in the virtio_net structure. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com> Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>	2015-02-24 01:38:15 +01:00
Ciara Loftus	f5a31522f0	vhost: add interface name for virtio This patch fixes the issue whereby when using userspace vhost ports in the context of vSwitching, the name provided to the hypervisor/QEMU of the vhost tap device needs to be exposed in the library, in order for the vSwitch to be able to direct packets to the correct device. This patch introduces an 'ifname' member to the virtio-net structure which is populated with the tap device name when QEMU is brought up with a vhost device. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Anthony Fee <anthonyx.fee@intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com>	2014-12-18 22:52:38 +01:00
Huawei Xie	af4f2c5feb	vhost: fix code style Fix alignment issues, lengthy lines, misordered type and other coding style issues. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2014-11-06 23:12:02 +01:00
Thomas Monjalon	8933dae15c	vhost: add in doc Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>	2014-10-13 19:39:38 +02:00
Huawei Xie	60ddca7654	vhost: coding style fixes Fix serious coding style issues reported by checkpatch. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	da74110053	vhost: clean includes Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	a58f905514	vhost: add private context field priv field could be used to store application specific context. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	9eed6bfd2e	vhost: allow to enable or disable features Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: split patch]	2014-10-13 19:16:54 +02:00
Huawei Xie	7202b0a824	vhost: get available vring entries Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: split patch]	2014-10-13 19:16:54 +02:00
Huawei Xie	28689ff04d	vhost: rename ops registering function Rename init_virtio_net as rte_vhost_callback_register API. rte_vhost_callback_register registers the callbacks called when a vhost device is created and ready to be added to a data processing core, or is deactivated by the guest. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	f782576959	vhost: get internal ops when registering vhost_net_device_ops is internal implementation in vhost lib. register_cuse_device will be vhost driver register API. There is no need for it to know the internal vhost ops. Instead, that ops is retrieved in register_cuse_device through get_virtio_net_callbacks. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:16:54 +02:00
Huawei Xie	a4fff3bba8	vhost: enqueue/dequeue burst rte_vhost_enqueue_burst copies host packets to guest. rte_vhost_enqueue_burst will call virtio_dev_rx and virtio_dev_merge_rx respectively depending on whether merge-able feature is negotiated or not in the vhost device. virtio_dev_merge_tx is renamed to rte_vhost_dequeue_burst. rte_vhost_dequeue_burst gets to-be-sent packets from guest. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: merged patches]	2014-10-13 19:16:54 +02:00
Huawei Xie	7f456f6d61	vhost: move address translation function Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: split from a previous patch]	2014-10-13 19:16:04 +02:00
Huawei Xie	c19dc7db15	vhost: move internal structure The structure virtio_net_config_ll is moved to virtio_net.c. It is related to internal virtio device management, so it should not be exposed to other files. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:13:10 +02:00
Huawei Xie	0a739691fc	vhost: remove zero copy memory region generation logic Currently zero copy feature isn't generic as it couples closely with nic. It isn't put in the vhost lib in this version. gpa(guest physical address) to hpa(host physical address) mapping region logic is removed. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:13:10 +02:00
Huawei Xie	68e6490476	vhost: remove switching related logics The following logics will be moved to vhost example: 1. mac learning, which is used to learn the mac address from the first transmitted packet of guest and bind the vhost device to a queue in a pool of VMDQ. 2. VMDQ mac/vlan filter: Each pool the vhost device is bind to is assigned a mac/vlan filter. 3. num_devices is used to specify the maximum vhost devices the nic supports. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>	2014-10-13 19:13:10 +02:00
Huawei Xie	5c7a80aec3	vhost: move from examples to dedicated library Those files will be refactored in subsequent patches to form user space vhost library. Makefile and main.h are removed. main.c is renamed to vhost_rxtx.c and will provide vring enqueue/dequeue API. virtio-net.h is renamed to rte_virtio_net.h which is the API header file. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com> [Thomas: remove from examples Makefile and merge file renaming]	2014-10-13 19:10:09 +02:00

45 Commits
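
Commit b171fad1ff in the log above describes dirty-page logging as a bitmap with one bit per 4096-byte page, where the helper finds the bit for each touched page and sets it to 1. The following is a minimal sketch of that idea in C, not the DPDK implementation. The names sketch_log_write, log_page, and LOG_PAGE_SIZE, the log_size bounds check, and the least-significant-bit-first ordering within each byte are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 4096-byte page size (VHOST_LOG_PAGE) named in the commit message. */
#define LOG_PAGE_SIZE 4096u

/* Hypothetical helper: set the bit for one page in the dirty bitmap.
 * Bit order within a byte (LSB first) is an assumption of this sketch. */
static void log_page(uint8_t *log_base, uint64_t page)
{
    log_base[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Hypothetical stand-in for a vhost_log_write()-style helper: mark every
 * page overlapped by the byte range [addr, addr + len) as dirty.
 * log_size is the bitmap size in bytes; out-of-range writes are ignored. */
static void sketch_log_write(uint8_t *log_base, uint64_t log_size,
                             uint64_t addr, uint64_t len)
{
    assert(log_base != NULL);

    if (len == 0)
        return;

    uint64_t first = addr / LOG_PAGE_SIZE;
    uint64_t last = (addr + len - 1) / LOG_PAGE_SIZE;

    /* Refuse ranges whose last page has no bit in the bitmap. */
    if (last / 8 >= log_size)
        return;

    for (uint64_t page = first; page <= last; page++)
        log_page(log_base, page);
}
```

For example, a 2-byte write starting at address 4095 straddles pages 0 and 1, so the sketch sets the two lowest bits of the first bitmap byte. A wrapper for used-ring updates, as the commit message mentions, would call such a helper with the address and length of the ring entry it just modified.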