Since commit 59fe5e17d9 ("vhost: propagate set features handling error"),
vhost does not allow to set different features without reset.
The virtio-user driver fails to reset the device in below commit.
To fix, we send the reset message as stopping the device.
Fixes: c12a26ee20 ("net/virtio-user: fix not properly reset device")
Cc: stable@dpdk.org
Reported-by: Lei Yao <lei.a.yao@intel.com>
Reported-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Add checks during build to ensure that all symbols in the EXPERIMENTAL
version map section have __experimental tags on their definitions, and
enable the warnings needed to announce their use. Also add an
ALLOW_EXPERIMENTAL_APIS define to allow individual libraries and files
to declare the acceptability of experimental api usage
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Create a rte_ethdev_driver.h file and move PMD specific APIs here.
Drivers updated to include this new header file.
There is no update in header content and since ethdev.h included by
ethdev_driver.h, nothing changed from driver point of view, only
logically grouping of APIs. From applications point of view they can't
access to driver specific APIs anymore and they shouldn't.
More PMD specific data structures still remain in ethdev.h because of
inline functions in header use them. Those will be handled separately.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
When live migration is done, for the backup VM, either the virtio
frontend or the vhost backend needs to send out gratuitous RARP packet
to announce its new network location.
This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support live
migration scenario where the vhost backend doesn't have the ability to
generate RARP packet.
Brief introduction of the work flow:
1. QEMU finishes live migration, pokes the backup VM with an interrupt.
2. Virtio interrupt handler reads out the interrupt status value, and
realizes it needs to send out RARP packet to announce its location.
3. Pause device to stop worker thread touching the queues.
4. Inject a RARP packet into a Tx Queue.
5. Ack the interrupt via control queue.
6. Resume device to continue packet processing.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch adds dev_pause, dev_resume and inject_pkts APIs to allow
driver to pause the worker threads and inject special packets into
Tx queue. The next patch will be based on this.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The virtio_send_command function may be called from app's configuration
routine, but also from an interrupt handler called when live migration
is done on the backup side. So this patch makes control queue
thread-safe first.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The max_mtu is kept as zero in case no CRTL channel, which leads
to failure when calling virtio_mtu_set().
Signed-off-by: Zhike Wang <wangzhike@jd.com>
Acked-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The pointer to the user parameter of the callback registration is
automatically pass to the callback function.
There is no point to allow changing this user parameter by a caller.
That's why this parameter is always set to NULL by PMDs and set only
in ethdev layer before calling the callback function.
The history is that the user parameter was initially used
by the callback implementation to pass some information
between the application and the driver:
c1ceaf3ad0 ("ethdev: add an argument to internal callback function")
Then a new parameter has been added to leave the user parameter
to its standard usage of context given at registration:
d6af1a13d7 ("ethdev: add return values to callback process API")
The NULL parameter in the internal callback processing function
is now removed. It makes clear that the callback parameter is user
managed and opaque from a DPDK point of view.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
RTE_VIRTIO_VPMD_RX_BURST and RTE_VIRTIO_VPMD_RX_REARM_THRESH
have been defined and used in virtio_rxtx_simple.h, but are
defined again in virtio_rxtx_simple_*.c. It just happens to
work. So remove the redundant definitions from the *.c files.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
VIRTIO_NET_CTRL_MAC_ADDR_SET is defined two times in
virtqueue.h, the second one is obviously not wanted.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The vector Rx will be broken if backend has consumed all
the descs in the avail ring before the device is started.
Because in current implementation, vector Rx will return
immediately without refilling the avail ring if the used
ring is empty. So we have to refill the avail ring after
flushing the elements in the used ring for vector Rx.
Besides, vector Rx has a different ring layout assumption
and mbuf management. So we need to handle it differently.
Fixes: d8227497ec ("net/virtio: flush Rx queues on start")
Cc: stable@dpdk.org
Reported-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The rx_queues and tx_queues fields of the data structure points to a
struct virtnet_rx or virtnet_tx. Casting it to a virtqueue is an error.
It does not trigger any bug because pointer is not dereferenced inside
the function, but it can become a bug if this code is copy/pasted and
vq is dereferenced.
Fixes: 01ad44fd37 ("net/virtio: split Rx/Tx queue")
Cc: stable@dpdk.org
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
DPDK has already the definition of Ethernet numeric link speeds in Mbps
in the file Rte_ethdev.h, it is unnecessary to rededine virtio specific
link speeds macros again.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
In function eth_virtio_dev_init(), dynamic memory stored
in "eth_dev->data->mac_addrs" variable and it is not freed
when function return,
this is a possible memory leak.
Fixes: 8ced1542f7 ("net/virtio: eth_dev->data->mac_addrs is not freed")
Cc: stable@dpdk.org
Signed-off-by: Pengzhen Liu <liupengzhen3@huawei.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
When running l3fwd-power to test virtio rxq interrupt using vfio
pci noiommu mode, startup fails. In the function virtio_read_caps,
the code if (flags & PCI_MSIX_ENABLE) intends to double check
if vfio msix is enabled or not. However, it is not enable at that
time. So use_msix is assigned to "0", not "1", which causes the
failure of configuring rxq intr in l3fwd-power.
This patch adds the function "vtpci_msix_detect" to detect the status
of msix when interrupt changes happen.
In the meanwhile, virtio_intr_enable/disable are introduced to wrap
rte_intr_enable/disable to enhance the ability to detect msix.
use_msix can indicate three different msix status by:
VIRTIO_MSIX_NONE (0)
VIRTIO_MSIX_DISABLED (1)
VIRTIO_MSIX_ENABLED (2)
Fixes: cb482cb3a3 ("net/virtio: fix MAC address read")
Cc: stable@dpdk.org
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Move the vdev bus from lib/librte_eal to drivers/bus.
As the crypto vdev helper function refers to data structure
in rte_vdev.h, so we move those helper function into drivers/bus
too.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
For virtual device, the rte_intr_handle struct is
initialized by the virtual device driver, including
the event fd assignment. If the event fd need to be
read for clean, an argument is required for the proper
event fd read.
This patch adds efd_counter_size in rte_intr_handle
struct to tell the rx interrupt process the read size.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Renamed data type from phys_addr_t to rte_iova_t.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Rename buf_physaddr to buf_iova.
Keep the deprecated name in an anonymous union to avoid breaking
the API.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The struct rte_memzone field .phys_addr is renamed to .iova.
The deprecated name is kept in an anonymous union to avoid breaking
the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
The memzone header is often included without good reason.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The PCI lib defines the types and methods allowing to use PCI elements.
The PCI bus implements a bus driver for PCI devices by constructing
rte_bus elements using the PCI lib.
Move the relevant code out of the EAL to its expected place.
Libraries, drivers, unit tests and applications are updated to use the
new rte_bus_pci.h header when necessary.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Some devices may not support or fail setting VLAN offload
configuration based on dynamic circumstances so the
vlan_offload_set_t vector is modified to return an int so
the caller can determine success or not.
rte_eth_dev_set_vlan_offload is updated to return the
value provided by the vector when called along with restoring
the original offload configs on failure.
Existing vlan_offload_set_t vectors are modified to return
an int. Majority of cases return 0 but a few that actually
can fail now return their failure codes.
Finally, a vlan_offload_set_t vector is added to virtio
to facilitate dynamically turning VLAN strip on or off.
Signed-off-by: David Harton <dharton@cisco.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This flag is not necessary at the ether layer anymore.
Buses are able to advertise their hotplug support. The ether layer can
rely upon this capability instead of a special flag.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
In the function virtqueue_enqueue_xmit(), when can_push is true,
vtnet_hdr_size is added to pkt_len by calling rte_pktmbuf_prepend.
which is wrong for pkt stats, virtio header length should be subtracted
before calling stats function.
Fixes: 58169a9c81 ("net/virtio: support Tx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Report an error message if the flag O_NONBLOCK setting fails,
then return from function.
Coverity issue: 143439
Fixes: ef53b60300 ("net/virtio-user: support LSC")
Cc: stable@dpdk.org
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
After starting a device, the driver shouldn't deliver the
packets that already existed before the device is started
to applications. Otherwise it will lead to incorrect packet
collection for port state. This patch fixes this issue by
flushing the Rx queues when starting the device.
Fixes: a85786dc81 ("virtio: fix states handling during initialization")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The list of libraries in LDLIBS was generated from the DEPDIRS-xyz
variable. This is valid when the subdirectory name match the library
name, but it's not always the case, especially for PMDs.
The patches removes this feature and explicitly adds the proper
libraries in LDLIBS.
Some DEPDIRS-xyz variables become useless, remove them.
Reported-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
The stats_get dev op API doesn't include return value, so PMD cannot
return an error in case of failure at stats getting process time.
Since PCI devices can be removed and there is a time between the
physical removal to the RMV interrupt, the user may get invalid stats
without any indication.
This patch changes the stats_get API return value to be int instead of
void.
All the net PMDs stats_get dev ops are adjusted by this patch.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC virtio_rxtx.o
virtio_rxtx.c: In function ‘virtio_rx_offload’:
virtio_rxtx.c:680:10: error: ‘csum’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
csum = ~csum;
~~~~~^~~~~~~
The function rte_raw_cksum_mbuf() may indeed return an error, and
in this case, csum won't be initialized. Fix it by initializing csum
to 0.
Fixes: 96cb671193 ("net/virtio: support Rx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fix calling strncpy with the a maximum size equal of destination
array size.
Coverity issue: 140732
Fixes: e3b434818b ("net/virtio-user: support kernel vhost")
Cc: stable@dpdk.org
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
To use pointer instead of memcpy can save many cycles in the funciton
virtio_send_command.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Fixed a comment in struct virtionet_ctl, referring to the ring type
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The unscrutinized value may be incorrectly assumed to be within a certain
range by later operations.
In vhost_user_read: An unscrutinized value from an untrusted source used
in a trusted context - the value of sz_payload may be harmfull and we need
limit them to the max value of payload.
Coverity issue: 139601
Fixes: 6a84c37e39 ("net/virtio-user: add vhost-user adapter layer")
Cc: stable@dpdk.org
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The simple Rx handler is selected even if Rx checksum offload is
requested by the application, but this handler does not support
offloads. This results in broken received packets (no checksum flag but
invalid checksum in the mbuf data).
Disable the simple Rx handler in that case.
Fixes: 96cb671193 ("net/virtio: support Rx checksum offload")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Split use_simple_rxtx into use_simple_rx and use_simple_tx,
and ensure that only use_simple_tx is updated when txq flags
forces to use the standard Tx handler.
This change is also useful for next commit (disable simple Rx
path when Rx checksum is requested).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Since commit f27769f796 ("mk: require SSE4.2 support on all x86
platforms"), SSE4.2 is a requirement when compiling on x86 platforms.
We can remove this check in the virtio driver.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The selection of Rx/Tx handlers is done at several places,
group them in one function set_rxtx_funcs().
The update of hw->use_simple_rxtx is also rationalized:
- initialized to 1 (prefer simple path)
- in dev configure or rx/tx queue setup, if something prevents from
using the simple path, change it to 0.
- in dev start, set the handlers according to hw->use_simple_rxtx.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In rx/tx queue setup functions, some code is executed only if
use_simple_rxtx == 1. The value of this variable can change depending on
the offload flags or sse support. If Rx queue setup is called before Tx
queue setup, it can result in an invalid configuration:
- dev_configure is called: use_simple_rxtx is initialized to 0
- rx queue setup is called: queues are initialized without simple path
support
- tx queue setup is called: use_simple_rxtx switch to 1, and simple
Rx/Tx handlers are selected
Fix this by postponing a part of Rx/Tx queue initialization in
dev_start(), as it was the case in the initial implementation.
Fixes: 48cec290a3 ("net/virtio: move queue configure code to proper place")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The mbuf->port was was not properly set for the first received
mbufs. Fix this by setting it in virtqueue_enqueue_recv_refill_simple(),
which is used to enqueue the first mbuf in the ring.
The function virtio_rxq_rearm_vec(), which is used to rearm the ring
with new mbufs, is correct and does not need to be updated.
Fixes: cab0461234 ("virtio: fill Rx avail ring with blank mbufs")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
On error, we should log with error level.
Fixes: 9f4f2846ef ("virtio: support vlan filtering")
Fixes: 86d59b2146 ("net/virtio: support LRO")
Fixes: 96cb671193 ("net/virtio: support Rx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This reverts
commit 4dab342b75 ("net/virtio: do not falsely claim to do IP checksum").
The description of rxmode->hw_ip_checksum is:
hw_ip_checksum : 1, /**< IP/UDP/TCP checksum offload enable. */
Despite its name, this field can be set by an application to enable L3
and L4 checksums. In case of virtio, only L4 checksum is supported and
L3 checksums flags will always be set to "unknown".
Fixes: 4dab342b75 ("net/virtio: do not falsely claim to do IP checksum")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This reverts
commit 701a64622c ("net/virtio: do not claim to support LRO")
Setting rxmode->enable_lro is a way to tell the host that the guest is
ok to receive tso packets. From the guest point of view, it is like
enabling LRO on a physical driver.
Fixes: 701a64622c ("net/virtio: do not claim to support LRO")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Acccording to the vhost-user spec [0], client must start ring
upon receiving a kick (that is, detecting that file descriptor
is reachable) on the descriptor specified by VHOST_USER_SET_VRING_KICK.
The code sends a kick to the rx queue. It is missing sending a
kick for the tx queue. This patch is to add the missing code to
comply with the spec.
[0]: https://fossies.org/linux/qemu/docs/specs/vhost-user.txt
Signed-off-by: Steven Luong <sluong@cisco.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
To use macro instead of magic number in order to enhance code
readability.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Extend port_id definition from uint8_t to uint16_t in lib and drivers
data structures, specifically rte_eth_dev_data. Modify the APIs,
drivers and app using port_id at the same time.
Fix some checkpatch issues from the original code and remove some
unnecessary cast operations.
release_17_11 and deprecation docs have been updated in this patch.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When use rte_eth_dev_configure() to enable rx queue interrupt for virtio
devices, virtio_init_device() isn't called to set up the interrupt
environment, which causes rx queue interrupt setup failed. This patch is
to fix this issue.
Fixes: 26b683b4f7 ("net/virtio: setup Rx queue interrupts")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When virtio-net devices are bound to uio_pci_generic, we get
the wrong mac addr by virtio PMD. The wrong mac addr is a
addr that is 4-byte left shift of the correct addr.
It's a regression bug introduced by the cleanup patch below.
The condition of if we set use_msix should be if msix is
actually enabled. Only to check if there is a capability list
is not enough. For example, binding a transitional device
to uio_pci_device would trigger the wrong assignment of use_msix.
To correct that, we also check the flags of msix capability to
make sure it's enabled.
Fixes: ee1843bd89 ("net/virtio: remove redundant MSI-X detection")
Cc: stable@dpdk.org
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The current virtio supports Transmit Segmentation Offload, but
does not really support Large Receive Offload. The driver was confusing
the two offloads.
Fixes: 86d59b2146 ("net/virtio: support LRO")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The virtio driver is confused about the meaning of the ip_checksum
flag. In DPDK, ip_checksum means the hardware is capable of checking
the Layer 3 IP checksum. But KVM/QEMU does not do that. The flag
VIRTIO_NET_F_GUEST_CSUM controls whether the receive side does
Layer 4 (TCP/UDP) checksum offload.
Fix by erroring out any requests to do IP checksum.
Fixes: 96cb671193 ("net/virtio: support Rx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Replace the incorrect reference to "Cavium Networks", "Cavium Ltd"
company name with correct the "Cavium, Inc" company name in
copyright headers.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Since "rte_eal_dev_init()" has been removed, the comment referred to
it should be modified simultaneously.
Fixes: 9721b4d543 ("eal: remove unused device init function")
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The rte_eth_dev.data pointer is set to a reference to a static table.
Attempting to rte_free() it leads to a panic. For example, the
following commands result in a panic if run in testpmd
testpmd> port attach virtio_user0,path=/dev/vhost-net,iface=test0
testpmd> port stop 2
testpmd> port close 2
testpmd> port detach 2
Fixes: ce2eabdd43 ("net/virtio-user: add virtual device")
Cc: stable@dpdk.org
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Change the rte_eth_dev_callback_process function to return int,
and add a void *ret_param parameter.
The new parameter is used by ixgbe and i40e instead of abusing
the user data of the callback.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Zero the whole memory zone instead of the first few bytes.
Fixes: c1f86306a0 ("virtio: add new driver")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Instead of many PMD define their own macro, define a generic one in
ethdev and use that in PMDs.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
remove __rte_unused instances that are not required.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
vfio is the kernel framework used by the vfio-pci kernel driver.
DPDK drivers do not rely solely on vfio, but rather on vfio-pci to gain
access to pci resources.
Fixes: 0880c40113 ("drivers: advertise kmod dependencies in pmdinfo")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Some customers find adding MAC addr to VF sometimes can fail,
but it is still stored in dev->data->mac_addrs[ ]. So this
can lead to some errors that assumes the non-zero entry in
dev->data->mac_addrs[ ] is valid.
Following acknowledgements are from specific NIC PMD
maintainer for their managing part.
This patch changes the ethdev internal API, it should not be
backported to a stable/LTS release so far.
Fixes: af75078fec ("first public release")
Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
The PCI code will move to the bus drivers directory.
Rename functions from rte_eal_pci_ to rte_pci_
to prepare the move of the driver out of EAL.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
This commit fixs segment fault when rte_eth_dev_close() is called on
a virtio dev more than once. Assigning zero after free to avoids
freed memory to be accessed again.
Fixes: 69c80d4ef8 ("net/virtio: allocate queue at init stage")
Cc: stable@dpdk.org
Signed-off-by: Huanle Han <hanxueluo@gmail.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Segfault happens when using virtio-user after commit 7f0a669e7b
("ethdev: add allocation helper for virtual drivers").
It's due to we use ethdev->device to recognize physical devices,
but after above commit, this field is also filled for virtual
devices. Then we obtain the wrong pci_dev pointer and accessing
its field when copying pci info results in segfault.
To fix it, we use hw->virtio_user_dev to differentiate physical
devices from virtual devices.
Fixes: 6a7c0dfcdf ("net/virtio: do not depend on PCI device of ethdev")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The virtio port link status will always be DOWN:
The commit aa9f060617 ("net/virtio: fix link status always being up")
introduces a flag to help checking the status. If this flag is not set,
status will be always down. However, in dev start, this flag is set
after link status update, then we miss the chance to change the status
to UP in dev start.
To fix this bug, we simply move the link status update after the flag
setting so that the status can be correctly updated.
Fixes: aa9f060617 ("net/virtio: fix link status always being up")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
As we already change to use capability list to detect MSI-X, remove
the redundant MSI-X detection in legacy devices.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
LSC flag is set in several places, but only the last one takes effect;
so we remove the redundant ones and just keep the last one.
This also fixes the bug that dev_flags being overwritten by
rte_eth_copy_pci_info(), which resets it to 0 unconditionally.
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The field, use_msix, in struct virtio_hw is not updated for modern
device, and is always zero. And now we depend on the status feature
and MSI-X to report LSC support (which is also not a correct
behavior). As a result, LSC is always disabled for modern devices.
To fix this, we just recognize MSI-X capability when going through
capability list, and update the info in virtio.
Fixes: 6ba1f63b5a ("virtio: support specification 1.0")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Current virtio_dev_stop only disables interrupt and marks link down,
When it is invoked, tx/rx traffic flows still work. This is a strange
behavior. The patch supports the switch of flow.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
virtio-user cannot work on 32-bit system as higher 32-bit of the
addr field (64-bit) in the desc is filled with non-zero value
which should not happen for a 32-bit system.
In case of virtio-user, we use buf_addr of mbuf to fill the
virtqueue desc addr. This is a regression bug. For 32-bit system,
the first 4 bytes of mbuf is buf_addr, with following 8 bytes for
buf_phyaddr. With below wrong definition, both buf_addr and lower
4 bytes buf_phyaddr are obtained to fill the virtqueue desc.
#define VIRTIO_MBUF_ADDR(mb, vq) \
(*(uint64_t *)((uintptr_t)(mb) + (vq)->offset))
Fixes: 25f80d1087 ("net/virtio: fix packet corruption")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Previously, we miss to set intr_handle->fd which will be used as
target file for epoll to check LSC.
As a result, stdin (0) is used and intr thread keeps busy whenever
data comes from stdin.
To fix this, we use vhostfd as the target file for epoll to check
the link status change events. And we move intr_handle initialization
after vhost backend settup to make sure vhostfd is initialized.
Fixes: 35c4f85548 ("net/virtio-user: support to report net status")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The virtio port link status will always be UP, even the port is stopped:
testpmd> port stop 0
Stopping ports...
Checking link statuses...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Done
The link status is queried by link_update callback when LSC is disabled.
Which in turn queries the "status" field. However, the "status" is
read-only. I couldn't think of some proper ways to change the status
without doing device reset.
Instead of doing (the heavy) reset at stop, this patch introduced a flag,
which is set to 1 and 0 on start and stop, respectively. When it's set to
0, the link status is set to DOWN unconditionally.
Fixes: a85786dc81 ("virtio: fix states handling during initialization")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We only enabled LSC when using vhost-user as the backend, but it is
reported even when using vhost-kernel as the backend.
Fix it by only reportting LSC support when using vhost-user as the
backend.
Fixes: 35c4f85548 ("net/virtio-user: support to report net status")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
The feature negotiation in virtio-user is proven to be broken,
which results in device initialization failure.
Originally, we get features from vhost backend, and remove those
that are not supported. But when new feature is added, for example,
VIRTIO_NET_F_MTU, we fail to remove this new feature. Then, this
new feature will be negotiated, as both frontend and backend claim
to support this feature.
To fix it, we add a macro to record supported features, as a filter
to remove newly added features.
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
According to spec, we should write virtqueue index into the notify
address, rather than 1. Besides, some HW backend may rely on the data
written to identify which queue need to serve.
Fixes: 6ba1f63b5a ("virtio: support specification 1.0")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This is a preparation to embed the generic rte_device into the rte_eth_dev
also for virtual devices.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Secondary process doesn't properly attach to the rte_eth_device
initialized by the primary process.
Accessing device from secondary process (e.g. via rte_eth_rx_burst),
causes process to crash. because rte_eth_dev_data is not properly set.
The issue was flood by
'commit 7f95f78a8a ("ethdev: clear data when allocating device")'
which now clears rte_eth_dev_data entry.
For pci devices the struct is initialized by rte_eth_dev_pci_probe
->eth_dev_attach_secondary().
However, for virtio-user virtio_user_pmd_probe() is called instead of
rte_eth_dev_pci_probe().
The fix is to call rte_eth_dev_attach_secondary(), for secondary
process, from virtio_user_pmd_probe.
Fixes: 7f95f78a8a ("ethdev: clear data when allocating device")
Cc: stable@dpdk.org
Signed-off-by: Ami Sabo <amis@radware.com>
The patch change the prototype of callback function
(rte_intr_callback_fn) by removing the unnecessary parameter.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Now that the m->next pointer and m->nb_segs is expected to be set (to
NULL and 1 respectively) after a mempool_get(), we can avoid to write them
in the Rx functions of drivers.
Only some drivers are patched, it's not an exhaustive patch. It gives
the idea to do the same in other drivers.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Document the function and make it public, since it is used at several
places in the drivers. The old one is marked as deprecated.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
So far, virtio-user with vhost-user as the backend can only support
client mode. So when vhost user backend is down, i.e., unix socket
connection is broken, the connection cannot be re-connected. We will
forcely set the link state to be down.
Note: virtio-user with vhost-kernel as the backend still cannot
support lsc now as we fail to find a way to monitor the backend, tap
device, up/down events.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Originally, we did not report support of VIRTIO_NET_F_STATUS.
This feature is not reported by vhost backend, instead, it
is added/removed by QEMU in virtio PCI case.
We report the support of this feature so that following patch
will depend on this feature to enable LSC interrupt.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
For rxq interrupt, the device (backend driver) will notify driver
through callfd. Each virtqueue has a callfd. To keep compatible
with the existing framework, we will give these callfds to
interrupt thread for listening for interrupts.
Before that, we need to allocate intr_handle, and fill callfds
into it so that driver can use it to set up rxq interrupt mode.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Originally, eventfd is opened when initializing each vq; and gets closded
in virtio_user_stop_device().
To make it possible to initialize intr_handle struct in init() in following
patch, we put the open() of all eventfds into init(); and put the close()
into uninit().
Suggested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch adds a new option 'iface' to change the interface name of
tap device with vhost-kernel as backend.
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch implements support for the Virtio MTU feature.
When negotiated, the host shares its maximum supported MTU,
which is used as initial MTU and as maximum MTU the application
can set.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The link state change interrupt can only be configured if the virtio device
supports MSIX. Prior to this change the writing of the vector to the PCI
config space was causing it to overwrite the initial part of the MAC
address since the MSIX vector is not in the config space and is occupied by
the MAC address.
This has been reproduced in Virtual Box (v5.0.30.r112061) in Windows 7.
Fixes: 954ea11540 ("virtio: do not report link state feature unless available")
Cc: stable@dpdk.org
Signed-off-by: Matt Peters <matt.peters@windriver.com>
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
virtio-user limits the qeueue number to 8 but provides no limit
check against the queue number input from user. If a bigger queue
number (> 8) is given, there is an overflow issue. Doing a sanity
check could avoid it.
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Cc: stable@dpdk.org
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The valid tap file descriptor range should be equal or greater
than zero instead of non-zero
Fixes: e3b434818b ("net/virtio-user: support kernel vhost")
Cc: stable@dpdk.org
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The minor change aims to remove the redundant computing and make
it easier to understand the code.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The chosen fake capability (10G) is consistent with the reported
link speed in virtio_dev_link_update():
link.link_speed = SPEED_10G;
The feature is not marked in doc/guides/nics/features/virtio.ini
because it is only a fake value.
Signed-off-by: Ido Barnea <ibarnea@cisco.com>
[Thomas: comments added]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When any layout is used, the header is stored in the head room of mbuf.
mbuf is allocated and filled by user, means there is no gurateen the
header is all zero for non TSO case. Therefore, we have to do the reset
by ourself:
memest(hdr, 0, head_size);
The memset has two impacts on performance:
- memset could not be inlined, which is a bit costly.
- more importantly, it touches the mbuf, which could introduce severe
cache issues as described by former patch.
Similiary, we could do the same trick: reset just when necessary, when
the corresponding field is already 0, which is likely true for a simple
l2 forward case. It could boost the performance up to 20+% in micro
benchmarking.
Cc: stable@dpdk.org
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
TSO is now enabled, but it's not actually being used by default in a
simple L2 forward mode. In such case, we have to zero the virtio net
headers, to inform the vhost backend that no offload is being used:
hdr->csum_start = 0;
hdr->csum_offset = 0;
hdr->flags = 0;
hdr->gso_type = 0;
hdr->gso_size = 0;
hdr->hdr_len = 0;
Such writes could be very costly; it introduces severe cache issues:
The above operations introduce cache write for each packet, which
stalls the read operation from the vhost backend.
The fact that virtio net header is initiated to zero in PMD driver
init stage means that these costly writes are unnecessary and could
be avoided:
if (hdr->csum_start != 0)
hdr->csum_start = 0;
And that's what the macro ASSIGN_UNLESS_EQUAL does. With this, the
performance drop introduced by TSO enabling is recovered: it could
be up to 20% in micro benchmarking.
Fixes: 58169a9c81 ("net/virtio: support Tx checksum offload")
Fixes: 696573046e ("net/virtio: support TSO")
Cc: stable@dpdk.org
Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Value returned from malloc is not checked for errors before being used.
This patch fixes following coverity issue.
static struct vhost_memory_kernel *
prepare_vhost_memory_kernel(void)
{
...
vm = malloc(sizeof(struct vhost_memory_kernel) +
max_regions *
sizeof(struct vhost_memory_region));
...
>>> CID 140744: (NULL_RETURNS)
>>> Dereferencing a null pointer "vm".
mr = &vm->regions[k++];
Coverity issue: 140744
Fixes: e3b434818b ("net/virtio-user: support kernel vhost")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The vtpci_ops assignment needs the 'hw->port_id' as an input parameter.
That said, we should set 'hw->port_id' firstly, then do the vtpci_ops
assignment, while the code does reversely. That would result to a crash
when more than one virtio devices are used, because we keep assigning
proper vtpci_ops to virtio_hw_internal[0]->vtpci_ops, leaving the pointer
for other ports being NULL.
Reverse the order fixes this issue.
Fixes: 9470427c88 ("net/virtio: do not store PCI device pointer at shared memory")
Cc: stable@dpdk.org
Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Replace the raw I/O device memory read/write access with eal
abstraction for I/O device memory read/write access to fix
portability issues across different architectures.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Some virtual pmds report a different name than the vdev driver name
registered in eal.
While it does not hurt, let's try to be consistent.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When CONFIG_RTE_VIRTIO_USER is disabled (default on FreeBSD),
the virtio driver cannot be compiled:
librte_pmd_virtio.a(virtio_ethdev.o): In function `eth_virtio_dev_init':
(.text+0x1eba): undefined reference to `virtio_user_ops'
Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When the virtio PMD is used on top of a vhost that does not support
offloads, Rx offload capabilities are still advertised by
virtio_dev_info_get(). But if an application tries to start the PMD with
Rx offloads enabled (rxmode.hw_ip_checksum = 1), the initialization of
the device will fail with -ENOTSUP and the following log:
rx ip checksum not available on this host
This patch fixes the Rx offload capabilities returned by
virtio_dev_info_get() to be consistent with features advertised by the
host.
Fixes: 96cb671193 ("net/virtio: support Rx checksum offload")
Fixes: 86d59b2146 ("net/virtio: support LRO")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When closing virtio devices, close eventfds, free the struct to
store queue/irq mapping.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When virtio devices get stopped, tell the kernel to unbind the
mapping between interrupts and eventfds.
Note: it behaves differently from other NICs which close eventfds,
free struct. In virtio, we do those things when close device in
following patch.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch mainly allocates structure to store queue/irq mapping,
and configure queue/irq mapping down through PCI ops. It also creates
eventfds for each Rx queue and tell the kernel about the eventfd/intr
binding.
Note: So far, we hard-code 1:1 queue/irq mapping (each rx queue has
one exclusive interrupt), like this:
vec 0 -> config irq
vec 1 -> rxq0
vec 2 -> rxq1
...
which means, the "vectors" option of QEMU should be configured with
a value >= N+1 (N is the number of the queue pairs).
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch implements interrupt enable/disable functions for each
Rx queue. And we rely on flags of avail queue as the hint for virtio
device to interrupt virtio driver or not.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add handler in virtio_pci_ops to set queue/irq bind.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Under interrupt mode, rx_descriptor_done is used as an indicator
for applications to check if some number of packets are ready to
be received.
This patch enables this by checking used ring's local consumed idx
with shared (with backend) idx.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We need to define a prototype for such wrapper, which makes thing
too complicated. Remove wrapper and call set_config_irq directly.
Suggested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The LSC flag is decided according to if VIRTIO_NET_F_STATUS feature
is negotiated. Copy the PCI info after the judgement will rewrite
the correct result.
Fixes: 198ab33677 ("net/virtio: move device initialization in a function")
CC: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
With vhost kernel, to enable multiqueue, we need backend device
in kernel support multiqueue feature. Specifically, with tap
as the backend, as linux/Documentation/networking/tuntap.txt shows,
we check if tap supports IFF_MULTI_QUEUE feature.
And for vhost kernel, each queue pair has a vhost fd, and with a tap
fd binding this vhost fd. All tap fds are set with the same tap
interface name.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When used with vhost kernel backend, we can offload at both directions.
- From vhost kernel to virtio_user, the offload is enabled so that
DPDK app can trust the flow is checksum-correct; and if DPDK app
sends it through another port, the checksum needs to be
recalculated or offloaded. It also applies to TSO.
- From virtio_user to vhost_kernel, the offload is enabled so that
kernel can trust the flow is L4-checksum-correct, no need to verify
it; if kernel will consume it, DPDK app should make sure the
l3-checksum is correctly set.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch add support vhost kernel as the backend for virtio_user.
Three main hook functions are added:
- vhost_kernel_setup() to open char device, each vq pair needs one
vhostfd;
- vhost_kernel_ioctl() to communicate control messages with vhost
kernel module;
- vhost_kernel_enable_queue_pair() to open tap device and set it
as the backend of corresonding vhost fd (that is to say, vq pair).
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add a struct virtio_user_backend_ops to abstract three kinds of backend
operations:
- setup, create the unix socket connection;
- send_request, sync messages with backend;
- enable_qp, enable some queue pair.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To support vhost kernel as the backend of net_virtio_user in coming
patches, we move vhost_user specific structs and macros into
vhost_user.c, and only keep common definitions in vhost.h.
Besides, remove VHOST_USER_MQ feature check.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
virtio_user is not properly reset when users call vtpci_reset(),
as it ignores VIRTIO_CONFIG_STATUS_RESET status in
virtio_user_set_status().
This might lead to initialization failure as it starts to re-init
the device before sending RESET messege to backend. Besides, previous
callfds and kickfds are not closed.
To fix it, we add support to disable virtqueues when it's set to
DRIVER OK status, and re-init fields in struct virtio_user_dev.
Fixes: e9efa4d938 ("net/virtio-user: add new virtual PCI driver")
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Before the commit 86d59b2146 ("net/virtio: support LRO"), features
in virtio PMD, is decided and properly set at device initialization
and will not be changed. But afterward, features could be changed in
virtio_dev_configure(), and will be re-negotiated if it's changed.
In virtio-user, device features is obtained at driver probe phase
only once, but we did not store it. So the added feature bits in
re-negotiation will fail.
To fix it, we store it down, and will be used to feature negotiation
either at device initialization phase or device configure phase.
Fixes: e9efa4d938 ("net/virtio-user: add new virtual PCI driver")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
hw->dev, a pointer to pci_dev, was actually not used, until the
refactor of decouping from PCI device. This would somehow break
the multiple process again, since "hw" is stored at shared memory,
while "pci_dev" is not: the primary and secondary process could
have different address for it, while just one value is allowed.
Thus we should not store it to "hw", instead, we could retrieve
it from the "eth_dev->device" field.
Fixes: ae34410a8a ("ethdev: move info filling of PCI into drivers")
Fixes: eac901ce29 ("ethdev: decouple from PCI device")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Since commit 0e1b45a284 ("ethdev: decouple interrupt handling from
PCI device"), intr_handle is stored at eth_dev struct, that we could
use it directly. Thus there is no need to get it from hw.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The introduce of virtio 1.0 support brings yet another set of ops, badly,
it's not handled correctly, that it breaks the multiple process support.
The issue is the data/function pointer may vary from different processes,
and the old used to do one time set (for primary process only). That
said, the function pointer the secondary process saw is actually from the
primary process space. Accessing it could likely result to a crash.
Kudos to the last patches, we now be able to maintain those info that may
vary among different process locally, meaning every process could have its
own copy for each of them, with the correct value set. And this is what
this patch does:
- remap the PCI (IO port for legacy device and memory map for modern
device)
- set vtpci_ops correctly
After that, multiple process would work like a charm. (At least, it
passed my fuzzy test)
Fixes: b8f04520ad ("virtio: use PCI ioport API")
Fixes: d5bbeefca8 ("virtio: introduce PCI implementation structure")
Fixes: 6ba1f63b5a ("virtio: support specification 1.0")
Cc: stable@dpdk.org
Reported-by: Juho Snellman <jsnell@iki.fi>
Reported-by: Yaron Illouz <yaroni@radcom.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Like vtpci_ops, the rte_pci_ioport has to store in local memory. This
is basically for the rte_pci_device field is allocated from process
local memory, but not from shared memory.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We used to store the vtpci_ops at virtio_hw structure. The struct,
however, is stored in shared memory. That means only one value is
allowed. For the multiple process model, however, the address of
vtpci_ops should be different among different processes.
Take virtio PMD as example, the vtpci_ops is set by the primary
process, based on its own process space. If we access that address
from the secondary process, that would be an illegal memory access,
A crash then might happen.
To make the multiple process model work, we need store the vtpci_ops
in local memory but not in a shared memory. This is what the patch
does: a local virtio_hw_internal array of size RTE_MAX_ETHPORTS is
allocated. This new structure is used to store all these kind of
info in a non-shared memory. Current, we have:
- vtpci_ops
- rte_pci_ioport
- virtio pci mapped memory, such as common_cfg.
The later two will be done in coming patches. Later patches would also
set them correctly for secondary process, so that the multiple process
model could work.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If the primary enables the vector Rx/Tx path, the current code would
let the secondary always choose the non vector Rx/Tx path. This results
to a Rx/Tx method mismatch between primary and secondary process. Werid
errors then may happen, something like:
PMD: virtio_xmit_pkts() tx: virtqueue_enqueue error: -14
Fix it by choosing the correct Rx/Tx callbacks for the secondary process.
That is, use vector path if it's given.
Fixes: 8d8393fb18 ("virtio: pick simple Rx/Tx")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Current virtio driver advertises VERSION_1 support,
but does not handle device's VERSION_1 support when
sending packets (it looks for ANY_LAYOUT feature,
which is absent).
This patch enables 'can_push' in tx path when VERSION_1
is advertised by the device.
This significantly improves small packets forwarding rate
towards devices advertising VERSION_1 feature.
Signed-off-by: Pierre Pfister <ppfister@cisco.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Attaching and detaching ethernet ports from an application
is not the same thing as physically removing a PCI device,
so clarify the flags indicating support. All PCI devices
are assumed to be physically removable, so no flag is
necessary in the PCI layer.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The function rte_eth_xstats_get() return an array of tuples (id,
value). The value is the statistic counter, while the id references a
name in the array returned by rte_eth_xstats_get_name().
Today, each 'id' returned by rte_eth_xstats_get() is equal to the index
in the returned array, making this value useless. It also prevents a
driver from having different indexes for names and value, like in the
example below:
rte_eth_xstats_get_name() returns:
0: "rx0_stat"
1: "rx1_stat"
2: ...
7: "rx7_stat"
8: "tx0_stat"
9: "tx1_stat"
...
15: "tx7_stat"
rte_eth_xstats_get() returns:
0: id=0, val=<stat> ("rx0_stat")
1: id=1, val=<stat> ("rx1_stat")
2: id=8, val=<stat> ("tx0_stat")
3: id=9, val=<stat> ("tx1_stat")
This patch fixes the drivers to set the 'id' in their ethdev->xstats_get()
(except e1000 which was already doing it), and fixes ethdev by not setting
the 'id' field to the index of the table for pmd-specific stats: instead,
they should just be shifted by the max number of generic statistics.
Fixes: bd6aa172cf ("ethdev: fetch extended statistics with integer ids")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Remy Horton <remy.horton@intel.com>
This makes struct rte_eth_dev independent of struct rte_pci_device by
replacing it with a pointer to the generic struct rte_device.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Only the drivers itself can decide if it could fill PCI information fields
of dev_info.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
We don't need to depend on rte_eth_dev->pci_dev to differentiate between
the virtio_user and the virtio_pci case. Instead we can use the private
virtio_hw struct to get that information.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This adds a helper to get the rte_intr_handle from the virtio_hw. This is
safe to do since the usage of the helper is guarded by RTE_ETH_DEV_INTR_LSC
which is only set if we found a PCI device during initialization.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This is overwritten in rte_eth_dev_info_get().
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Reviewed-by: David Marchand <david.marchand@6wind.com>
Add a new macro RTE_PMD_REGISTER_KMOD_DEP() that allows a driver to
declare the list of kernel modules required to run properly.
Today, most PCI drivers require uio/vfio.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When queue number shrinks to 1 from X, the following code stops us
sending the multiple queue ctrl message:
if (nb_queues > 1) {
if (virtio_set_multiple_queues(dev, nb_queues) != 0)
return -EINVAL;
}
This ends up with still X queues being enabled, which is obviously
wrong. Fix it by replacing the check with a multiple queue enabled
or not check.
Fixes: 823ad64795 ("virtio: support multiple queues")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
From the virtio spec of view, multiple-queue is always enabled/disabled
in queue pairs. DPDK somehow allows the case when Tx and Rx queue number
are different.
Currently, virtio PMD get the queue pair number from the nb_rx_queues
field, which could be an issue when Tx queue number > Rx queue number.
Say, 2 Tx queues and 1 Rx queues. This would end up with 1 quues being
enabled. Which is wrong.
The fix is straightforward. Just pick a bigger number and enable that many
of queues.
Fixes: 823ad64795 ("virtio: support multiple queues")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The "hw->started" field was introduced to stop touching queues
on restart. We never touches queues on restart any more, thus
it's safe to remove this flag.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Invoking vtpci_reinit_complete() at port start stage doesn't make any
sense, instead, it should be done at the end of dev init stage.
So move it here.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The only piece of code of virtio_dev_rxtx_start() is actually doing
queue configure/setup work. So, move it to corresponding queue_setup
callback.
Once that is done, virtio_dev_rxtx_start() becomes an empty function,
thus it's being removed.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
virtio_dev_vring_start() is actually doing the vring initiation job.
And the vring initiation job should be done at the dev init stage, as
stated with great details in former commit.
So move it there, and rename it to virtio_init_vring().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Queue allocation should be done once, since the queue related info (such
as vring addreess) will only be informed to the vhost-user backend once
without virtio device reset.
That means, if you allocate queues again after the vhost-user negotiation,
the vhost-user backend will not be informed any more. Leading to a state
that the vring info mismatches between virtio PMD driver and vhost-backend:
the driver switches to the new address has just been allocated, while the
vhost-backend still sticks to the old address has been assigned in the init
stage.
Unfortunately, that is exactly how the virtio driver is coded so far: queue
allocation is done at queue_setup stage (when rte_eth_tx/rx_queue_setup is
invoked). This is wrong, because queue_setup can be invoked several times.
For example,
$ start_testpmd.sh ... --txq=1 --rxq=1 ...
> port stop 0
> port config all txq 1 # just trigger the queue_setup callback again
> port config all rxq 1
> port start 0
The right way to do is allocate the queues in the init stage, so that the
vring info could be persistent with the vhost-user backend.
Besides that, we should allocate max_queue pairs the device supports, but
not nr queue pairs firstly configured, to make following case work.
$ start_testpmd.sh ... --txq=1 --rxq=1 ...
> port stop 0
> port config all txq 2
> port config all rxq 2
> port start 0
Since the allocation is switched to init stage, the free should also
moved from the rx/tx_queue_release to dev close stage. That leading we
could do nothing an empty rx/tx_queue_release() implementation.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Let rxq/txq/cq be the union field of the virtqueue struct. This would
simplifies the vq allocation a bit: we don't need calculate the vq_size
any more based on the queue type.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Instead of setting up a queue memzone name like "port0_rxq0", "port0_txq0",
it could be simplified a bit to something like "port0_vq0", "port0_vq1" ...
Meanwhile, the code is also simplified a bit.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This reverts commit 9a0615af77 ("virtio: fix restart"); conflict is
manually addressed.
Kyle reported an issue with above commit
qemu-kvm: Guest moved used index from 5 to 1
with following steps,
1) Start my virtio interfaces
2) Send some traffic into/out of the interfaces
3) Stop the interfaces
4) Start the interfaces
5) Send some more traffic
And here are some quotes from Kyle's analysis,
Prior to the patch, if an interface were stopped then started, without
restarting the application, the queues would be left as-is, because
hw->started would be set to 1. Now, calling stop sets hw->started to 0,
which means the next call to start will "touch the queues". This is the
unintended side-effect that causes the problem.
We should not touch the queues once the init is done, otherwise, the vring
state of virtio PMD driver and vhost-user would be inconsistent, leading
some issue like above.
Thus this patch is reverted.
Fixes: 9a0615af77 ("virtio: fix restart")
Reported-by: Kyle Larose <klarose@sandvine.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This registers the legacy names of the driver being renamed in
commit 2f45703c17 ("drivers: make driver names consistent").
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
add cb_arg parameter to the _rte_eth_dev_callback_process function.
Adding a parameter to this function allows passing information
to the application when an eth device event occurs such as
a VF to PF message.
This allows the application to decide if a particular function
is permitted.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Signed-off-by: Alex Zelezniak <alexz@att.com>
All macros related to driver registeration renamed from DRIVER_*
to RTE_PMD_*
This includes:
DRIVER_REGISTER_PCI -> RTE_PMD_REGISTER_PCI
DRIVER_REGISTER_PCI_TABLE -> RTE_PMD_REGISTER_PCI_TABLE
DRIVER_REGISTER_VDEV -> RTE_PMD_REGISTER_VDEV
DRIVER_REGISTER_PARAM_STRING -> RTE_PMD_REGISTER_PARAM_STRING
DRIVER_EXPORT_* -> RTE_PMD_EXPORT_*
Fix PMDINFOGEN tool to look for matches of RTE_PMD_REGISTER_*.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add the ability to reset the virtio device in the configure callback
if the features flag changed since previous reset. This will be possible
with the introduction of offload support in next commits.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Move the configuration of control queue in the configure callback.
This is needed by next commit, which introduces the reinitialization
of the device in the configure callback to change the feature flags.
Therefore, the control queue will have to be restarted at the same
place.
As virtio_dev_cq_queue_setup() is called from a place where
config->max_virtqueue_pairs is not available, we need to store this in
the private structure. It replaces max_rx_queues and max_tx_queues which
have the same value. The log showing the value of max_rx_queues and
max_tx_queues is also removed since config->max_virtqueue_pairs is
already displayed above.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Move all code related to device initialization in a new function
virtio_init_device().
This commit brings no functional change, it prepares the next commits
that will add the offload support. For that, it will be needed to
reinitialize the device from ethdev->configure(), using this new
function.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Negotiate VIRTIO_F_IOMMU_PLATFORM to have IOMMU support.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add modern device id and rename VIRTIO_PCI_DEVICEID_MIN to
VIRTIO_PCI_LEGACY_DEVICEID_NET. While at it, remove unused macros too.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The driver name has been lost with the eal rework.
Restore it.
Fixes: c830cb2954 ("drivers: use PCI registration macro")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Virtio interfaces do not currently allow the user to specify a particular
Maximum Transmission Unit (MTU). Consequently, the MTU of Virtio interfaces
is typically set to the Ethernet default value of 1500.
This is problematic in the case of cloud deployments, in which a specific
(and potentially non-standard) MTU needs to be set by a DHCP server, which
needs to be honored by all interfaces across the traffic path.To acheive
this Virtio interfaces should support setting of MTU.
In case when GRE/VXLAN tunneling is used for internal communication, there
will be an overhead added by the infrastructure in the packet over and
above the ETHER MTU of 1518. So to take care of this overhead in these
cases the DHCP server corrects the L3 MTU to 1454. But since virtio
interfaces was not having the MTU set functionality that MTU sent by the
DHCP server was ignored and the instance will still send packets with 1500
MTU which after encapsulation will become more than 1518 and eventually
gets dropped in the infrastructure.
By adding an additional 'set_mtu' function to the Virtio driver, we can
honor the MTU sent by the DHCP server. The dhcp server/controller can
then leverage this 'set_mtu' functionality to resolve the above
mentioned issue of packets getting dropped due to incorrect size.
Signed-off-by: Souvik Dey <sodey@sonusnet.com>
Reviewed-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Inline with PCI probe and remove, VDEV probe and remove hooks provide
a uniform naming.
PCI probe represents scan and driver initialization. For VDEV, it will
represent argument parsing and initialization.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Added neon based Rx vector implementation.
Selection of the new handler based neon availability at runtime.
Updated the release notes and MAINTAINERS file.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Introduced cpuflag based run-time detection to select the
SSE based simple Rx handler
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Split out SSE instruction based virtio simple Rx
implementation to a separate file
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Removed unnecessary compile time dependency on "use_simple_rxtx".
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Interestingly, clang and gcc has different prototype for _mm_prefetch().
For gcc, we have
_mm_prefetch (const void *__P, enum _mm_hint __I)
While for clang, it's
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
That's how the following error comes with clang:
error: cast from 'const void *' to 'void *' drops const qualifier
[-Werror,-Wcast-qual]
_mm_prefetch((const void *)rused, _MM_HINT_T0);
/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/xmmintrin.h:684:58:
note: expanded from macro '_mm_prefetch'
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a),
0, (sel)))
What's weird is that the build was actaully Okay before. I met it while
apply Jerin's vector support for ARM patch set: he just move this piece
of code to another file, nothing else changed.
This patch fix the issue when Jerin's patchset is applied. Thus, I think
it's still needed.
Similarly, make the same change to other _mm_prefetch users, just in case
this weird issue shows up again somehow later.
Fixes: fc3d66212f ("virtio: add vector Rx")
Fixes: c95584dc2b ("ixgbe: new vectorized functions for Rx/Tx")
Fixes: 9ed94e5bb0 ("i40e: add vector Rx")
Fixes: 7092be8437 ("fm10k: add vector Rx")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Currently, when virtio_user device fails to be started (e.g., vhost
unix socket does not exit), the init function does not return struct
rte_eth_dev (and some other structs) back to ether layer. And what's
more, it does not report the error to upper layer.
The fix is to free those structs and report error when failing to
start virtio_user devices.
Fixes: ce2eabdd43 ("net/virtio-user: add virtual device")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When virtio_user is used with VPP's native vhost user, it cannot
send/receive any packets.
The root cause is that vpp-vhost-user translates the message
VHOST_USER_SET_FEATURES as puting this device into init state,
aka, zero all related structures. However, previous code
puts this message at last in the whole initialization process,
which leads to all previous information are zeroed.
To fix this issue, we rearrange the sequence of those messages.
- step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
virtqueue structures;
- step 1, send VHOST_USER_SET_FEATURES to confirm the features;
- step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
- step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
queue;
- ...
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When virtio_user is used with OVS-DPDK (with mq disabled), it cannot
receive any packets. This is because no queue is enabled at all when
mq is disabled.
To fix it, we should consistently make sure the 1st queue is enabled,
which is also the behaviour QEMU takes.
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Reported-by: Ning Li <lining18@jd.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We have a stats named "size_1024_1517_packets", while the code
actually counts the range "[1024, 1518]", which is obviously wrong.
The code is as follows in the function virtio_update_packet_stats.
else if (s < 1519)
stats->size_bins[6]++;
We could either fix it by correcting the "if" check in the code,
or fix it by just renaming the stats to conform to the code. The
latter solution is taken because that's what the RFC2819 suggests.
Fixes: 76d4c652e0 ("virtio: add extended stats")
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Virtio indirect descriptors are supported by the data-path
but the feature bit is never set during feature negociation.
This patch simply adds VIRTIO_RING_F_INDIRECT_DESC back to
the supported features bit mask, hence enabling the use of
indirect descriptors when the feature is negociated with the
device.
Signed-off-by: Pierre Pfister <ppfister@cisco.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Now that rte_device is available, drivers can start using its members
(numa, name) as well as link themselves into another rte_device list.
As of now no one is using this list, but can be used for moving over all
devices (pdev/vdev/Xdev) and perform bulk actions (like cleanup).
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Reword commit log for extra rte_device list]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Remove the 'name' member from rte_pci_driver and move to generic
rte_driver.
Most of the PMD drivers were initially using DRIVER_REGISTER_PCI(<name>..)
as well as assigning a name to eth_driver.pci_drv.name member.
In this patch, only the original DRIVER_REGISTER_PCI(<name>..) name has
been populated into the rte_driver.name member - assignments through
eth_driver has been removed.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Rebase and expand changes to newly added files]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
- All devices register themselfs by calling a kind of DRIVER_REGISTER_XXX.
The PMD_REGISTER_DRIVER is not used anymore.
- PMD_VDEV type is also not being used - can be removed from all VDEVs.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
All PMD_VDEV drivers can now use rte_vdev_driver instead of the
rte_driver (which is embedded in the rte_vdev_driver).
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Now that hotplug has been moved to eal, there is no reason to keep the
device type in this layer.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Simplify crypto and ethdev pci drivers init by using newly introduced
init macros and helpers.
Those drivers then don't need to register as "rte_driver"s anymore.
Exceptions:
- virtio and mlx* use RTE_INIT directly as they have custom initialization
steps.
- VDEV devices are not modified - they continue to use PMD_REGISTER_DRIVER.
Update documentation for replacing an example referring to
PMD_REGISTER_DRIVER.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
As discussed in the past release, driver names are modified
to be more consistent, and the future driver should follow
this new convention.
Driver names consist of:
"driver category"_"driver folder name"_"optional extra name".
For example:
- Crypto null driver -> "crypto_null"
- Network IXGBE VF driver -> "net_ixgbe_vf"
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The commit cb6696d220 ("drivers: update registration macro usage")
changes the name from virtio-user to virtio_user, because hyphen
cannot be used in a C symbol name. However, this commit does not
update the strings in docs and source code, which could lead to
failure to start this device as per the docs.
This patch updates related strings in the docs and source code.
Fixes: cb6696d220 ("drivers: update registration macro usage")
Reported-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The rxq/txq for the queue_release callback could be NULL, say when
rte_eth_dev_configure() fails that the queue is not setup at all.
Do a simple NULL check would fix the crash issue.
Fixes: 01ad44fd37 ("net/virtio: split Rx/Tx queue")
Reported-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The support of virtio-user changed the way the mbuf dma address is
retrieved, using a physical address in case of virtio-pci and a virtual
address in case of virtio-user.
This change introduced some possible memory corruption in packets,
replacing:
m->buf_physaddr + RTE_PKTMBUF_HEADROOM
by:
m->buf_physaddr + m->data_off (through a macro)
This patch fixes this issue, restoring the original behavior.
By the way, it also rework the macros, adding a "VIRTIO_" prefix and
API comments.
Fixes: f24f8f9fee ("net/virtio: allow virtual address to fill vring descriptors")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The error is reported using test build script:
$ scripts/test-build.sh x86_64-native-linuxapp-gcc
...
drivers/net/virtio/virtio_user_ethdev.c:345:2: error:
this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_PATH) == 1)
^~
Fixes: 404bd6bfe3 ("net/virtio-user: fix return value not checked")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reused defines from the driver.
Used RTE_PCI_DEVICE in place of RTE_PCI_DEV_ID_DECL* stuff.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This is for target i686-native-linuxapp-gcc and gcc6,
Compilation error is:
In file included from
include/rte_mempool.h:77:0, from
drivers/net/virtio/virtio_rxtx_simple.c:
In function `virtio_xmit_pkts_simple':
include/rte_memcpy.h:551:2: error:
array subscript is above array bounds
rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Call stack is as following:
virtio_xmit_pkts_simple
virtio_xmit_cleanup
rte_mempool_put_bulk
rte_mempool_generic_put
__mempool_generic_put
rte_memcpy
The array used as source buffer in virtio_xmit_cleanup (free) is a
pointer array with 32 elements, in 32bit this makes 128 bytes.
in rte_memcpy() implementation, there a code piece as following:
if (size > 256) {
rte_move128(...);
rte_move128(...); <--- [1]
....
}
The compiler traces the array all through the call stack and knows the
size of array is 128 and generates a warning on above [1] which tries to
access beyond byte 128.
But unfortunately it ignores the "(size > 256)" check.
Giving a hint to compiler that variable "size" is related to the size of
the source buffer fixes compiler warning.
Fixes: 863bfb4744 ("mempool: optimize copy in cache")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
There is a logic bug in this code, that could lead to null pointer
dereference when cvq is NULL. Fix this problem by changing logic
&& to logic ||.
>> CID 127480: Null pointer dereferences (FORWARD_NULL)
>> Dereferencing null pointer "cvq".
if (!cvq && !cvq->vq) {
...
}
Coverity issue: 127480
Fixes: 01ad44fd37 ("net/virtio: split Rx/Tx queue")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When use strcpy() to copy string with length exceeding the last
parameter of strcpy(), it may lead to the destination string
unterminated.
We replaced strncpy with snprintf to make sure it's NULL terminated.
Coverity issue: 127476
Fixes: ce2eabdd43 ("net/virtio-user: add virtual device")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The return value by rte_kvargs_parse is not free(d), which leads
to memory leak.
Coverity issue: 127482
Fixes: ce2eabdd43 ("net/virtio-user: add virtual device")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When parsing /proc/self/maps to get hugepage information, the string
was being copied with strcpy(), which could, theoretically but in fact
not possiblly, overflow the destination buffer. Anyway, to avoid the
false alarm, we replaced strncpy with snprintf for safely copying the
strings.
Coverity issue: 127484
Fixes: 6a84c37e39 ("net/virtio-user: add vhost-user adapter layer")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When return values of function calls are not checked, Coverity will
report errors like:
if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_PATH) == 1)
>>> CID 127477: (CHECKED_RETURN)
>>> Calling "rte_kvargs_process" without checking return value
(as is done elsewhere 25 out of 30 times).
rte_kvargs_process(kvlist, VIRTIO_USER_ARG_PATH,
&get_string_arg, &path);
Coverity issue: 127477, 127478
Fixes: ce2eabdd43 ("net/virtio-user: add virtual device")
Fixes: 6a84c37e39 ("net/virtio-user: add vhost-user adapter layer")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
On some older systems, such as SUSE 11, the compiling error shows
as:
.../dpdk/drivers/net/virtio/virtio_user/virtio_user_dev.c:67:22:
error: ‘O_CLOEXEC’ undeclared (first use in this function)
The fix is to use EFD_CLOEXEC, which is defined in sys/eventfd.h,
instead of O_CLOEXEC which needs _GNU_SOURCE defined on some old
systems.
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Virtio and Xenvirt are two virtual device drivers that admit
arguments, so DRIVER_REGISTER_PARAM_STRING should be used
in them.
Fixes: cb6696d220 ("drivers: update registration macro usage")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Since now the PMD_REGISTER_DRIVER macro sets the driver names,
there is no need to have the rte_driver structure setting it
statically, as it will get overridden.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Modify the PMD_REGISTER_DRIVER macro, adding a name argument to it. The
addition of a name argument creates a token that can be used for subsequent
macros in the creation of unique symbol names to export additional bits of
information for use by the pmdinfogen tool. For example:
PMD_REGISTER_DRIVER(ena_driver, ena);
registers the ena_driver struct as it always did, and creates a symbol
const char this_pmd_name0[] __attribute__((used)) = "ena";
which pmdinfogen can search for and extract. The subsequent macro
DRIVER_REGISTER_PCI_TABLE(ena, ena_pci_id_map);
creates a symbol const char ena_pci_tbl_export[] __attribute__((used)) =
"ena_pci_id_map";
Which allows pmdinfogen to find the pci table of this driver
Using this pattern, we can export arbitrary bits of information.
pmdinfo uses this information to extract hardware support from an object
file and create a json string to make hardware support info discoverable
later.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Implicit int to enum conversion is not allowed when icc is used as
the compiler. It raises the compiling error like,
drivers/net/virtio/virtio_user/vhost_user.c(257):
error #188: enumerated type mixed with another type
msg.request = req;
^
The fix is simple, change the type of parameter req to enum
vhost_user_request.
Fixes: 6a84c37e39 ("net/virtio-user: add vhost-user adapter layer")
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
For all drivers that currently implement xstats, the id field in the
rte_eth_stats_name structure equals the entry's array index. This
patch eliminates the redundant id field as a direct index lookup is
faster than a search for the matching id field.
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
Some libraries were missing their dependency on eal, mbuf, mempool,
ring and kvargs.
It is revealed by the linker option "-z defs".
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The compilation for 32-bit fails when CONFIG_RTE_VIRTIO_USER is enabled:
drivers/net/virtio/virtio_user_ethdev.c:84:47:
error: format ‘%llu’ expects argument of type ‘long long unsigned int’,
but argument 5 has type ‘size_t {aka unsigned int}’
Fixes: e9efa4d938 ("net/virtio-user: add new virtual PCI driver")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
In the following loop:
while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
...
}
There is no external function call or any explict memory barrier
in the loop, the re-read of used->idx might be optimized and only
be retrieved once.
Use of voaltile normally should be prohibited, and access_once
is Linux kernel's style to handle this issue; Once we have that
macro in DPDK, we could change to that style.
virtio_recv_mergable_pkts might also have the same issue, so fix
it as well.
Fixes: 823ad64795 ("virtio: support multiple queues")
Fixes: 13ce5e7eb9 ("virtio: mergeable buffers")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Trying to access xstats_names after "if (xstats_names == NULL)" is
obviously wrong, which would result to a crash while running "show
port xstats 0" in testpmd with virtio PMD.
The fix is straightforward; just reverse the check.
Fixes: baf91c395b ("net/virtio: fetch extended statistics with integer ids")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
In virtio-user driver, when notify ctrl-queue, invoke API of
virtio-user device emulation to handle ctrl-q command.
Besides, multi-queue requires ctrl-queue and ctrl-queue will be
enabled automatically when multi-queue is specified.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The main purpose of this patch is to enable multi-queue. But
multi-queue requires ctrl-queue so that driver can send how many
queues will be enabled through ctrl-queue messages.
So we partially implement ctrl-queue to handle control command
with class of VIRTIO_NET_CTRL_MQ and with cmd of
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to handle mq support. This patch
provides a function, virtio_user_handle_cq(), for driver to handle
ctrl-queue messages.
Besides, multi-queue requires VIRTIO_NET_F_MQ and VIRTIO_NET_F_CTRL_VQ
are enabled when we do feature negotiation.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch mainly adds method in vhost user adapter to communicate
enable/disable queues messages with vhost user backend, aka,
VHOST_USER_SET_VRING_ENABLE.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add a new virtual device named virtio-user, which can be used just like
eth_ring, eth_null, etc. To reuse the code of original virtio, we do
some adjustment in virtio_ethdev.c, such as remove key _static_ of
eth_virtio_dev_init() so that it can be reused in virtual device; and
we add some check to make sure it will not crash.
Configured parameters include:
- queues (optional, 1 by default), number of queue pairs, multi-queue
not supported for now.
- cq (optional, 0 by default), not supported for now.
- mac (optional), random value will be given if not specified.
- queue_size (optional, 256 by default), size of virtqueues.
- path (madatory), path of vhost user.
When enable CONFIG_RTE_VIRTIO_USER (enabled by default), the compiled
library can be used in both VM and container environment.
Examples:
path_vhost=<path_to_vhost_user> # use vhost-user as a backend
sudo ./examples/l2fwd/build/l2fwd -c 0x100000 -n 4 \
--socket-mem 0,1024 --no-pci --file-prefix=l2fwd \
--vdev=virtio-user0,mac=00:01:02:03:04:05,path=$path_vhost -- -p 0x1
Known issues:
- Control queue and multi-queue are not supported yet.
- Cannot work with --huge-unlink.
- Cannot work with no-huge.
- Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
hugepages.
- Root privilege is a must (mainly becase of sorting hugepages according
to physical address).
- Applications should not use file name like HUGEFILE_FMT ("%smap_%d").
- Cannot work with vhost-net backend.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch is related to how to calculate relative address for vhost
backend.
The principle is that: based on one or multiple shared memory regions,
vhost maintains a reference system with the frontend start address,
backend start address, and length for each segment, so that each
frontend address (GPA, Guest Physical Address) can be translated into
vhost-recognizable backend address. To make the address translation
efficient, we need to maintain as few regions as possible. In the case
of VM, GPA is always locally continuous. But for some other case, like
virtio-user, GPA continuous is not guaranteed, therefore, we use virtual
address here.
It basically means:
a. when set_base_addr, VA address is used;
b. when preparing RX's descriptors, VA address is used;
c. when transmitting packets, VA is filled in TX's descriptors;
d. in TX and CQ's header, VA is used.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch moves phys addr check from virtio_dev_queue_setup
to pci ops. To make that happen, make sure virtio_ops.setup_queue
return the result if we pass through the check.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We skip kernel managed virtio devices, if it isn't whitelisted.
Before checking if the virtio device is whitelisted, check if devargs
is specified.
Fixes: ac5e1d838d ("virtio: skip error when probing kernel managed device")
Reported-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We keep a common vq structure, containing only vq related fields,
and then split others into RX, TX and control queue respectively.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
[Jianfeng Tan: found and fixed 2 bugs]
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The commit dd856dfcb9 introduced an optimization that prepends virtio
header to mbuf data. It can be used when the tx mbuf is writeable, so we
need to check that the mbuf is direct (i.e. it embeds its own data).
Fixes: dd856dfcb9 ("virtio: use any layout on Tx")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the test-pmd
and proc_info applications to use the new xstats API, and removes
deprecated code associated with the old API.
Signed-off-by: Remy Horton <remy.horton@intel.com>
The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the virtio driver
to use the new API that seperates name string and value queries.
Signed-off-by: Remy Horton <remy.horton@intel.com>
Although ppc supports both endianesses, qemu supposes that the cpu is
big endian and enforces this for the virtio-net stuff.
Fix PCI accesses in legacy mode. Only ppc64le is supported at the moment.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The SYSFS_PCI_DEVICES is a constant that makes the PCI testing
difficult as it points to an absolute path. We remove using this
constant and introducing a function pci_get_sysfs_path that gives
the same value. However, the user can pass a SYSFS_PCI_DEVICES env
variable to override the path. It is now possible to create a fake
sysfs hierarchy for testing.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Many drivers provide their own implementation of rte_mbuf_raw_alloc(),
duplicating the code. Introduce a new public function in rte_mbuf to
allocate a raw mbuf (uninitialized).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
When virtio was proposed in DPDK, there is no API to free memzones.
But this has changed since rte_memzone_free() has been implemented by
commit ff909fe21f ("mem: introduce memzone freeing").
This patch is to make sure memzones in struct virtqueue, like mz and
virtio_net_hdr_mz, are freed when queue is released or setup fails.
Fixes: c1f86306a0 ("virtio: add new driver")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The "drv_flags" is set with device as the input, which means different
device (say, modern vs legacy) could end up with a different value. And
the fact that "drv_flags" is shared by all devices means that every time
we add a new device, it simply overwrites the value configured from the
last device.
Therefore, when two virtio devices have different flags, it may lead to
wrong result, such as virtio would set irq config when it's not supported.
Making the flag per device (using "dev->data->dev_flags") could let us
have different value for each device, which would avoid the above issue.
Fixes: da978dfdc4 ("virtio: use port IO to get PCI resource")
Reported-by: David Marchand <david.marchand@6wind.com>
Suggested-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Avail ring is updated by the frontend and consumed by the backend.
There are frequent core to core cache transfers for the avail ring.
This optmization avoids avail ring entry index update if the entry
already holds the same value.
As DPDK virtio PMD implements FIFO free descriptor list (also for
performance reason of CACHE), in which descriptors are allocated
from the head and freed to the tail, with this patch in most cases
avail ring will remain the same, then it would be valid in both caches
of frontend and backend.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
check merge-able header as it is supported.
previously we don't support merge-able feature, so non merge-able
header is checked.
Fixes: 13ce5e7eb9 ("virtio: mergeable buffers")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
After the do-while loop, idx could be VQ_RING_DESC_CHAIN_END (32768)
when it's the last vring desc buf we can get. Therefore, following
expresssion could lead to a segfault error, as it tries to access
beyond the desc memory boundary.
start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
This bug could be reproduced easily with "set fwd txonly" in the
guest PMD, where the dequeue on host is slower than the guest Tx,
that running out of free desc buf is pretty easy.
The fix is straightforward and easy, just remove it, as we have
already set desc flags properly inside the do-while loop.
Fixes: dd856dfcb9 ("virtio: use any layout on Tx")
[Yuanhan Liu: commit log reword]
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Issue: output of appliations and debug info of DPDK may be mixed up
in same line when enabling below debug options of virtio:
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DRIVER
This patch adds "\n" in the tail of definitions like PMD_RX_LOG,
PMD_TX_LOG, and PMD_DRV_LOG, and removes some "\n" when using these
macros.
Fixes: c1f86306a0 ("virtio: add new driver")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
For simple TX the virtio-net header must be zeroed, but it was using memory
that had been initialized with indirect descriptor tables. This resulted in
"unsupported gso type" errors from librte_vhost.
We can use the same memory for every descriptor to save cachelines in the
vswitch.
Fixes: 6dc5de3a ("virtio: use indirect ring elements")
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Some duplex values are replaced from 0 to half-duplex when link is down.
Some drivers are still using their own constants for duplex modes.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
Define and use ETH_LINK_UP and ETH_LINK_DOWN where appropriate.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Virtio has an mbuf descriptor ring containing mbufs to be used for
receiving traffic. When the host queues traffic to be sent to the guest, it
consumes these descriptors. If none exist, it discards the packet.
The virtio pmd allocates mbufs to the descriptor ring every time it
successfully receives a packet. However, it never does it if it does not
receive a valid packet. If the descriptor ring is exhausted, and the mbuf
mempool does not have any mbufs free (which can happen for various reasons,
such as queueing along the processing pipeline), then the receive call will
not allocate any mbufs to the descriptor ring, and when it finishes, the
descriptor ring will be empty. The ring being empty means that we will
never receive a packet again, which means we will never allocate mbufs to
the ring: we are stuck.
Ultimately, the problem arises because there is a dependency between
receiving packets and making the descriptor ring not be empty, and a
dependency between the descriptor ring not being empty, and receiving
packets.
To fix the problem, this pakes makes virtio always try to allocate mbufs
to the descriptor ring, if necessary, when polling for packets. Do this by
removing the early exit if no packets were received. Since the packet loop
later will do nothing if there are no packets, this is fine.
I reproduced the problem by pushing packets through a pipelined systems
(such as the client_server sample application) after artificially
decreasing the size of the mbuf pool and introducing a delay in a secondary
stage.
Without the fix, the process stops receiving packets fairly quicky. With
the fix, it continues to receive packets.
Fixes: c1f86306a0 ("virtio: add new driver")
Signed-off-by: Kyle Larose <klarose@sandvine.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
All the error checks in virtqueue_enqueue_xmit are already done
by the caller. Therefore they can be removed to improve performance.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Virtio supports a feature that allows sender to put transmit
header prepended to data. It requires that the mbuf be writeable, correct
alignment, and the feature has been negotiatied. If all this works out,
then it will be the optimum way to transmit a single segment packet.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The virtio ring in QEMU/KVM is usually limited to 256 entries
and the normal way that virtio driver was queuing mbufs required
nsegs + 1 ring elements. By using the indirect ring element feature
if available, each packet will take only one ring slot even for
multi-segment packets.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Applied with coding standards fixes:
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The virtio_net_hdr desc all pointed to the same buffer. It doesn't cause
issue because in the simple TX mode we don't use the header. This patch
makes the header desc point to different buffer.
Fixes: b4ae9c505f ("virtio: optimize ring layout")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This initialisation of nb_rx_queues and nb_tx_queues has been removed
from eth_virtio_dev_init.
The nb_rx_queues and nb_tx_queues were being initialised in
eth_virtio_dev_init before the tx_queues and rx_queues arrays were
allocated.
The arrays are allocated when the ethdev port is configured and the
nb_tx_queues and nb_rx_queues are initialised.
If any of the following functions were called before the ethdev
port was configured there was a segmentation fault because
rx_queues and tx_queues were NULL:
rte_eth_stats_get
rte_eth_stats_reset
rte_eth_xstats_get
rte_eth_xstats_reset
Fixes: 823ad64795 ("virtio: support multiple queues")
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fix the issue that virtio device cannot be started after stopped.
The field, hw->started, should be changed by virtio_dev_start/stop instead
of virtio_dev_close.
Fixes: a85786dc81 ("virtio: fix states handling during initialization")
Reported-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Declare dst as type uint32_t instead of uint64_t, otherwise, we will get
a random upper 32 bit feature bits, as the following io port read reads
lower 32 bit only. It could lead a feature bits that include VIRTIO_F_VERSION_1
(the 32th bit) for legacy virtio, which is obviously wrong.
Fixes: b8f04520ad ("virtio: use PCI ioport API")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
virtio PMD could use IO port to configure the virtio device without
using UIO/VFIO driver in legacy mode.
There are two issues with previous implementation:
1) virtio PMD will take over the virtio device(s) blindly even if not
intended for DPDK.
2) driver conflict between virtio PMD and virtio-net kernel driver.
This patch checks if there is kernel driver other than UIO/VFIO managing
the virtio device before using port IO.
If legacy_virtio_resource_init fails and kernel driver other than
VFIO/UIO is managing the device, return 1 to tell the upper layer we
don't take over this device.
For all other IO port mapping errors, return -1.
Note than if VFIO/UIO fails, now we don't fall back to port IO.
Fixes: da978dfdc4 ("virtio: use port IO to get PCI resource")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
are defined in each PMD driver file. Convert macros to inline
functions and move them to common lib/librte_mbuf/rte_mbuf.h file.
PMD drivers include rte_mbuf.h file directly/indirectly hence no
additioanl header file inclusion is necessary.
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Move all os / arch specifics to eal.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Santosh Shukla <sshukla@mvista.com>
Tested-by: Santosh Shukla <sshukla@mvista.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
According to the api, rte_eal_pci_map_device is only successful when
returning 0.
Fixes: 6ba1f63b5a ("virtio: support specification 1.0")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fixes: c52afa68d7 ("virtio: move left PCI stuff in the right file")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
fix the error reported by checkpatch:
"ERROR: return is not a function, parentheses are not required"
remove parentheses in return like:
"return (logical expressions)"
remove parentheses in return a function like:
"return (rte_mempool_lookup(...))"
Fixes: 6307b909b8 ("lib: remove extra parenthesis after return")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Modern (v1.0) virtio pci device defines several pci capabilities.
Each cap has a configure structure corresponding to it, and the
cap.bar and cap.offset fields tell us where to find it.
Firstly, we map the pci resources by rte_eal_pci_map_device().
We then could easily locate a cfg structure by:
cfg_addr = dev->mem_resources[cap.bar].addr + cap.offset;
Therefore, the entrance of enabling modern (v1.0) pci device support
is to iterate the pci capability lists, and to locate some configs
we care; and they are:
- common cfg
For generic virtio and virtqueue configuration, such as setting/getting
features, enabling a specific queue, and so on.
- nofity cfg
Combining with `queue_notify_off' from common cfg, we could use it to
notify a specific virt queue.
- device cfg
Where virtio_net_config structure is located.
- isr cfg
Where to read isr (interrupt status).
If any of above cap is not found, we fallback to the legacy virtio
handling.
If succeed, hw->vtpci_ops is assigned to modern_ops, where all
operations are implemented by reading/writing a (or few) specific
configuration space from above 4 cfg structures. And that's basically
how this patch works.
Besides those changes, virtio 1.0 introduces a new status field:
FEATURES_OK, which is set after features negotiation is done.
Last, set the VIRTIO_F_VERSION_1 feature flag.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Reviewed-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The mergeable virtio net hdr format has been the standard and the
only virtio net hdr format since virtio 1.0. Therefore, we can
not hardcode hdr_size to "sizeof(struct virtio_net_hdr)" any more
at virtio_recv_pkts(), otherwise, there would be a mismatch of
hdr size from rte_vhost_enqueue_burst() and virtio_recv_pkts(),
leading a packet corruption.
Instead, we should retrieve it from hw->vtnet_hdr_size; we will
do proper settings at eth_virtio_dev_init() in later patches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Reviewed-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Switch to 64 bit features, which virtio 1.0 supports.
While legacy virtio only supports 32 bit features, it complains aloud
and quit when trying to setting > 32 bit features.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Reviewed-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Huawei Xie <huawei.xie@intel.com>