This commit fixs segment fault when rte_eth_dev_close() is called on
a virtio dev more than once. Assigning zero after free to avoids
freed memory to be accessed again.
Fixes: 69c80d4ef8 ("net/virtio: allocate queue at init stage")
Cc: stable@dpdk.org
Signed-off-by: Huanle Han <hanxueluo@gmail.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Segfault happens when using virtio-user after commit 7f0a669e7b
("ethdev: add allocation helper for virtual drivers").
It's due to we use ethdev->device to recognize physical devices,
but after above commit, this field is also filled for virtual
devices. Then we obtain the wrong pci_dev pointer and accessing
its field when copying pci info results in segfault.
To fix it, we use hw->virtio_user_dev to differentiate physical
devices from virtual devices.
Fixes: 6a7c0dfcdf ("net/virtio: do not depend on PCI device of ethdev")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The virtio port link status will always be DOWN:
The commit aa9f060617 ("net/virtio: fix link status always being up")
introduces a flag to help checking the status. If this flag is not set,
status will be always down. However, in dev start, this flag is set
after link status update, then we miss the chance to change the status
to UP in dev start.
To fix this bug, we simply move the link status update after the flag
setting so that the status can be correctly updated.
Fixes: aa9f060617 ("net/virtio: fix link status always being up")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
As we already change to use capability list to detect MSI-X, remove
the redundant MSI-X detection in legacy devices.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
LSC flag is set in several places, but only the last one takes effect;
so we remove the redundant ones and just keep the last one.
This also fixes the bug that dev_flags being overwritten by
rte_eth_copy_pci_info(), which resets it to 0 unconditionally.
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The field, use_msix, in struct virtio_hw is not updated for modern
device, and is always zero. And now we depend on the status feature
and MSI-X to report LSC support (which is also not a correct
behavior). As a result, LSC is always disabled for modern devices.
To fix this, we just recognize MSI-X capability when going through
capability list, and update the info in virtio.
Fixes: 6ba1f63b5a ("virtio: support specification 1.0")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Current virtio_dev_stop only disables interrupt and marks link down,
When it is invoked, tx/rx traffic flows still work. This is a strange
behavior. The patch supports the switch of flow.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
virtio-user cannot work on 32-bit system as higher 32-bit of the
addr field (64-bit) in the desc is filled with non-zero value
which should not happen for a 32-bit system.
In case of virtio-user, we use buf_addr of mbuf to fill the
virtqueue desc addr. This is a regression bug. For 32-bit system,
the first 4 bytes of mbuf is buf_addr, with following 8 bytes for
buf_phyaddr. With below wrong definition, both buf_addr and lower
4 bytes buf_phyaddr are obtained to fill the virtqueue desc.
#define VIRTIO_MBUF_ADDR(mb, vq) \
(*(uint64_t *)((uintptr_t)(mb) + (vq)->offset))
Fixes: 25f80d1087 ("net/virtio: fix packet corruption")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Previously, we miss to set intr_handle->fd which will be used as
target file for epoll to check LSC.
As a result, stdin (0) is used and intr thread keeps busy whenever
data comes from stdin.
To fix this, we use vhostfd as the target file for epoll to check
the link status change events. And we move intr_handle initialization
after vhost backend settup to make sure vhostfd is initialized.
Fixes: 35c4f85548 ("net/virtio-user: support to report net status")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The virtio port link status will always be UP, even the port is stopped:
testpmd> port stop 0
Stopping ports...
Checking link statuses...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Done
The link status is queried by link_update callback when LSC is disabled.
Which in turn queries the "status" field. However, the "status" is
read-only. I couldn't think of some proper ways to change the status
without doing device reset.
Instead of doing (the heavy) reset at stop, this patch introduced a flag,
which is set to 1 and 0 on start and stop, respectively. When it's set to
0, the link status is set to DOWN unconditionally.
Fixes: a85786dc81 ("virtio: fix states handling during initialization")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We only enabled LSC when using vhost-user as the backend, but it is
reported even when using vhost-kernel as the backend.
Fix it by only reportting LSC support when using vhost-user as the
backend.
Fixes: 35c4f85548 ("net/virtio-user: support to report net status")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
The feature negotiation in virtio-user is proven to be broken,
which results in device initialization failure.
Originally, we get features from vhost backend, and remove those
that are not supported. But when new feature is added, for example,
VIRTIO_NET_F_MTU, we fail to remove this new feature. Then, this
new feature will be negotiated, as both frontend and backend claim
to support this feature.
To fix it, we add a macro to record supported features, as a filter
to remove newly added features.
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
According to spec, we should write virtqueue index into the notify
address, rather than 1. Besides, some HW backend may rely on the data
written to identify which queue need to serve.
Fixes: 6ba1f63b5a ("virtio: support specification 1.0")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This is a preparation to embed the generic rte_device into the rte_eth_dev
also for virtual devices.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Secondary process doesn't properly attach to the rte_eth_device
initialized by the primary process.
Accessing device from secondary process (e.g. via rte_eth_rx_burst),
causes process to crash. because rte_eth_dev_data is not properly set.
The issue was flood by
'commit 7f95f78a8a ("ethdev: clear data when allocating device")'
which now clears rte_eth_dev_data entry.
For pci devices the struct is initialized by rte_eth_dev_pci_probe
->eth_dev_attach_secondary().
However, for virtio-user virtio_user_pmd_probe() is called instead of
rte_eth_dev_pci_probe().
The fix is to call rte_eth_dev_attach_secondary(), for secondary
process, from virtio_user_pmd_probe.
Fixes: 7f95f78a8a ("ethdev: clear data when allocating device")
Cc: stable@dpdk.org
Signed-off-by: Ami Sabo <amis@radware.com>
The patch change the prototype of callback function
(rte_intr_callback_fn) by removing the unnecessary parameter.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Now that the m->next pointer and m->nb_segs is expected to be set (to
NULL and 1 respectively) after a mempool_get(), we can avoid to write them
in the Rx functions of drivers.
Only some drivers are patched, it's not an exhaustive patch. It gives
the idea to do the same in other drivers.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Document the function and make it public, since it is used at several
places in the drivers. The old one is marked as deprecated.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
So far, virtio-user with vhost-user as the backend can only support
client mode. So when vhost user backend is down, i.e., unix socket
connection is broken, the connection cannot be re-connected. We will
forcely set the link state to be down.
Note: virtio-user with vhost-kernel as the backend still cannot
support lsc now as we fail to find a way to monitor the backend, tap
device, up/down events.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Originally, we did not report support of VIRTIO_NET_F_STATUS.
This feature is not reported by vhost backend, instead, it
is added/removed by QEMU in virtio PCI case.
We report the support of this feature so that following patch
will depend on this feature to enable LSC interrupt.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
For rxq interrupt, the device (backend driver) will notify driver
through callfd. Each virtqueue has a callfd. To keep compatible
with the existing framework, we will give these callfds to
interrupt thread for listening for interrupts.
Before that, we need to allocate intr_handle, and fill callfds
into it so that driver can use it to set up rxq interrupt mode.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Originally, eventfd is opened when initializing each vq; and gets closded
in virtio_user_stop_device().
To make it possible to initialize intr_handle struct in init() in following
patch, we put the open() of all eventfds into init(); and put the close()
into uninit().
Suggested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch adds a new option 'iface' to change the interface name of
tap device with vhost-kernel as backend.
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch implements support for the Virtio MTU feature.
When negotiated, the host shares its maximum supported MTU,
which is used as initial MTU and as maximum MTU the application
can set.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The link state change interrupt can only be configured if the virtio device
supports MSIX. Prior to this change the writing of the vector to the PCI
config space was causing it to overwrite the initial part of the MAC
address since the MSIX vector is not in the config space and is occupied by
the MAC address.
This has been reproduced in Virtual Box (v5.0.30.r112061) in Windows 7.
Fixes: 954ea11540 ("virtio: do not report link state feature unless available")
Cc: stable@dpdk.org
Signed-off-by: Matt Peters <matt.peters@windriver.com>
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
virtio-user limits the qeueue number to 8 but provides no limit
check against the queue number input from user. If a bigger queue
number (> 8) is given, there is an overflow issue. Doing a sanity
check could avoid it.
Fixes: 37a7eb2ae8 ("net/virtio-user: add device emulation layer")
Cc: stable@dpdk.org
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The valid tap file descriptor range should be equal or greater
than zero instead of non-zero
Fixes: e3b434818b ("net/virtio-user: support kernel vhost")
Cc: stable@dpdk.org
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The minor change aims to remove the redundant computing and make
it easier to understand the code.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The chosen fake capability (10G) is consistent with the reported
link speed in virtio_dev_link_update():
link.link_speed = SPEED_10G;
The feature is not marked in doc/guides/nics/features/virtio.ini
because it is only a fake value.
Signed-off-by: Ido Barnea <ibarnea@cisco.com>
[Thomas: comments added]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When any layout is used, the header is stored in the head room of mbuf.
mbuf is allocated and filled by user, means there is no gurateen the
header is all zero for non TSO case. Therefore, we have to do the reset
by ourself:
memest(hdr, 0, head_size);
The memset has two impacts on performance:
- memset could not be inlined, which is a bit costly.
- more importantly, it touches the mbuf, which could introduce severe
cache issues as described by former patch.
Similiary, we could do the same trick: reset just when necessary, when
the corresponding field is already 0, which is likely true for a simple
l2 forward case. It could boost the performance up to 20+% in micro
benchmarking.
Cc: stable@dpdk.org
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
TSO is now enabled, but it's not actually being used by default in a
simple L2 forward mode. In such case, we have to zero the virtio net
headers, to inform the vhost backend that no offload is being used:
hdr->csum_start = 0;
hdr->csum_offset = 0;
hdr->flags = 0;
hdr->gso_type = 0;
hdr->gso_size = 0;
hdr->hdr_len = 0;
Such writes could be very costly; it introduces severe cache issues:
The above operations introduce cache write for each packet, which
stalls the read operation from the vhost backend.
The fact that virtio net header is initiated to zero in PMD driver
init stage means that these costly writes are unnecessary and could
be avoided:
if (hdr->csum_start != 0)
hdr->csum_start = 0;
And that's what the macro ASSIGN_UNLESS_EQUAL does. With this, the
performance drop introduced by TSO enabling is recovered: it could
be up to 20% in micro benchmarking.
Fixes: 58169a9c81 ("net/virtio: support Tx checksum offload")
Fixes: 696573046e ("net/virtio: support TSO")
Cc: stable@dpdk.org
Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Value returned from malloc is not checked for errors before being used.
This patch fixes following coverity issue.
static struct vhost_memory_kernel *
prepare_vhost_memory_kernel(void)
{
...
vm = malloc(sizeof(struct vhost_memory_kernel) +
max_regions *
sizeof(struct vhost_memory_region));
...
>>> CID 140744: (NULL_RETURNS)
>>> Dereferencing a null pointer "vm".
mr = &vm->regions[k++];
Coverity issue: 140744
Fixes: e3b434818b ("net/virtio-user: support kernel vhost")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The vtpci_ops assignment needs the 'hw->port_id' as an input parameter.
That said, we should set 'hw->port_id' firstly, then do the vtpci_ops
assignment, while the code does reversely. That would result to a crash
when more than one virtio devices are used, because we keep assigning
proper vtpci_ops to virtio_hw_internal[0]->vtpci_ops, leaving the pointer
for other ports being NULL.
Reverse the order fixes this issue.
Fixes: 9470427c88 ("net/virtio: do not store PCI device pointer at shared memory")
Cc: stable@dpdk.org
Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Replace the raw I/O device memory read/write access with eal
abstraction for I/O device memory read/write access to fix
portability issues across different architectures.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Some virtual pmds report a different name than the vdev driver name
registered in eal.
While it does not hurt, let's try to be consistent.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When CONFIG_RTE_VIRTIO_USER is disabled (default on FreeBSD),
the virtio driver cannot be compiled:
librte_pmd_virtio.a(virtio_ethdev.o): In function `eth_virtio_dev_init':
(.text+0x1eba): undefined reference to `virtio_user_ops'
Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When the virtio PMD is used on top of a vhost that does not support
offloads, Rx offload capabilities are still advertised by
virtio_dev_info_get(). But if an application tries to start the PMD with
Rx offloads enabled (rxmode.hw_ip_checksum = 1), the initialization of
the device will fail with -ENOTSUP and the following log:
rx ip checksum not available on this host
This patch fixes the Rx offload capabilities returned by
virtio_dev_info_get() to be consistent with features advertised by the
host.
Fixes: 96cb671193 ("net/virtio: support Rx checksum offload")
Fixes: 86d59b2146 ("net/virtio: support LRO")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When closing virtio devices, close eventfds, free the struct to
store queue/irq mapping.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When virtio devices get stopped, tell the kernel to unbind the
mapping between interrupts and eventfds.
Note: it behaves differently from other NICs which close eventfds,
free struct. In virtio, we do those things when close device in
following patch.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch mainly allocates structure to store queue/irq mapping,
and configure queue/irq mapping down through PCI ops. It also creates
eventfds for each Rx queue and tell the kernel about the eventfd/intr
binding.
Note: So far, we hard-code 1:1 queue/irq mapping (each rx queue has
one exclusive interrupt), like this:
vec 0 -> config irq
vec 1 -> rxq0
vec 2 -> rxq1
...
which means, the "vectors" option of QEMU should be configured with
a value >= N+1 (N is the number of the queue pairs).
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch implements interrupt enable/disable functions for each
Rx queue. And we rely on flags of avail queue as the hint for virtio
device to interrupt virtio driver or not.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add handler in virtio_pci_ops to set queue/irq bind.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Under interrupt mode, rx_descriptor_done is used as an indicator
for applications to check if some number of packets are ready to
be received.
This patch enables this by checking used ring's local consumed idx
with shared (with backend) idx.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We need to define a prototype for such wrapper, which makes thing
too complicated. Remove wrapper and call set_config_irq directly.
Suggested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The LSC flag is decided according to if VIRTIO_NET_F_STATUS feature
is negotiated. Copy the PCI info after the judgement will rewrite
the correct result.
Fixes: 198ab33677 ("net/virtio: move device initialization in a function")
CC: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
With vhost kernel, to enable multiqueue, we need backend device
in kernel support multiqueue feature. Specifically, with tap
as the backend, as linux/Documentation/networking/tuntap.txt shows,
we check if tap supports IFF_MULTI_QUEUE feature.
And for vhost kernel, each queue pair has a vhost fd, and with a tap
fd binding this vhost fd. All tap fds are set with the same tap
interface name.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>