In multi-process mode, the secondary process remaps the PCI device
during initialization, but the mapping is not removed in the uninit
path. As a result the device is not closed, and a device busy error
is reported when the device is hotplugged.
This patch unmaps the PCI device at secondary process uninitialization,
based on virtio_remap_pci.
Fixes: 36a7a2e7a5 ("net/virtio: move PCI device init in dedicated file")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
GCC 12 raises the following warning:
In file included from ../lib/mempool/rte_mempool.h:46,
from ../lib/mbuf/rte_mbuf.h:38,
from ../lib/vhost/vhost_crypto.c:7:
../lib/vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_fetch_requests’:
../lib/eal/x86/include/rte_memcpy.h:371:9: warning: array subscript 1 is
outside array bounds of ‘struct virtio_crypto_op_data_req[1]’
[-Warray-bounds]
371 | rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/vhost/vhost_crypto.c:1178:42: note: while referencing ‘req’
1178 | struct virtio_crypto_op_data_req req;
| ^~~
Split this function and separate the per descriptor copy.
This makes the code clearer, and the compiler happier.
Note: logs for errors have been moved to callers to avoid duplicates.
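A minimal sketch of the split, in plain C rather than the actual vhost
code. The struct ex_desc type and the ex_* function names are
hypothetical. Each per-descriptor copy is bounded by the room left in
the destination, so no copy can run past the fixed-size request, and
the error is returned to the caller, which is where it would be logged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one guest descriptor: a buffer and its length. */
struct ex_desc {
	const uint8_t *addr;
	uint32_t len;
};

/* Copy at most 'room' bytes out of a single descriptor; returns bytes copied. */
static uint32_t
ex_copy_one_desc(uint8_t *dst, uint32_t room, const struct ex_desc *d)
{
	uint32_t n = d->len < room ? d->len : room;

	memcpy(dst, d->addr, n);
	return n;
}

/*
 * Gather 'size' bytes from a descriptor chain into dst.
 * Returns 0 on success, -1 if the chain is too short (the caller logs it).
 */
int
ex_copy_data(void *dst, uint32_t size, const struct ex_desc *descs,
	     unsigned int nb_descs)
{
	uint8_t *out = dst;
	uint32_t left = size;
	unsigned int i;

	assert(dst != NULL && descs != NULL);
	for (i = 0; i < nb_descs && left > 0; i++) {
		uint32_t n = ex_copy_one_desc(out, left, &descs[i]);

		out += n;
		left -= n;
	}
	return left == 0 ? 0 : -1;
}
```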
Fixes: 3c79609fda ("vhost/crypto: handle virtually non-contiguous buffers")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Split the creation of the virt-queue resources between
the configuration threads.
The virt-queue resources also need to be pre-created
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Pre-create the virt-queue sub-resources in the device probe stage
and then modify the virtqueue in the device config stage.
The steering table also needs to support a dummy virt-queue.
This accelerates the LM process and reduces its time by 40%.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Split the virtqs device close tasks after
stopping virt-queue between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Split the virtqs LM log between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the virtqs creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.
Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.
Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtqs configurations.
This optimization accelerates the LM process and
reduces its time by 5%.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The configuration thread tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task for
a thread and enqueues it to the thread's ring.
The thread polls its ring and dequeues tasks.
That’s why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller by
a dedicated error counter per task.
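The two counters can be sketched with C11 atomics as below. This is an
illustration only, not the driver code: struct ex_task and all ex_*
names are hypothetical, and the ring itself is left out (in DPDK, a
ring created with only RING_F_SC_DEQ set has multi-producer enqueue and
single-consumer dequeue).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical task descriptor; field names are illustrative only. */
struct ex_task {
	int (*fn)(void *arg);    /* work to run on the configuration thread */
	void *arg;
	atomic_uint *remaining;  /* tasks still pending for this caller */
	atomic_uint *err_cnt;    /* tasks that failed */
};

/* Two trivial task bodies used to exercise the counters. */
int ex_task_ok(void *arg) { (void)arg; return 0; }
int ex_task_fail(void *arg) { (void)arg; return -1; }

/* Worker side: run one dequeued task and report its outcome. */
void
ex_task_run(struct ex_task *t)
{
	assert(t != NULL && t->fn != NULL);
	if (t->fn(t->arg) != 0)
		atomic_fetch_add_explicit(t->err_cnt, 1, memory_order_relaxed);
	/* Release: the caller sees the task's effects once the count hits 0. */
	atomic_fetch_sub_explicit(t->remaining, 1, memory_order_release);
}

/* Caller side: returns 1 when all tasks completed, 0 while some are pending. */
int
ex_tasks_done(atomic_uint *remaining)
{
	return atomic_load_explicit(remaining, memory_order_acquire) == 0;
}
```

The caller sets the remaining counter to the number of tasks before
enqueueing them, polls ex_tasks_done(), and then reads the error
counter to learn whether any task failed.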
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The LM process includes many object creations and
destructions on the source and the destination servers.
The longer LM takes, the more VM packets are dropped.
To improve LM time, the mlx5 FW configurations need to run in parallel.
Add internal multi-thread management in the driver for it.
A new devarg defines the number of threads and their CPU.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath events thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.
Replace the global lock with the following locks:
1. Virtq locks (one per virtq) synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates,
   which are shared by all the virtqs in the device.
3. A steering lock synchronizes updates of the shared steering objects.
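The lock layout above can be sketched as follows. This is an assumed
shape, not the driver's structures: the ex_* types are hypothetical and
a small atomic_flag spinlock stands in for whatever lock type the
driver uses. The point is that a kick on one virtq takes only that
virtq's lock plus the short doorbell lock, so other virtqs are not
blocked.

```c
#include <assert.h>
#include <stdatomic.h>

#define EX_MAX_VIRTQS 4 /* illustrative size */

/* Tiny spinlock used only to keep the sketch self-contained. */
typedef struct { atomic_flag f; } ex_lock_t;

static inline void ex_lock_init(ex_lock_t *l) { atomic_flag_clear(&l->f); }
static inline void ex_lock(ex_lock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		;
}
static inline void ex_unlock(ex_lock_t *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct ex_virtq {
	ex_lock_t lock;          /* 1. per-virtq: datapath vs. configuration */
	int enabled;
};

struct ex_dev {
	struct ex_virtq virtqs[EX_MAX_VIRTQS];
	ex_lock_t db_lock;       /* 2. doorbell shared by all virtqs */
	ex_lock_t steer_lock;    /* 3. shared steering objects (unused below) */
	unsigned int db_writes;  /* stands in for the doorbell register */
};

void
ex_dev_init(struct ex_dev *d)
{
	unsigned int i;

	for (i = 0; i < EX_MAX_VIRTQS; i++) {
		ex_lock_init(&d->virtqs[i].lock);
		d->virtqs[i].enabled = 0;
	}
	ex_lock_init(&d->db_lock);
	ex_lock_init(&d->steer_lock);
	d->db_writes = 0;
}

/* Ring the doorbell for one virtq; returns 1 if it was rung, 0 otherwise. */
int
ex_virtq_kick(struct ex_dev *d, unsigned int idx)
{
	int rung = 0;

	assert(idx < EX_MAX_VIRTQS);
	ex_lock(&d->virtqs[idx].lock);
	if (d->virtqs[idx].enabled) {
		ex_lock(&d->db_lock);
		d->db_writes++;
		ex_unlock(&d->db_lock);
		rung = 1;
	}
	ex_unlock(&d->virtqs[idx].lock);
	return rung;
}
```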
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The dev_config operation is called while LM is in progress.
LM time is critical because all
the VM packets are dropped during that time.
Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage using the new ability
to modify virtq.
This optimization accelerates the LM process and
reduces its time by 70%.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
A virtq configuration can be modified after the virtq creation.
Add the following modifiable fields:
1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix
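One way to drive such a modify command is a bit mask naming which
fields changed. The sketch below covers only a few of the fields
listed above; the bit positions, the struct and the ex_* names are
made up for illustration and are not the FW layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only; the real FW modify mask is not shown. */
enum ex_virtq_mod {
	EX_MOD_ADDRS        = 1u << 0,
	EX_MOD_HW_AVAIL_IDX = 1u << 1,
	EX_MOD_HW_USED_IDX  = 1u << 2,
	EX_MOD_Q_TYPE       = 1u << 3,
};

struct ex_virtq_attr {
	uint64_t desc_addr;
	uint64_t used_addr;
	uint64_t available_addr;
	uint16_t hw_available_index;
	uint16_t hw_used_index;
	uint8_t virtio_q_type;
};

/* Build the mask of fields that differ between the current and new state. */
uint32_t
ex_virtq_mod_mask(const struct ex_virtq_attr *cur,
		  const struct ex_virtq_attr *next)
{
	uint32_t mask = 0;

	assert(cur != NULL && next != NULL);
	if (cur->desc_addr != next->desc_addr ||
	    cur->used_addr != next->used_addr ||
	    cur->available_addr != next->available_addr)
		mask |= EX_MOD_ADDRS;
	if (cur->hw_available_index != next->hw_available_index)
		mask |= EX_MOD_HW_AVAIL_IDX;
	if (cur->hw_used_index != next->hw_used_index)
		mask |= EX_MOD_HW_USED_IDX;
	if (cur->virtio_q_type != next->virtio_q_type)
		mask |= EX_MOD_Q_TYPE;
	return mask;
}
```

An empty mask means the existing virtq can be kept as is.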
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
To speed up queue creation, the event QP and CQ are created only once.
Each virtq creation reuses the same event QP and CQ.
Because FW sets the event QP to the error state during virtq destroy,
the event QP must be modified to the RESET state and then to the RTS
state as usual. This saves about 1.5ms for each virtq creation.
After a SW QP reset, the QP pi/ci both become 0 while the CQ pi/ci keep
their previous values. Add a new variable qp_ci to save the SW QP ci,
and move the QP pi independently of the CQ ci.
Add a new function mlx5_vdpa_drain_cq to drain the CQEs from the CQ
after virtq release.
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Support setting the QP to the RESET state.
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.
In the VM live migration scenario, this saves 0.8ms for each queue
creation, thus reducing LM network downtime.
To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the max number of queues the VM will use.
Introduce two new devargs: queues (max queue pair number) and
queue_size (queue depth). Both args must be provided; if only one is
provided, it is ignored and no pre-creation is done.
The queues and queue_size values must also be identical to the vhost
configuration the driver receives later. Otherwise the pre-created
resources are either wasted or missing, or they need to be destroyed
and recreated (in case of a queue_size mismatch).
The pre-created umem/counter stay alive until vDPA device removal.
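The rule for the two devargs can be sketched as below. The struct and
function names are hypothetical, a value of 0 is assumed to mean "not
provided", and two virtqs per queue pair is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parsed devargs; 0 is assumed to mean the argument was not provided. */
struct ex_precreate_args {
	uint32_t queues;      /* max queue pair number */
	uint32_t queue_size;  /* queue depth */
};

/* Number of virtqs to pre-create: two per pair, or 0 when disabled. */
uint32_t
ex_precreate_nr_virtqs(const struct ex_precreate_args *a)
{
	assert(a != NULL);
	if (a->queues == 0 || a->queue_size == 0)
		return 0; /* only one (or neither) given: no pre-creation */
	return a->queues * 2;
}

/* A pre-created virtq is reusable only if the depth matches vhost's. */
int
ex_precreate_reusable(const struct ex_precreate_args *a,
		      uint32_t vhost_queue_size)
{
	return ex_precreate_nr_virtqs(a) != 0 &&
	       a->queue_size == vhost_queue_size;
}
```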
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The driver wrongly takes the capability value for
the number of virtq pairs instead of just the number of virtqs.
Adjust all the usages of it to be the number of virtqs.
Fixes: c2eb33aaf9 ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Since the commit 02798b0735 ("vhost: improve virtio-net layer logs"),
vhost logs contain the socket path as a prefix.
Async dequeue path was copied from the sync dequeue path but a log
was incorrect.
Fixes: 84d5204310 ("vhost: support async dequeue for split ring")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch renames the local variables free_entries to
avail_entries in the dequeue path.
Indeed, this variable represents the number of new packets
available in the Virtio transmit queue, so these entries
are actually used, not free.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The vDPA driver first uses the kernel driver to allocate a doorbell
(VAR) area for each device. It then uses var->mmap_off and var->length
to mmap the uverbs device file as the doorbell userspace virtual address.
The current kernel driver provides var->mmap_off equal to the page
start of the VAR.
This is fine on an x86 server with 4K pages, because the VAR physical
address is only 4K aligned and thus located at the start of a 4K page.
But on an aarch64 server with 64K pages, the actual VAR physical
address has an offset within the page (it is not located at the start
of a 64K page).
So the vDPA driver needs to add this within-page offset
(caps.doorbell_bar_offset) to get the right VAR virtual address.
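The arithmetic can be illustrated as below. Here the within-page offset
is derived from an example physical address, which is an assumption
made for the sketch; the driver takes it from
caps.doorbell_bar_offset. The ex_* names and the addresses are
invented.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address inside its page; page_size must be a power of two. */
uint64_t
ex_page_offset(uint64_t phys, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return phys & (page_size - 1);
}

/* Doorbell virtual address: page-aligned mmap base plus within-page offset. */
uint64_t
ex_doorbell_va(uint64_t mmap_base, uint64_t phys, uint64_t page_size)
{
	return mmap_base + ex_page_offset(phys, page_size);
}
```

With a 4K-aligned VAR the offset is always 0 on 4K pages, which is why
the missing addition went unnoticed on x86.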
Fixes: 62c813706e ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch implements packed ring dequeue data path
for asynchronous vhost.
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch allows vring_state_changed() to clear in-flight
dequeue packets. It also clears the in-flight packets in
a thread-safe way in destroy_device().
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
rte_vhost_clear_queue_thread_unsafe() supports clearing
in-flight packets for async enqueue only. Now that
async dequeue is supported, this API should support it too.
This patch also adds a thread-safe version of this API;
the difference between the two APIs is that the thread-safe one
takes a lock.
These APIs may be used to clean up packets in the async channel
to prevent packet loss when the device state changes or
when the device is destroyed.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Virtio specification supports guest checksum offloading
for L4, which is enabled with VIRTIO_NET_F_GUEST_CSUM
feature negotiation. However, the Vhost PMD does not
advertise Tx checksum offload capabilities.
Advertising these offload capabilities at the ethdev level
is not enough, because we could still end up with the
application enabling these offloads while the guest has not
negotiated them.
This patch advertises the Tx checksum offload capabilities,
and introduces a compatibility layer to cover the case
VIRTIO_NET_F_GUEST_CSUM has not been negotiated but the
application does configure the Tx checksum offloads. This
function performs the L4 Tx checksum in SW for UDP and TCP.
Compared to Rx SW checksum, the Tx SW checksum function
needs to compute the pseudo-header checksum, as we cannot
know whether it was done before.
This patch does not advertise SCTP checksum offloading
capability for now, but it could be handled later if the
need arises.
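A minimal sketch of such a Tx fallback for IPv4, written in plain C
over byte buffers rather than mbufs. It is not the PMD code; all ex_*
names are hypothetical. The checksum field is zeroed first, so it does
not matter whether a pseudo-header checksum was already stored there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_IPPROTO_UDP 17

/* Add big-endian 16-bit words to a running sum; an odd tail is zero padded. */
static uint32_t
ex_sum16(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Fold the carries back into the low 16 bits. */
static uint16_t
ex_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* IPv4 pseudo-header sum: source, destination, protocol and L4 length. */
static uint32_t
ex_ipv4_phdr_sum(const uint8_t *src, const uint8_t *dst, uint8_t proto,
		 uint16_t l4_len)
{
	return ex_sum16(dst, 4, ex_sum16(src, 4, 0)) + proto + l4_len;
}

/* Compute the full L4 checksum in SW and store it at csum_off. */
void
ex_tx_sw_cksum(uint8_t *l4, uint16_t l4_len, size_t csum_off,
	       const uint8_t *src, const uint8_t *dst, uint8_t proto)
{
	uint16_t csum;

	assert(csum_off + 2 <= l4_len);
	l4[csum_off] = 0;      /* discard whatever was there before */
	l4[csum_off + 1] = 0;
	csum = (uint16_t)~ex_fold(ex_sum16(l4, l4_len,
			ex_ipv4_phdr_sum(src, dst, proto, l4_len)));
	if (csum == 0 && proto == EX_IPPROTO_UDP)
		csum = 0xffff; /* for UDP, 0 means "no checksum" */
	l4[csum_off] = (uint8_t)(csum >> 8);
	l4[csum_off + 1] = (uint8_t)(csum & 0xff);
}

/* A correct checksum makes pseudo-header plus segment fold to 0xffff. */
int
ex_l4_cksum_ok(const uint8_t *l4, uint16_t l4_len, const uint8_t *src,
	       const uint8_t *dst, uint8_t proto)
{
	return ex_fold(ex_sum16(l4, l4_len,
			ex_ipv4_phdr_sum(src, dst, proto, l4_len))) == 0xffff;
}
```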
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Cheng Jiang <cheng1.jiang@intel.com>
Virtio specification supports host checksum offloading
for L4, which is enabled with VIRTIO_NET_F_CSUM feature
negotiation. However, the Vhost PMD does not advertise
Rx checksum offload capabilities, so we can end up with
the VIRTIO_NET_F_CSUM feature being negotiated, implying
the Vhost library returns packets with checksum being
offloaded while the application did not request it.
Advertising these offload capabilities at the ethdev level
is not enough, because we could still end up with the
application not enabling these offloads while the guest
still negotiates them.
This patch advertises the Rx checksum offload capabilities,
and introduces a compatibility layer to cover the case
VIRTIO_NET_F_CSUM has been negotiated but the application
does not configure the Rx checksum offloads. This function
performs the L4 Rx checksum in SW for UDP and TCP. Note
that there is no need to calculate the pseudo-header
checksum, because the Virtio specification requires that
the driver does it.
This patch does not advertise SCTP checksum offloading
capability for now, but it could be handled later if the
need arises.
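A minimal sketch of completing such a checksum on Rx, in plain C over a
byte buffer rather than an mbuf. It assumes the checksum field already
holds the folded, non-complemented pseudo-header sum written by the
guest driver, so only the L4 segment is summed. The ex_* names are
hypothetical and the UDP "zero becomes 0xffff" special case is left
out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words to a running sum; an odd tail is zero padded. */
static uint32_t
ex_sum16(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Fold the carries back into the low 16 bits. */
static uint16_t
ex_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Complete the L4 checksum in place. The field at csum_off is summed as
 * is, because it carries the pseudo-header contribution.
 */
void
ex_rx_complete_cksum(uint8_t *l4, uint16_t l4_len, size_t csum_off)
{
	uint16_t csum;

	assert(csum_off + 2 <= l4_len);
	csum = (uint16_t)~ex_fold(ex_sum16(l4, l4_len, 0));
	l4[csum_off] = (uint8_t)(csum >> 8);
	l4[csum_off + 1] = (uint8_t)(csum & 0xff);
}
```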
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Cheng Jiang <cheng1.jiang@intel.com>
This trivial patch makes the vlan_strip field of the
pmd_internal struct a boolean, since it is handled as
such.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch enables the compliant offloading flags mode by
default, which prevents the Rx path from setting Tx offload flags,
which is illegal. A new legacy-ol-flags devarg is introduced
to enable the legacy behaviour.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The Virtio specification requires that in case of checksum
offloading, the pseudo-header checksum must be set in the
L4 header.
When received from another Vhost-user port, the packet
checksum might already contain the pseudo-header checksum
but we have no way to know it. So we have no other choice
than doing the pseudo-header checksum systematically.
This patch handles this using the rte_net_intel_cksum_prepare()
helper.
Fixes: 859b480d5a ("vhost: add guest offload setting")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch reverts
commit 10f4620f02 ("app/testpmd: modify mac in csum forwarding"),
as the checksum forwarding is expected to only perform
checksum and not to also overwrite the source and destination MAC addresses.
Doing so, we can test checksum offloading with real traffic
without breaking broadcast packets.
Fixes: 10f4620f02 ("app/testpmd: modify mac in csum forwarding")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Aman Singh <aman.deep.singh@intel.com>
Fix to read and write the correct register fields for yt8521s and
yt8531s PHY, since mode check was added.
Fixes: 1c44384fce ("net/ngbe: support custom PHY interfaces")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Fix the polling of some specific registers, which expect the bit value 0.
'w32w' is used for registers where the write command bit is set; it
waits for the bit to clear, which indicates that the write completed.
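The write-then-wait pattern can be sketched as below. This is not the
txgbe macro: struct ex_hw simulates a register in memory, and the ex_*
accessors and the poll budget are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated device: HW clears cmd_bit after a number of register reads. */
struct ex_hw {
	uint32_t reg;
	unsigned int busy_reads; /* reads left before the bit clears; 0 = never */
	uint32_t cmd_bit;
};

uint32_t
ex_rd32(struct ex_hw *hw)
{
	if (hw->busy_reads > 0 && --hw->busy_reads == 0)
		hw->reg &= ~hw->cmd_bit;
	return hw->reg;
}

void
ex_wr32(struct ex_hw *hw, uint32_t val)
{
	hw->reg = val;
}

/*
 * Write val, which carries the command bit, then poll until the bit
 * reads back as 0. Returns 0 on completion, -1 if the budget runs out.
 */
int
ex_wr32_wait(struct ex_hw *hw, uint32_t val, uint32_t cmd_bit,
	     unsigned int max_polls)
{
	unsigned int i;

	assert((val & cmd_bit) != 0);
	ex_wr32(hw, val);
	for (i = 0; i < max_polls; i++) {
		if ((ex_rd32(hw) & cmd_bit) == 0)
			return 0;
	}
	return -1;
}
```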
Fixes: 24a4c76aff ("net/txgbe: add error types and registers")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Move related specific testpmd commands into this driver directory.
While at it, fix checkpatch warnings.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Move related specific testpmd commands into this driver directory.
While at it, fix checkpatch warnings.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
When testpmd starts up, it checks the MTU range;
if MTU > flbufsz, testpmd fails to start.
The bug occurs because hw->flbufsz is not
initialized.
Fixes: 417be15e5f ("net/nfp: make sure MTU is never larger than mbuf size")
Cc: stable@dpdk.org
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
The NFP NIC now supports two types of RSS logic, NFP_NET_CFG_CTRL_RSS
and NFP_NET_CFG_CTRL_RSS2. Use NFP_NET_CFG_CTRL_RSS2 if the NIC
capability supports it, otherwise use NFP_NET_CFG_CTRL_RSS.
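The selection can be sketched as below. The EX_CTRL_* values are
placeholders, not the real NFP_NET_CFG_CTRL_RSS* bit values, and the
function name is hypothetical. The caller is assumed to have checked
that the NIC supports at least one RSS mode.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bits standing in for the two RSS control flags. */
#define EX_CTRL_RSS  (1u << 0)
#define EX_CTRL_RSS2 (1u << 1)

/* Pick the RSS control bit to enable from the NIC capability word. */
uint32_t
ex_select_rss_ctrl(uint32_t cap)
{
	assert((cap & (EX_CTRL_RSS | EX_CTRL_RSS2)) != 0);
	if (cap & EX_CTRL_RSS2)
		return EX_CTRL_RSS2;
	return EX_CTRL_RSS;
}
```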
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Move the macros __round_mask, round_up and round_down from the C file
to the corresponding header file; they will be used by the TX function
of the nfp net firmware with NFDk.
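For reference, the usual power-of-two form of these macros looks as
below. The EX_* names are used to avoid clashing with the driver's own
definitions, and it is an assumption that the driver's macros have
this shape. __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* y must be a power of two in all three macros. */
#define EX_ROUND_MASK(x, y) ((__typeof__(x))((y) - 1))
#define EX_ROUND_UP(x, y)   ((((x) - 1) | EX_ROUND_MASK(x, y)) + 1)
#define EX_ROUND_DOWN(x, y) ((x) & ~EX_ROUND_MASK(x, y))

/* Bytes of padding needed to bring len up to the next multiple of align. */
uint32_t
ex_pad_to(uint32_t len, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return EX_ROUND_UP(len, align) - len;
}
```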
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
This commit does not introduce new features. It only moves some common
logic into helper functions to reduce duplication and increase code
reuse. This covers the queue stop and queue close logic, which is used
when the NFP net device is stopped and closed.
queue stop: reset queue
queue close: reset and release queue
Modify the NFP net stop and close functions to use the helper functions
to stop and close queues instead of the previous logic.
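The difference between the two helpers can be sketched as below, with
a simplified queue type. struct ex_queue and the ex_* functions are
hypothetical and do not mirror the nfp structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ex_queue {
	uint32_t *ring;
	uint32_t nb_desc;
	uint32_t wr_p;
	uint32_t rd_p;
};

struct ex_queue *
ex_queue_alloc(uint32_t nb_desc)
{
	struct ex_queue *q = calloc(1, sizeof(*q));

	if (q == NULL)
		return NULL;
	q->ring = calloc(nb_desc, sizeof(*q->ring));
	if (q->ring == NULL) {
		free(q);
		return NULL;
	}
	q->nb_desc = nb_desc;
	return q;
}

/* Clear the ring and rewind both pointers. */
static void
ex_queue_reset(struct ex_queue *q)
{
	memset(q->ring, 0, q->nb_desc * sizeof(*q->ring));
	q->wr_p = 0;
	q->rd_p = 0;
}

/* queue stop: reset every queue, keep the memory for a later restart. */
void
ex_stop_queues(struct ex_queue **qs, unsigned int n)
{
	unsigned int i;

	assert(qs != NULL);
	for (i = 0; i < n; i++)
		if (qs[i] != NULL)
			ex_queue_reset(qs[i]);
}

/* queue close: reset and release every queue. */
void
ex_close_queues(struct ex_queue **qs, unsigned int n)
{
	unsigned int i;

	assert(qs != NULL);
	for (i = 0; i < n; i++) {
		if (qs[i] == NULL)
			continue;
		ex_queue_reset(qs[i]);
		free(qs[i]->ring);
		free(qs[i]);
		qs[i] = NULL;
	}
}
```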
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Add ethdev option for firmware with NFDk, implement tx_queue setup
function for firmware with NFDk.
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Add and modify the nfp PMD struct and macro that will be used by firmware
with NFDk.
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Modify the nfp driver logic to check the firmware version (NFD3 or
NFDK) and mount different driver functions according to it.
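Such a dispatch can be sketched with a small table of function
pointers. The enum, the ops struct and the stub burst functions below
are hypothetical; how the driver actually reads the firmware version
is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ex_fw_ver { EX_FW_NFD3, EX_FW_NFDK };

struct ex_eth_ops {
	const char *name;
	uint16_t (*tx_burst)(void *txq, uint16_t nb_pkts);
};

/* Stubs standing in for the per-firmware TX paths. */
static uint16_t
ex_nfd3_tx_burst(void *txq, uint16_t nb_pkts)
{
	(void)txq;
	return nb_pkts;
}

static uint16_t
ex_nfdk_tx_burst(void *txq, uint16_t nb_pkts)
{
	(void)txq;
	return nb_pkts;
}

static const struct ex_eth_ops ex_nfd3_ops = { "nfd3", ex_nfd3_tx_burst };
static const struct ex_eth_ops ex_nfdk_ops = { "nfdk", ex_nfdk_tx_burst };

/* Return the ops matching the firmware version, or NULL if unknown. */
const struct ex_eth_ops *
ex_select_ops(enum ex_fw_ver ver)
{
	switch (ver) {
	case EX_FW_NFD3:
		return &ex_nfd3_ops;
	case EX_FW_NFDK:
		return &ex_nfdk_ops;
	}
	return NULL;
}
```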
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Add support for a new type of NIC NFP3800 card, and update some
network card data acquisition interface functions.
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Add 'nfd3' to the function names of the eth driver for firmware with
NFD3, in preparation for the next work, as we will support another
firmware version with NFDk.
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
The NFP eth driver function names start with 'nfp_net', but the set_mac
function starts with 'nfp' only. Rename it to be consistent with the others.
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>