In live migration, before logging the virtq, the driver queries the
virtq indexes after moving it to suspend mode.
Separate this method to new function mlx5_vdpa_virtq_stop as a
preparation for reusing.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As a preparation to listen the virtqs status before the device is
configured, manage the virtqs structures in array instead of list.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Change references to ABI 20.0.1 to use ABI v21, see
https://doc.dpdk.org/guides/contributing/abi_policy.html#general-guidelines
"Major ABI versions are declared no more frequently than yearly.
Compatibility with the major ABI version is mandatory in subsequent
releases until a new major ABI version is declared."
Combined ABI policy and versioning in maintainers, add map files to the
filter to more closely monitor future ABI changes.
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Add log prints to improve driver status following.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When both, direct and indirect notifier management cannot be
configured, return an error.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for the next 2 callbacks:
get_vfio_device_fd and get_notify_area.
This will allow direct HW doorbell ringing from guest and will save CPU
usage in host.
By this patch, the QEMU will map the physical address of the virtio
device in guest directly to the physical address of the HW device
doorbell.
The guest doorbell write is 2 bytes transaction while some Mellanox nics
support only 4 bytes transactions.
Remove ConnectX-5 and BF1 devices support which don't support 2B
doorbell writes for HW triggering.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The configure and close operations may be called a lot of time by vhost
library according to the virtio connections in the guest.
VAR is the device memory space for the virtio queues doorbells.
Each VAR page can be shared for more than one queue while its owner must
synchronize the writes to it.
The mlx5 driver allocates single VAR page for all its queues.
Therefore, it is better to allocate it in probe device level instead of
creating and destroying it per new connection.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The rte_vhost_get_vring_base function is being called to get the values
of last_avail_idx and last_used_idx.
These fields will not have the correct values in case the function
returns an error.
Adding a check for the function return value, and in the case of an
error, set the fields to be zero and print a warning message.
Fixes: bff7350110 ("vdpa/mlx5: prepare virtio queues")
Cc: stable@dpdk.org
Signed-off-by: Asaf Penso <asafp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In order to avoid potential conflicts, rename the PCI_ADDR
enum value to VDPA_ADDR_PCI in vdpa_addr_type_enum.
All symbols referencing this enum are experimental, so it
does not break API policy.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In function mlx5_devx_cmd_create_tir(), the 40 bytes of RSS key are
copied in 10 iterations, 4 bytes each time using the MLX5_SET macro.
As result the RSS key is copied into TIR context in swapped byte order.
This patch fixes the issue, using memcpy() to copy the RSS key as is.
The struct member mlx5_devx_tir_attr.rx_hash_toeplitz_key is updated
to byte array type.
Fixes: c3aea272ee ("net/mlx5: create advanced Rx object via DevX")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In the current state, when preforming read/write
transactions we must wait for a completion in order
to run the next transaction, and all transactions are
performed by order.
Relaxed Ordering is a PCI optimization which by enabling it
we allow the system to perform read/writes in a different
order without having to wait for completion and improve
the performance in that matter.
This commit introduces the creation of relaxed ordering
memory regions in mlx5.
As relaxed ordering is an optimization, drivers that
do not support it can simply ignore it and therefore
it is enabled by default.
Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Remove setting ALLOW_EXPERIMENTAL_API individually for each Makefile and
meson.build. Instead, enable ALLOW_EXPERIMENTAL_API flag across app, lib
and drivers.
This changes reduces the clutter across the project while still
maintaining the functionality of ALLOW_EXPERIMENTAL_API i.e. warning
external applications about experimental API usage.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
When the HW finishes to consume the guest Rx descriptors, it creates a
CQE in the CQ.
The mlx5 driver arms the CQ to get notifications when a specific CQE
index is created - the index to be armed is the next CQE index which
should be polled by the driver.
The mlx5 driver configured the kernel driver to send notification to the
guest callfd in the same time it arrives to the mlx5 driver.
It means that the guest was notified only for each first CQE in a poll
cycle, so if the driver polled CQEs of all the virtio queue available
descriptors, the guest was not notified again for the rest because
there was no any new cycle for polling.
Hence, the Rx queues might be stuck when the guest didn't work with
poll mode.
Move the guest notification to be after the driver consumes all the
SW own CQEs.
By this way, guest will be notified only after all the SW CQEs are
polled.
Also init the CQ to be with HW owner in the start.
Fixes: 8395927cdf ("vdpa/mlx5: prepare HW queues")
Signed-off-by: Matan Azrad <matan@mellanox.com>
The completion event mechanism should work only if at least one of the
virtqs has valid callfd to be notified on.
When all the virtqs works with poll mode, the event mechanism should not
be configured.
The driver didn't take it into account and crashed in the above case.
Do not configure event interrupt when all the virtqs are in poll mode.
Fixes: 8395927cdf ("vdpa/mlx5: prepare HW queues")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The mlx5 vDPA driver manages QP and CQ in order to forward the HW event
to the guest by the callfd file descriptor for each virtq.
The driver arms the CQ for the next CQE index that should be
completed by the HW in order to create completion event.
In the SW completion event handler, the driver arms the CQ again for the
next index,
The CQE index in the CQ doorbell and in the CQ doorbell record was
masked incorrectly with the CQ size mask while it should be masked only
with 0xFFFFFF mask.
Remove the CQ size mask, stay only with 0xFFFFFF mask.
Fixes: 8395927cdf ("vdpa/mlx5: prepare HW queues")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This adds new device id to the list of Mellanox devices
that runs mlx5 PMD.
- BlueField-2 integrated ConnectX-6 Dx network controller
This device is not ready yet, it is in development stage.
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In order to support virtio queue creation by the FW, RoCE mode
should be disabled in the device.
Do it by netlink which is like the devlink tool commands:
1. devlink dev param set pci/[pci] name enable_roce value false
cmode driverinit
2. devlink dev reload pci/[pci]
Or by sysfs which is like:
echo 0 > /sys/bus/pci/devices/[pci]/roce_enable
The IB device is matched again after ROCE disabling.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for live migration feature by the HW:
Create a single Mkey that maps the memory address space of the
VHOST live migration log file.
Modify VIRTIO_NET_Q object and provide vhost_log_page,
dirty_bitmap_mkey, dirty_bitmap_size, dirty_bitmap_addr
and dirty_bitmap_dump_enable.
Modify VIRTIO_NET_Q object and move state to SUSPEND.
Query VIRTIO_NET_Q and get hw_available_idx and hw_used_idx.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The HW supports only 4 bytes doorbell writing detection.
The virtio device set only 2 bytes when it rings the doorbell.
Map the virtio doorbell detected by the virtio queue kickfd to the HW
VAR space when it expects to get the virtio emulation doorbell.
Use the EAL interrupt mechanism to get notification when a new event
appears in kickfd by the guest and write 4 bytes to the HW doorbell space
in the notification callback.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for set_vring_state operation.
Using DevX API the virtq state can be changed as described in PRM:
enable - move to ready state.
disable - move to suspend state.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add a steering object to be managed by a new file mlx5_vdpa_steer.c.
Allow promiscuous flow to scatter the device Rx packets to the virtio
queues using RSS action.
In order to allow correct RSS in L3 and L4, split the flow to 7 flows
as required by the device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for the next features in virtq configuration:
VIRTIO_F_RING_PACKED,
VIRTIO_NET_F_HOST_TSO4,
VIRTIO_NET_F_HOST_TSO6,
VIRTIO_NET_F_CSUM,
VIRTIO_NET_F_GUEST_CSUM,
VIRTIO_F_VERSION_1,
These features support depends in the DevX capabilities reported by the
device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The HW virtq object represents an emulated context for a VIRTIO_NET
virtqueue which was created and managed by a VIRTIO_NET driver as
defined in VIRTIO Specification.
Add support to prepare and release all the basic HW resources needed
the user virtqs emulation according to the rte_vhost configurations.
This patch prepares the basic configurations needed by DevX commands to
create a virtq.
Add new file mlx5_vdpa_virtq.c to manage virtq operations.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
As an arrangement to the vitrio queues creation, a 2 QPs and CQ may be
created for the virtio queue.
The design is to trigger an event for the guest and for the vdpa driver
when a new CQE is posted by the HW after the packet transition.
This patch add the basic operations to create and destroy the above HW
objects and to trigger the CQE events when a new CQE is posted.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
In order to map the guest physical addresses used by the virtio device
guest side to the host physical addresses used by the HW as the host
side, memory regions are created.
By this way, for example, the HW can translate the addresses of the
packets posted by the guest and to take the packets from the correct
place.
The design is to work with single MR which will be configured to the
virtio queues in the HW, hence a lot of direct MRs are grouped to single
indirect MR.
Create functions to prepare and release MRs with all the related
resources that are required for it.
Create a new file mlx5_vdpa_mem.c to manage all the MR related code
in the driver.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for get_features and get_protocol_features operations.
Part of the features are reported by the DevX capabilities.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Support get_queue_num operation to get the maximum number of queues
supported by the device.
This number comes from the DevX capabilities.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add a new driver to support vDPA operations by Mellanox devices.
The first Mellanox devices which support vDPA operations are
ConnectX-6 Dx and Bluefield1 HCA for their PF ports and VF ports.
This driver is depending on rdma-core like the mlx5 PMD, also it is
going to use mlx5 DevX to create HW objects directly by the FW.
Hence, the common/mlx5 library is linked to the mlx5_vdpa driver.
This driver will not be compiled by default due to the above
dependencies.
Register a new log type for this driver.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>