Vdpa device failed to initialize 2nd VQ during setup. From FW syndrome,
unsupported CQE size was specified in CQ initialization attributes.
The unsupported CQE size comes from uninitialized stack struct data, and
the struct has new fields defined recently which are not initialized in
vdpa code.
This patch initializes cq creation attributes with zero to avoid such
random data.
Fixes: 79a7e409a2 ("common/mlx5: prepare support of packet pacing")
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The issue is relevant only for the timer event modes: 0 and 1.
When the HW finishes to consume a burst of the guest Rx descriptors,
it creates a CQE in the CQ.
When traffic stops, the mlx5 driver arms the CQ to get a notification
when a specific CQE index is created - the index to be armed is the
next CQE index which should be polled by the driver.
The mlx5 driver configured the kernel driver to send notification to
the guest callfd in the same time of the armed CQE event.
It means that the guest was notified only for each first CQE in a
poll cycle, so if the driver polled CQEs of all the virtio queue
available descriptors, the guest was not notified again for the rest
because there was no any new CQE to trigger the guest notification.
Hence, the Rx queues might be stuck when the guest didn't work with
poll mode.
Remove prior kernel notification, and do manual notification after CQ
polling.
Fixes: a9dd7275a1 ("vdpa/mlx5: optimize notification events")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When a virtq is destroyed by the driver, it must be removed from the
steering RQT which holds its reference.
The driver didn't remove the virtq from RQT before destroying it what
caused HW syndrome in virtq unset.
Remove the virtq from RQT before destroying it.
Fixes: 9f09b1ca15 ("vdpa/mlx5: recreate a virtq becoming enabled")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
There are a lot of per virtq operations in the live migration
handling.
Before the driver support for queue update, when a virtq was not valid,
all the LM handling was terminated.
But now, when the driver supports queue update, the virtq can be invalid
as legal stage.
Skip invalid virtq in LM handling.
Fixes: c47d6e8333 ("vdpa/mlx5: support queue update")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When dynamic flex parser feature is introduced, the support for misc
parameters 4 of flow table entry (FTE) match set is needed. The
structure of "mlx5_ifc_fte_match_param_bits" is extended with
"mlx5_ifc_fte_match_set_misc4_bits" at the end of it. The total size
of the FTE match set will be changed into 384 bytes from 320 bytes.
Low level user space driver (rdma-core) will have the validation of
the length of FTE match set. In the old release that no MISC4
supported in the rdma-core, and this will break the backward
compatibility, even if the MISC4 is not used in most cases, like
in vDPA driver.
In order not to break the compatibility old rdma-core, the length
adjustment needs to be done. In mlx5 vDPA driver, the lengths of
the matcher and value are both set to 320 without MISC4. There is
no need to change the structure definition, all bytes of the MISC4
will be discarded if it is not needed. Since the MISC4 parameter
is aligned with a 64B boundary and so does the whole FTE match set
parameter, there is no need to take any padding and alignment into
consideration when calculating the size.
Fixes: daa38a8924 ("net/mlx5: add flow translation of eCPRI header")
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Now that mlx5_pci PMD checks for enabled classes and performs
probe(), remove() of associated classes, individual class driver
does not need to check if other driver is enabled.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Migrate mlx5 net, vdpa and regex PMD to start using mlx5 common class
driver.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
mlx5_common is shared library between mlx5 net, VDPA and regex PMD.
It is better to use common initialization helper instead of using
RTE_PRIORITY_CLASS priority.
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch advertises VHOST_USER_PROTOCOL_F_STATUS
support in the MLX5 driver so that that the protocol
feature is negotiated.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch advertises VHOST_USER_PROTOCOL_F_STATUS
support in the IFC driver so that that the protocol
feature is negotiated.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Introduce the RTE_LOG_REGISTER macro to avoid the code duplication
in the logtype registration process.
It is a wrapper macro for declaring the logtype, registering it and
setting its level in the constructor context.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Last changes in vDPA device management by vhost library may cause queue
ready state update after the device configuration.
So, there is chance that some queue configuration information will be
known only after the device was configured.
Add support to reconfigure a queue after the device configuration
according to the queue state update and the configuration changes.
Adjust the host notifier and the guest notification configuration to be
per queue and to be applied in the enablement process.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As an arrangement to per queue operations in the vDPA device it is
needed to change the next experimental API:
The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
instead of per device.
A `qid` parameter was added to the API arguments list.
Setting the parameter to the value RTE_VHOST_QUEUE_ALL configures the
host notifier to all the device queues as done before this patch.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The CQ polling is necessary in order to manage guest notifications when
the guest doesn't work with poll mode (callfd != -1).
The CQ polling scheduling method can affect the host CPU utilization and
the traffic bandwidth.
Define 3 modes to control the CQ polling scheduling:
1. A timer thread which automatically adjusts its delays to the coming
traffic rate.
2. A timer thread with fixed delay time.
3. Interrupts: Each CQE burst arms the CQ in order to get an interrupt
event in the next traffic burst.
When traffic becomes off, mode 3 is taken automatically.
The interrupt management takes a lot of CPU cycles but forward traffic
event to the guest very fast.
Timer thread save the interrupt overhead but may add delay for the guest
notification.
Add device arguments to control on the mode.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vDPA driver uses a CQ in order to know when traffic works were
completed by the HW.
Each traffic burst completion adds a CQE to the CQ.
When the vDPA driver detects CQEs in the CQ, it triggers the guest
notification for the corresponding queue and consumes all of them.
There is collapse feature in the HW that configures the HW to write all
the CQEs in the first entry of the CQ.
Using this feature, the vDPA driver can read only the first CQE,
validate that the completion counter inside the CQE was changed and if
so, to notify the guest.
Use CQ collapse feature in order to improve the poll utilization.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When the virtio guest driver doesn't work with poll mode, the driver
creates event mechanism in order to schedule completion notifications
for each virtq burst traffic.
When traffic comes to a virtq, a CQE will be added to the virtq CQ by
the FW.
The driver requests interrupt for the next CQE index, and when interrupt
is triggered, the driver polls the CQ and notifies the guest by virtq
callfd writing.
According to the described method, the interrupts will be triggered for
each burst of traffic. The burst size depends on interrupt latency.
Interrupts management takes a lot of CPU cycles and using it for each
traffic burst takes big portion of CPU capacity.
When traffic is on, using timer for CQ poll scheduling instead of
interrupts saves a lot of CPU cycles.
Move CQ poll scheduling to be done by timer in case of running traffic.
Request interrupts only when traffic is off.
The timer scheduling management is done by a new dedicated thread uses
a usleep command.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch split the vDPA header file in two, making
rte_vdpa_device structure opaque to the application.
Applications should only include rte_vdpa.h, while drivers
should include both rte_vdpa.h and rte_vdpa_dev.h.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
This removes the notion of device ID in Vhost library
as a preliminary step to get rid of the vDPA device ID.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
This patch is a preliminary step to get rid of the
vDPA device ID. It makes vDPA callbacks to use the
vDPA device struct as a reference instead of the ID.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
This patch makes the vDPA framework to no more
support only PCI devices, but any devices by relying
on the generic device name as identifier.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
The guest virtio device may request MTU updating when the vhost backend
device exposes a capability to support it.
Expose the MTU feature capability.
At configuration time, check the requested MTU and update it in the HW
device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In other to fill the new requirement for virtq
configuration, set the single PD managed by the driver for
all the virtqs.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for statistics operations.
A DevX counter object is allocated per virtq in order to
manage the virtq statistics.
The counter object is allocated before the virtq creation
and destroyed after it, so the statistics are valid only in
the life time of the virtq.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The glue file mlx5_glue.c is based on Linux specifics APIs.
Move it (including file mlx5_glue.h) to common/mlx5/linux directory.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
A regular memcmp function was used to compare between two objects of
type `struct rte_pci_addr`.
Due to the alignment rules of compiler structure builders, some memory
is not initiated in the structure even though all the fields were
initiated.
Therefore, the comparison may fail even though the PCI addresses are
identical and to cause false failure in probe.
Use the dedicated API to compare 2 PCI addresses.
Fixes: 75dd0ae917 ("vdpa/mlx5: disable RoCE")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Tested-by: Noa Ezra <noae@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The virtq configurations may be changed when it moves from disabled
state to enabled state.
Listen to the state callback even if the device is not configured.
Recreate the virtq when it moves from disabled state to enabled state
and when the device is configured.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In live migration, before logging the virtq, the driver queries the
virtq indexes after moving it to suspend mode.
Separate this method to new function mlx5_vdpa_virtq_stop as a
preparation for reusing.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As a preparation to listen the virtqs status before the device is
configured, manage the virtqs structures in array instead of list.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Change references to ABI 20.0.1 to use ABI v21, see
https://doc.dpdk.org/guides/contributing/abi_policy.html#general-guidelines
"Major ABI versions are declared no more frequently than yearly.
Compatibility with the major ABI version is mandatory in subsequent
releases until a new major ABI version is declared."
Combined ABI policy and versioning in maintainers, add map files to the
filter to more closely monitor future ABI changes.
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Add log prints to improve driver status following.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When both, direct and indirect notifier management cannot be
configured, return an error.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for the next 2 callbacks:
get_vfio_device_fd and get_notify_area.
This will allow direct HW doorbell ringing from guest and will save CPU
usage in host.
By this patch, the QEMU will map the physical address of the virtio
device in guest directly to the physical address of the HW device
doorbell.
The guest doorbell write is 2 bytes transaction while some Mellanox nics
support only 4 bytes transactions.
Remove ConnectX-5 and BF1 devices support which don't support 2B
doorbell writes for HW triggering.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The configure and close operations may be called a lot of time by vhost
library according to the virtio connections in the guest.
VAR is the device memory space for the virtio queues doorbells.
Each VAR page can be shared for more than one queue while its owner must
synchronize the writes to it.
The mlx5 driver allocates single VAR page for all its queues.
Therefore, it is better to allocate it in probe device level instead of
creating and destroying it per new connection.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The rte_vhost_get_vring_base function is being called to get the values
of last_avail_idx and last_used_idx.
These fields will not have the correct values in case the function
returns an error.
Adding a check for the function return value, and in the case of an
error, set the fields to be zero and print a warning message.
Fixes: bff7350110 ("vdpa/mlx5: prepare virtio queues")
Cc: stable@dpdk.org
Signed-off-by: Asaf Penso <asafp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In order to avoid potential conflicts, rename the PCI_ADDR
enum value to VDPA_ADDR_PCI in vdpa_addr_type_enum.
All symbols referencing this enum are experimental, so it
does not break API policy.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In function mlx5_devx_cmd_create_tir(), the 40 bytes of RSS key are
copied in 10 iterations, 4 bytes each time using the MLX5_SET macro.
As result the RSS key is copied into TIR context in swapped byte order.
This patch fixes the issue, using memcpy() to copy the RSS key as is.
The struct member mlx5_devx_tir_attr.rx_hash_toeplitz_key is updated
to byte array type.
Fixes: c3aea272ee ("net/mlx5: create advanced Rx object via DevX")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In the current state, when preforming read/write
transactions we must wait for a completion in order
to run the next transaction, and all transactions are
performed by order.
Relaxed Ordering is a PCI optimization which by enabling it
we allow the system to perform read/writes in a different
order without having to wait for completion and improve
the performance in that matter.
This commit introduces the creation of relaxed ordering
memory regions in mlx5.
As relaxed ordering is an optimization, drivers that
do not support it can simply ignore it and therefore
it is enabled by default.
Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
There is a common macro __rte_packed for packing structs,
which is now used where appropriate for consistency.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Remove setting ALLOW_EXPERIMENTAL_API individually for each Makefile and
meson.build. Instead, enable ALLOW_EXPERIMENTAL_API flag across app, lib
and drivers.
This changes reduces the clutter across the project while still
maintaining the functionality of ALLOW_EXPERIMENTAL_API i.e. warning
external applications about experimental API usage.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
When the HW finishes to consume the guest Rx descriptors, it creates a
CQE in the CQ.
The mlx5 driver arms the CQ to get notifications when a specific CQE
index is created - the index to be armed is the next CQE index which
should be polled by the driver.
The mlx5 driver configured the kernel driver to send notification to the
guest callfd in the same time it arrives to the mlx5 driver.
It means that the guest was notified only for each first CQE in a poll
cycle, so if the driver polled CQEs of all the virtio queue available
descriptors, the guest was not notified again for the rest because
there was no any new cycle for polling.
Hence, the Rx queues might be stuck when the guest didn't work with
poll mode.
Move the guest notification to be after the driver consumes all the
SW own CQEs.
By this way, guest will be notified only after all the SW CQEs are
polled.
Also init the CQ to be with HW owner in the start.
Fixes: 8395927cdf ("vdpa/mlx5: prepare HW queues")
Signed-off-by: Matan Azrad <matan@mellanox.com>
The completion event mechanism should work only if at least one of the
virtqs has valid callfd to be notified on.
When all the virtqs works with poll mode, the event mechanism should not
be configured.
The driver didn't take it into account and crashed in the above case.
Do not configure event interrupt when all the virtqs are in poll mode.
Fixes: 8395927cdf ("vdpa/mlx5: prepare HW queues")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The mlx5 vDPA driver manages QP and CQ in order to forward the HW event
to the guest by the callfd file descriptor for each virtq.
The driver arms the CQ for the next CQE index that should be
completed by the HW in order to create completion event.
In the SW completion event handler, the driver arms the CQ again for the
next index,
The CQE index in the CQ doorbell and in the CQ doorbell record was
masked incorrectly with the CQ size mask while it should be masked only
with 0xFFFFFF mask.
Remove the CQ size mask, stay only with 0xFFFFFF mask.
Fixes: 8395927cdf ("vdpa/mlx5: prepare HW queues")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This adds new device id to the list of Mellanox devices
that runs mlx5 PMD.
- BlueField-2 integrated ConnectX-6 Dx network controller
This device is not ready yet, it is in development stage.
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In order to support virtio queue creation by the FW, RoCE mode
should be disabled in the device.
Do it by netlink which is like the devlink tool commands:
1. devlink dev param set pci/[pci] name enable_roce value false
cmode driverinit
2. devlink dev reload pci/[pci]
Or by sysfs which is like:
echo 0 > /sys/bus/pci/devices/[pci]/roce_enable
The IB device is matched again after ROCE disabling.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for live migration feature by the HW:
Create a single Mkey that maps the memory address space of the
VHOST live migration log file.
Modify VIRTIO_NET_Q object and provide vhost_log_page,
dirty_bitmap_mkey, dirty_bitmap_size, dirty_bitmap_addr
and dirty_bitmap_dump_enable.
Modify VIRTIO_NET_Q object and move state to SUSPEND.
Query VIRTIO_NET_Q and get hw_available_idx and hw_used_idx.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The HW supports only 4 bytes doorbell writing detection.
The virtio device set only 2 bytes when it rings the doorbell.
Map the virtio doorbell detected by the virtio queue kickfd to the HW
VAR space when it expects to get the virtio emulation doorbell.
Use the EAL interrupt mechanism to get notification when a new event
appears in kickfd by the guest and write 4 bytes to the HW doorbell space
in the notification callback.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>