This patch adds new HCA capability related to hairpin RQs. This new
capability, hairpin_data_buffer_locked, indicates whether HCA supports
locking data buffer of hairpin RQ in ICMC (Interconnect Context Memory
Cache).
Struct used to define RQ configuration (RQ context) is extended with
hairpin_data_buffer_type field, which configures data buffer for hairpin
RQ. It can take the following values:
- MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER - hairpin
RQ's data buffer is stored in unlocked memory in ICMC.
- MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER - hairpin
RQ's data buffer is stored in locked memory in ICMC.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch extends HCA_CAP and SQ Context structs available in PRM. This
fields allow checking if NIC supports storing hairpin SQ's WQ buffer in
host memory and configuring such memory placement.
HCA capabilities are extended with the following fields:
- hairpin_sq_wq_in_host_mem - If set, then NIC supports using host
memory as a backing storage for hairpin SQ's WQ buffer.
- hairpin_sq_wqe_bb_size - Indicates the required size of SQ WQE basic
block.
SQ Context is extended with hairpin_wq_buffer_type which informs
NIC where SQ's WQ buffer will be stored. This field can take the
following values:
- MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER - WQ buffer will be
stored in unlocked device memory.
- MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY - WQ buffer will be stored
in host memory. Buffer is provided by PMD.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Before this patch, implementation details and configuration of hairpin
queues were decided internally by the PMD. Applications had no control
over the configuration of Rx and Tx hairpin queues, despite number of
descriptors, explicit Tx flow mode and disabling automatic binding.
This patch addresses that by adding:
- Hairpin queue capabilities reported by PMDs.
- New configuration options for Rx and Tx hairpin queues.
Main goal of this patch is to allow applications to provide
configuration hints regarding placement of hairpin queues.
These hints specify whether buffers of hairpin queues should be placed
in host memory or in dedicated device memory. Different memory options
may have different performance characteristics and hairpin configuration
should be fine-tuned to the specific application and use case.
This patch introduces new hairpin queue configuration options through
rte_eth_hairpin_conf struct, allowing to tune Rx and Tx hairpin queues
memory configuration. Hairpin configuration is extended with the
following fields:
- use_locked_device_memory - If set, PMD will use specialized on-device
memory to store RX or TX hairpin queue data.
- use_rte_memory - If set, PMD will use DPDK-managed memory to store RX
or TX hairpin queue data.
- force_memory - If set, PMD will be forced to use provided memory
settings. If no appropriate resources are available, then device start
will fail. If unset and no resources are available, PMD will fallback
to using default type of resource for given queue.
If application chooses to use PMD default memory configuration, all of
these flags should remain unset.
Hairpin capabilities are also extended, to allow verification of support
of given hairpin memory configurations. Struct rte_eth_hairpin_cap is
extended with two additional fields of type rte_eth_hairpin_queue_cap:
- rx_cap - memory capabilities of hairpin RX queues.
- tx_cap - memory capabilities of hairpin TX queues.
Struct rte_eth_hairpin_queue_cap exposes whether given queue type
supports use_locked_device_memory and use_rte_memory flags.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
This patch is to fix the tunnel TSO not enabling issue, simplify
the logic of calculating 'Tx Buffer Size' of data descriptor with IPSec,
and fix handling that the mbuf size exceeds the TX descriptor
hardware limit(1B-16KB) which causes malicious behavior to the NIC.
Fixes: 1e728b0112 ("net/iavf: rework Tx path")
Signed-off-by: Zhichao Zeng <zhichaox.zeng@intel.com>
Tested-by: Ke Xu <ke1.xu@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
ICE_DDP_PKG_SAME_VERSION_ALREADY_LOADED and
ICE_DDP_PKG_COMPATIBLE_ALREADY_LOADED should not be treated as
a DDP package init failure. Use ice_is_init_pkg_successful
to check return value of ice_copy_and_init_pkg.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Zhimin Huang <zhiminx.huang@intel.com>
libbpf v0.8.0 deprecates the bpf_get_link_xdp_id() and
bpf_set_link_xdp_fd() functions. Use meson to detect if
bpf_xdp_attach() is available and if so, use the recommended
replacement functions bpf_xdp_query_id(), bpf_xdp_attach()
and bpf_xdp_detach().
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Make it visible in logs if something goes wrong on XDP program
removal failure.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Version-based checks are bad. It is better to check for required
functions. Check for bpf_object__next_program() in this case since
it appears last in libbpf among functions used to load program
without bpf_prog_load() which is deprecated in libbpf v0.7.0.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Check for xsk_socket__create_shared() function instead.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Include checked libxdp version in driver build skip reason.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
RTE_NET_AF_XDP_LIBXDP is a conditional to include xdp/xsk.h and should
be set as soon as we know that the header is present.
RTE_NET_AF_XDP_SHARED_UMEM is one of conditions to use
xsk_socket__create_shared().
Both do not depend on libbpf and bpf/bpf.h presence.
Since else branch below returns error, there is no functional changes,
just style which will help on further rework.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
If there is any error in packet or taildrop feature is enabled,
HW can reject those packets and put them in error queue. Driver
poll this error queue to free the buffers.
DPAA driver has an issue while freeing these rejected buffers. In
case of scatter gather packets, it is preparing the mbuf SG list
by scanning the HW descriptors and once the mbuf SG list prepared,
it free only first segment of the mbuf SG list by calling the
API rte_pktmbuf_free_seg(), This will leak the memory of other
segments and mempool can be empty.
Also there is one more issue, external buffer's memory may not
belong to mempool so driver itself free the external buffer
after successfully send the packet to HW to transmit instead of
let the HW to free it. So transmit function free all the external
buffers. But driver has no check for external buffers
while freeing the rejected buffers and this can do double free the
memory which can corrupt the user pool and crashes and undefined
behaviour of system can be seen.
This patch fixes the above mentioned issue by checking each
and every segment and freeing all the segments except external.
Fixes: 9124e65dd3 ("net/dpaa: enable Tx queue taildrop")
Cc: stable@dpdk.org
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
When using SG list to TX with external and direct buffers,
HW free direct buffers and driver free external buffers.
Software scans the complete SG mbuf list to find the external
buffers to free, but this is wrong as hardware can free the
direct buffers if any present in the list and same can be
re-allocated for other purpose in multi thread or high speed
running traffic environment with new data in it. So the software
which is scanning the SG mbuf list, if that list has any direct
buffer present then that direct buffer's next pointer can give
wrong pointer value, if already freed by hardware which
can do the mempool corruption or memory leak.
In this patch instead of relying on user given SG mbuf list
we are storing the buffers in an internal list which will
be scanned by driver after transmit to free non-direct
buffers.
This patch also fixes below issues.
Driver is freeing complete SG list by checking external buffer
flag in first segment only, but external buffer can be attached
to any of the segment. Because of this, driver either can double
free buffers or there can be memory leak.
In case of indirect buffers, driver is modifying the original
buffer list to free the indirect buffers but this original buffer
list is being used by driver even after transmit packets for
non-direct buffer cleanup. This can cause the buffer leak issue.
Fixes: f191d5abda ("net/dpaa: support external buffers in Tx")
Cc: stable@dpdk.org
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
moving the mempool ops registration before DPAA
devices probe so that device probe functions can
also be able to use mempool operations.
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Creating and using driver's mempool for
allocating the SG table memory required for
FD creation.
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Due to change in latest kernel, passing the interface name to
kernel through IOCTL as string instead of character pointer.
Signed-off-by: Rohit Raj <rohit.raj@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
DPAA driver has dependency on kernel to perform various functionalities.
So kernel and DPDK version should be compatible for proper working.
This patch updates the DPAA guide with the information that user can
refer to find the compatible kernel version.
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
For packet length of size more than 2K bytes, segmented packets were
being received in DPDK even if mbuf size was greater than packet
length. This is due to the configuration in VSP.
This patch fixes the issue by configuring the VSP according to the
mbuf size configured during mempool configuration.
Fixes: e4abd4ff18 ("net/dpaa: support virtual storage profile")
Cc: stable@dpdk.org
Signed-off-by: Rohit Raj <rohit.raj@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Adding one second timeout in MC send command API to ensure it doesn't
gets stuck in case of failure.
Signed-off-by: Rohit Raj <rohit.raj@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
When using SG list to TX with external and direct buffers,
HW free the direct buffers and driver free the external buffers.
Software scans the complete SG mbuf list to find the external
buffers to free, but this is wrong as hardware can free the
direct buffers if any present in the list and same can be
re-allocated for other purpose in multi thread or high speed
running traffic environment with new data in it. So the software
which is scanning the SG mbuf list, if that list has any direct
buffer present then that direct buffer's next pointer can give
wrong pointer value, if already freed by hardware which
can do the mempool corruption or memory leak.
In this patch instead of relying on user given SG mbuf list
we are storing the buffers in an internal list which will
be scanned by driver after transmit to free non-direct
buffers.
This patch also fixes 2 more memory leak issues.
Driver is freeing complete SG list by checking external buffer
flag in first segment only, but external buffer can be attached
to any of the segment. Because of which driver either can double
free buffers or there can be memory leak.
In case of indirect buffers, driver is modifying the original
buffer list to free the indirect buffers but this original buffer
list is being used even after transmit packets for software
buffer cleanup. This can cause the buffer leak issue.
Fixes: 6bfbafe18d ("net/dpaa2: support external buffers in Tx")
Cc: stable@dpdk.org
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Creating and using driver's mempool for
allocating the SG table memory required for
FD creation instead of relying on user mempool.
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Add support of ESP packet type in packet receive path.
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Check if there exists free enqueue descriptors before enqueuing Tx
packet. Also try to free enqueue descriptors in case they are not
free.
Fixes: ed1cdbed6a ("net/dpaa2: support multiple Tx queues enqueue for ordered")
Cc: stable@dpdk.org
Signed-off-by: Brick Yang <brick.yang@nxp.com>
Signed-off-by: Rohit Raj <rohit.raj@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Driver is giving the wrong interface ID while setting the
error behaviour.
This patch fixes the issue by passing the correct MAC interface
index value to the API.
Fixes: 3d43972b1b ("net/dpaa2: do not drop parse error packets by dpdmux")
Cc: stable@dpdk.org
Signed-off-by: Vanshika Shukla <vanshika.shukla@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Driver has no proper handling to free unused
allocated mbufs in case of error or when the rx
processing complete because of which mempool
can be empty after some time.
This patch fixes this issue by moving the buffer
allocation code to the right place in driver.
Fixes: ecae71571b ("net/enetfec: support Rx/Tx")
Cc: stable@dpdk.org
Signed-off-by: Apeksha Gupta <apeksha.gupta@nxp.com>
Signed-off-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Queue reset is missing in restart because of which
IO cannot work on device restart.
This patch fixes the issue by resetting the queues on
device restart.
Fixes: b84fdd3963 ("net/enetfec: support UIO")
Cc: stable@dpdk.org
Signed-off-by: Apeksha Gupta <apeksha.gupta@nxp.com>
Signed-off-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch sets qman portal file descriptors used for
interrupts IO processing in non-blocking mode to avoid
any unwanted blocks while IO operations over the FD.
Signed-off-by: Vanshika Shukla <vanshika.shukla@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
NIC HW controllers often come with congestion management support on
various HW objects such as Rx queue depth or mempool queue depth.
Also, it can support various modes of operation such as RED
(Random early discard), WRED etc on those HW objects.
Add a framework to express such modes(enum rte_cman_mode) and
introduce (enum rte_eth_cman_obj) to enumerate the different
objects where the modes can operate on.
Add RTE_CMAN_RED mode of operation and RTE_ETH_CMAN_OBJ_RX_QUEUE,
RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL objects.
Introduce reserved fields in configuration structure
backed by rte_eth_cman_config_init() to add new configuration
parameters without ABI breakage.
Add rte_eth_cman_info_get() API to get the information such as
supported modes and objects.
Add rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs
to configure congestion management on those object with associated mode.
Finally, add rte_eth_cman_config_get() API to retrieve the
applied configuration.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
This patch support query HW descriptor from hns3 device. HW descriptor
is also called BD (buffer description) which is shared memory between
software and hardware.
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Added the ethdev Rx/Tx desc dump API which provides functions for query
descriptor from device. HW descriptor info differs in different NICs.
The information demonstrates I/O process which is important for debug.
As the information is different between NICs, the new API is introduced.
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
mana can receive Rx interrupts from kernel through RDMA verbs interface.
Implement Rx interrupts in the driver.
Signed-off-by: Long Li <longli@microsoft.com>
MANA allocates device queues through the IB layer when starting Rx queues.
When device is stopped all the queues are unmapped and freed.
Signed-off-by: Long Li <longli@microsoft.com>
MANA allocate device queues through the IB layer when starting Tx queues.
When device is stopped all the queues are unmapped and freed.
Signed-off-by: Long Li <longli@microsoft.com>
The hardware layer of MANA understands the device queue and doorbell
formats. Those functions are implemented for use by packet RX/TX code.
Signed-off-by: Long Li <longli@microsoft.com>
MANA hardware has iommu built-in, that provides hardware safe access to
user memory through memory registration. Since memory registration is an
expensive operation, this patch implements a two level memory registration
cache mechanisum for each queue and for each port.
Signed-off-by: Long Li <longli@microsoft.com>
Rx hardware queue is allocated when starting the queue. This function is
for queue configuration pre starting.
Signed-off-by: Long Li <longli@microsoft.com>
Currently this PMD supports RSS configuration when the device is stopped.
Configuring RSS in running state will be supported in the future.
Signed-off-by: Long Li <longli@microsoft.com>
MANA supports PCI hot plug events. Add this interrupt to DPDK core so its
parent PMD can detect device removal during Azure servicing or live
migration.
Signed-off-by: Long Li <longli@microsoft.com>
MANA defines its memory allocation functions to override IB layer default
functions to allocate device queues. This patch adds the code for device
configuration and stop.
Signed-off-by: Long Li <longli@microsoft.com>
MANA is a PCI device. It uses IB verbs to access hardware through the
kernel RDMA layer. This patch introduces build environment and basic
device probe functions.
Signed-off-by: Long Li <longli@microsoft.com>
nfp_net_recv_pkts() should not return a value that less than 0 and the
inappropriate return value in receive loop also causes the memory leak.
Modify code to avoid return a value less than 0. Furthermore, When
nfp_net_recv_pkts() break out from the receive loop because of packet
problems, a rte_mbuf will not be freed and it will cause memory leak.
Free the rte_mbuf before break out.
Fixes: b812daadad ("nfp: add Rx and Tx")
Cc: stable@dpdk.org
Signed-off-by: Long Wu <long.wu@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
For the Rx logic, the representor port decap packet from the
corresponding ring.
For the Tx logic, the representor port prepend the metadata
into packet, and send to firmware through the queue 0 of pf
vNIC.
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>