numam-dpdk

Author	SHA1	Message	Date
Andrew Rybchenko	68e8ca7b59	ethdev: avoid usage of ULL for 64-bit unsigned constants Use UINT64_C() macro instead. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 19:11:35 +02:00
Andrew Rybchenko	4852c647d1	ethdev: replace single bit masks with macros The macros RTE_BIT32 and RTE_BIT64 are used to replace single bit masks. Do not switch VLAN offload flags since type is not fixed size. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 18:36:34 +02:00
Ferruh Yigit	295968d174	ethdev: add namespace Add 'RTE_ETH' namespace to all enums & macros in a backward compatible way. The macros for backward compatibility can be removed in next LTS. Also updated some struct names to have 'rte_eth' prefix. All internal components switched to using new names. Syntax fixed on lines that this patch touches. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Wisam Jaddo <wisamm@nvidia.com> Acked-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>	2021-10-22 18:15:38 +02:00
Konstantin Ananyev	a136b08c10	test/bonding: fix after hiding ethdev internal structures The link bonding auto-test internally creates an emulated ethdev. Some tests change the Rx/Tx functions of this emulated device on the fly, by directly modifying rte_eth_dev fields without doing stop/start for these devices. As ethdev now uses rte_eth_fp_ops[] for fast-path functions, these direct changes no longer have the expected effect. Fix the problem by guarding fast-path function changes with rte_eth_dev_stop()/rte_eth_dev_start(). Fixes: `7a0935239b` ("ethdev: make fast-path functions to use new flat array") Reported-by: Lewei Yang <leweix.yang@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 17:55:39 +02:00
Ferruh Yigit	ede6356582	drivers/net: fix removing jumbo offload flag After the DEV_RX_OFFLOAD_JUMBO_FRAME flag was removed, drivers make jumbo frame decisions based on MTU value checks, but some of the checks were wrong, causing device initialization to fail. Fix them. Fixes: `b563c14212` ("ethdev: remove jumbo offload flag") Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Yu Jiang <yux.jiang@intel.com>	2021-10-22 17:44:18 +02:00
Ferruh Yigit	1aca4fdb00	doc: remove jumbo offload feature Jumbo offload is no more announced as capability, and 'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag is removed. This patch is also removing 'Jumbo frame' feature from documentation. Fixes: `b563c14212` ("ethdev: remove jumbo offload flag") Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-22 17:26:07 +02:00
Ciara Loftus	985e7673c0	net/af_xdp: fix max Rx packet length Commit `1bb4a528c4` ("ethdev: fix max Rx packet length") clarified the expected usage of the max_rx_pktlen and max_mtu values and implemented some extra checks on these values to ensure they are sane. After this, the AF_XDP PMD fails to initialise. The value for max_rx_pktlen which represents the max size of the Ethernet frame was set to ETH_FRAME_LEN (1514) and the max_mtu which represents the size of the payload was set to the max size of the Ethernet frame. This did not make sense, as naturally the maximum frame size should be greater than the payload size. Fix this by setting the max_rx_pktlen equal to the max size of the Ethernet frame as expected, and the max MTU equal to the max_rx_pktlen less the overhead which is set to the size of an Ethernet header plus CRC. Fixes: `1bb4a528c4` ("ethdev: fix max Rx packet length") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 17:12:50 +02:00
Ivan Ilchenko	b26bee10ee	ethdev: forbid MTU set before device configure rte_eth_dev_configure() always sets MTU to either dev_conf.rxmode.mtu or RTE_ETHER_MTU if application doesn't provide the value. So, there is no point to allow rte_eth_dev_set_mtu() before since set value will be overwritten on configure anyway. Fixes: `1bb4a528c4` ("ethdev: fix max Rx packet length") Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 15:26:54 +02:00
Andrew Rybchenko	9ce1717d3e	ethdev: remove unused L2 tunnel mask defines Fixes: `cf47acc0f9` ("ethdev: remove L2 tunnel offload control API") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 12:03:52 +02:00
Eli Britstein	6a8b64fd5e	app/testpmd: fix packet burst spreading stats RX/TX functions (rte_eth_rx_burst/rte_eth_tx_burst) get 'nb_pkts' argument, which specifies the maximum number to receive/transmit. It can be 0..nb_pkts, meaning nb_pkts+1 options. Testpmd can provide statistics of the burst sizes ('set record-burst-stats on') by incrementing an array cell of index <burst-size>. This array is mistakenly [MAX_PKT_BURST] size. Receiving the maximum burst will cause out of bound write. Enlarge the spread stats array by one cell to fix it. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Eli Britstein <elibr@nvidia.com> Reviewed-by: Matan Azrad <matan@nvidia.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 04:23:15 +02:00
Chengchang Tang	2fc3e696a7	net/hns3: add runtime config for mailbox limit time Currently, the max waiting time for an MBX response is 500ms, but in some scenarios this is not enough, since it depends on the response of the kernel mode driver, whose response time is related to the scheduling of the system. In this special scenario, most of the cores are isolated, and only a few cores are used for system scheduling. When a large number of services are started, the scheduling of the system will be very busy, and the reply to the mbx message will time out, which will cause our PMD initialization to fail. This patch adds a runtime config to set the max wait time. For the above scenarios, users can adjust the waiting time to a suitable value by themselves. Fixes: `463e748964` ("net/hns3: support mailbox") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>	2021-10-22 04:11:43 +02:00
Xueming Li	5984037501	app/testpmd: add forwarding engine for shared Rx queue To support shared Rx queue, this patch introduces a dedicated forwarding engine. The engine groups received packets by mbuf->port into sub-groups, updates stream statistics and simply frees packets. Signed-off-by: Xueming Li <xuemingl@nvidia.com>	2021-10-22 00:09:19 +02:00
Xueming Li	6574483365	app/testpmd: force shared Rx queue polled on same core Shared Rx queue must be polled on same core. This patch checks and stops forwarding if shared RxQ being scheduled on multiple cores. It's suggested to use same number of Rx queues and polling cores. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>	2021-10-22 00:09:15 +02:00
Xueming Li	7fbf4a02eb	app/testpmd: dump port info for shared Rx queue In case of shared Rx queue, source port mbuf from polling result isn't the Rx port of forwarding stream. To provide original port ID, this patch dumps mbuf->port for each packet in verbose mode if shared Rx queue enabled. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-22 00:09:11 +02:00
Xueming Li	f4d178c13b	app/testpmd: add parameter for shared Rx queue Adds "--rxq-share=X" parameter to enable shared RxQ. Rx queue is shared if device supports, otherwise fallback to standard RxQ. Shared Rx queues are grouped per X ports. X defaults to UINT32_MAX, implies all ports join share group 1. Queue ID is mapped equally with shared Rx queue ID. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-22 00:09:07 +02:00
Xueming Li	0bacfaa0c7	app/testpmd: dump device capability and Rx domain info Dump device capability and Rx domain ID if shared Rx queue is supported by device. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-22 00:09:00 +02:00
Xueming Li	93e441c9a0	ethdev: get device capability name as string This patch adds API to return name of device capability. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-10-22 00:08:57 +02:00
Xueming Li	dd22740cc2	ethdev: introduce shared Rx queue In current DPDK framework, each Rx queue is pre-loaded with mbufs to save incoming packets. For some PMDs, when number of representors scale out in a switch domain, the memory consumption became significant. Polling all ports also leads to high cache miss, high latency and low throughput. This patch introduces shared Rx queue. Ports in same Rx domain and switch domain could share Rx queue set by specifying non-zero sharing group in Rx queue configuration. Shared Rx queue is identified by share_rxq field of Rx queue configuration. Port A RxQ X can share RxQ with Port B RxQ Y by using same shared Rx queue ID. No special API is defined to receive packets from shared Rx queue. Polling any member port of a shared Rx queue receives packets of that queue for all member ports, port_id is identified by mbuf->port. PMD is responsible to resolve shared Rx queue from device and queue data. Shared Rx queue must be polled in same thread or core, polling a queue ID of any member port is essentially same. Multiple share groups are supported. PMD should support mixed configuration by allowing multiple share groups and non-shared Rx queue on one port. Example grouping and polling model to reflect service priority: Group1, 2 shared Rx queues per port: PF, rep0, rep1 Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127 Core0: poll PF queue0 Core1: poll PF queue1 Core2: poll rep2 queue0 PMD advertise shared Rx queue capability via RTE_ETH_DEV_CAPA_RXQ_SHARE. PMD is responsible for shared Rx queue consistency checks to avoid member port's configuration contradict each other. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-22 00:08:50 +02:00
Huisong Li	17faaed854	ethdev: fix PCI device release in secondary process In secondary process, rte_eth_dev_close() doesn't clear eth_dev->data. If calling rte_dev_remove() after rte_eth_dev_close(), in rte_eth_dev_pci_generic_remove() function, the released eth device still can be found by its name in shared memory. As a result, the eth device will be released repeatedly. The state of the eth device is modified to RTE_ETH_DEV_UNUSED after rte_eth_dev_close(). So this state can be used to avoid this problem. Fixes: `dcd5c8112b` ("ethdev: add PCI driver helpers") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 23:15:34 +02:00
Satheesh Paul	00ea15e7a3	net/cnxk: support port ID flow action This patch adds support for rte flow action type port_id to enable directing packets from an input port PF to an output port which is a VF of the input port PF. Signed-off-by: Satheesh Paul <psatheesh@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 18:59:40 +02:00
Satheesh Paul	15f0b8a5b9	common/cnxk: support port ID action This patch adds ROC API to support flow port ID action type. Signed-off-by: Satheesh Paul <psatheesh@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 18:58:50 +02:00
Xuan Ding	ad6f01945a	net/virtio: fix avail descriptor ID Vhost will update desc’s Buffer ID advance to next used descriptor when VIRTIO_F_IN_ORDER feature negotiated. When virtio reuses the descriptor, the Buffer ID should be restored even VIRTQ_DESC_F_INDIRECT feature negotiated. Fixes: `b473061b0e` ("net/virtio: fix indirect descriptors in packed datapaths") Cc: stable@dpdk.org Signed-off-by: Xuan Ding <xuan.ding@intel.com> Signed-off-by: Yong Liu <yong.liu@intel.com> Signed-off-by: Miao Li <miao.li@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Gaoxiang Liu	028f06e8be	net/vhost: merge stats loop in datapath To improve performance in vhost Tx/Rx, merge the vhost stats loops. eth_vhost_tx has two loops of send-num iterations, which can be merged into one. eth_vhost_rx has the same issue as Tx. Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Xuan Ding	7c61fa08b7	vhost: enable IOMMU for async vhost The use of IOMMU has many advantages, such as isolation and address translation. This patch extends the capability of DMA engine to use IOMMU if the DMA engine is bound to vfio. When set memory table, the guest memory will be mapped into the default container of DPDK. Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yvonne Yang <yvonnex.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Xuan Ding	56259f7fc0	vfio: allow partially unmapping adjacent memory Currently, if we map a memory area A, then map a separate memory area B that by coincidence happens to be adjacent to A, current implementation will merge these two segments into one, and if partial unmapping is not supported, these segments will then be only allowed to be unmapped in one go. In other words, given segments A and B that are adjacent, it is currently not possible to map A, then map B, then unmap A. Fix this by adding a notion of "chunk size", which will allow subdividing segments into equally sized segments whenever we are dealing with an IOMMU that does not support partial unmapping. With this change, we will still be able to merge adjacent segments, but only if they are of the same size. If we keep with our above example, adjacent segments A and B will be stored as separate segments if they are of different sizes. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yvonne Yang <yvonnex.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Xuan Ding	04bcc80204	net/virtio: fix indirect descriptor reconnection Add initialization for packed ring indirect descriptors in reconnection path. Fixes: `381f39ebb7` ("net/virtio: fix packed ring indirect descricptors setup") Cc: stable@dpdk.org Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yinan Wang <yinan.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Li Feng	5a4fbe79e6	vhost: add sanity check on inflight last index The index in rte_vhost_set_last_inflight_io_split is from the frontend driver, check if it's in the virtqueue range. Fixes: `bb0c2de960` ("vhost: add APIs to operate inflight ring") Cc: stable@dpdk.org Signed-off-by: Li Feng <fengli@smartx.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Ivan Malov	6474b59448	net/virtio: fix Tx checksum for tunnel packets Tx prepare method calls rte_net_intel_cksum_prepare(), which handles tunnel packets correctly, but Tx burst path does not take tunnel presence into account when computing the offsets. Fixes: `58169a9c81` ("net/virtio: support Tx checksum offload") Cc: stable@dpdk.org Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-21 14:24:21 +02:00
Wenwu Ma	ad5050e42e	examples/vhost: fix use after free on drain When a vdev is removed in destroy_device function, the corresponding vhost TX buffer will also be freed, but the vhost TX buffer may still be used in the drain_vhost function, which will cause an error of heap-use-after-free. Therefore, before accessing vhost TX buffer, we need to check whether the vdev has been removed, if so, let's skip this vdev. Fixes: `a68ba8e0a6` ("examples/vhost: refactor vhost data path") Cc: stable@dpdk.org Signed-off-by: Wenwu Ma <wenwux.ma@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2021-10-21 14:24:21 +02:00
Marvin Liu	99ebada2d6	net/virtio: fix oversized packets in vectorized Rx If the packed ring size is not a power of two, it is possible that the remaining number is less than one batch while the batch operation can still pass. This causes an incorrect remaining-number calculation and then leads to receiving oversized packets. The patch fixes the issue by adding a remaining-number check before the batch operation. Fixes: `77d66da838` ("net/virtio: add vectorized packed ring Rx") Cc: stable@dpdk.org Signed-off-by: Marvin Liu <yong.liu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Xueming Li	8011a09add	vdpa/mlx5: retry VAR allocation during vDPA restart VAR is the device memory space for the virtio queue doorbells; Qemu can mmap it directly to speed up doorbell push. On a busy system, Qemu takes time to release VAR resources during driver shutdown. If vdpa is restarted quickly, the VAR allocation fails with error 28 since the VAR is a singleton resource per device. This patch adds a retry mechanism for VAR allocation. Fixes: `4cae722c1b` ("vdpa/mlx5: move virtual doorbell alloc to probe") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Matan Azrad <matan@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Xueming Li	d38a53b175	vdpa/mlx5: workaround FW first completion in start After a vDPA application restart, Qemu restores VQ with used and available index, new incoming packet triggers virtio driver to handle buffers. Under heavy traffic, no available buffer for firmware to receive new packets, no Rx interrupts generated, driver is stuck on endless interrupt waiting. As a firmware workaround, this patch sends a notification after VQ setup to ask driver handling buffers and filling new buffers. Fixes: `bff7350110` ("vdpa/mlx5: prepare virtio queues") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Matan Azrad <matan@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Zhihong Peng	84cc857b5d	net/virtio: fix check scatter on all Rx queues This patch fixes the wrong way to obtain virtqueue. The end of virtqueue cannot be judged based on whether the array is NULL. Fixes: `4e8169eb0d` ("net/virtio: fix Rx scatter offload") Cc: stable@dpdk.org Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:13 +02:00
Ting Xu	f5ec6a3a19	net/ice: fix TM hierarchy commit flag reset After DCF commits the TM hierarchy configuration, the commit flag is set to avoid a duplicated commit. But the flag is not reset after device stop, which prevents updating the hierarchy configuration unless the device is closed. This is not reasonable. This patch resets the commit flag after device stop. Then users can delete and add nodes to commit a new TM hierarchy configuration. Fixes: `3a6bfc37ea` ("net/ice: support QoS config VF bandwidth in DCF") Cc: stable@dpdk.org Signed-off-by: Ting Xu <ting.xu@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>	2021-10-21 13:32:26 +02:00
William Tu	d1c7029a52	net/e1000: build on Windows This patch enables building the e1000 driver for Windows. I tested using two Windows VM on top of VMware Fusion, creating two e1000 devices with device ID 0x10D3 (8274L), verifying rx/tx works correctly using dpdk-testpmd.exe rxonly and txonly mode. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Pallavi Kadam <pallavi.kadam@intel.com> Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>	2021-10-21 04:58:40 +02:00
Tudor Cornea	2108930be1	net/ixgbe: fix port initialization if MTU config fails On a VMware ESXi 6.0 setup with an Intel 82599 NIC the ports don't seem to initialize anymore, while running testpmd. Configuring Port 0 (socket 0) ixgbevf_dev_rx_init(): Set max packet length to 1518 failed. ixgbevf_dev_start(): Unable to initialize RX hardware (-22) Fail to start port 0: Invalid argument Configuring Port 1 (socket 0) ixgbevf_dev_rx_init(): Set max packet length to 1518 failed. ixgbevf_dev_start(): Unable to initialize RX hardware (-22) Fail to start port 1: Invalid argument Please stop the ports first If the call to ixgbevf_rlpml_set_vf fails and we return prematurely, we will not be able to initialize the ports correctly. The behavior seems to have changed since the following commit: Fixes: `c77866a169` ("net/ixgbe: detect failed VF MTU set") Cc: stable@dpdk.org We can make this particular use case work correctly if we don't return an error, which seems to be consistent with the overall kernel ixgbevf implementation. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c?h=v5.14#n2015 Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com>	2021-10-21 04:56:06 +02:00
Rongwei Liu	a89f6433aa	net/mlx5: set Tx queue affinity in round-robin Previously, we set txq affinity to 0 and let firmware perform round-robin when bonding. Firmware uses a global counter to assign txq affinity to different physical ports according to the remainder after division. There are three disadvantages: 1. The global counter is shared between kernel and dpdk. 2. After restarting pmd or port, the previous counter value is reused, so the new affinity is unpredictable. 3. There is no way to get what affinity is set by firmware. In this update, we will create several TISs up to the number of bonding ports and bind each TIS to one PF port. Each port will start to pick up TIS using its port index. An upper layer application can quickly calculate each txq's affinity without querying. At DPDK layer, when creating txq with 2 bonding ports, the affinity is set like: port 0: 1-->2-->1-->2 port 1: 2-->1-->2-->1 port 2: 1-->2-->1-->2 Note: Only applicable to DevX api. This affinity is subject to HW hash. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-10-21 12:37:00 +02:00
Rongwei Liu	cf5ac38d51	common/mlx5: add LAG context query Added a new function mlx5_devx_cmd_query_lag() to query LAG property from firmware including state/affinity/mode etc. Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-10-21 12:36:57 +02:00
Dmitry Kozlyuk	ea823b2c51	net/mlx5: close tools socket with last device MLX5 PMD exposes a socket for external tools to dump port state. Socket events are listened using an interrupt source of EXT type. The socket was closed and the interrupt callback was unregistered at program exit, which is incorrect because DPDK could be already shut down at this point. Move actions performed at program exit to the moment the last MLX5 port is closed. The socket will be opened again if later a new MLX5 device is plugged in and probed. Also fix comments that were decisively talking about secondary processes instead of external tools. Fixes: `e6cdc54cc0` ("net/mlx5: add socket server for external tools") Cc: stable@dpdk.org Reported-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-10-21 10:31:53 +02:00
Dmitry Kozlyuk	9ec1ceab76	net/mlx5: fix Rx queue resource cleanup mlx5_rxq_start() allocates rxq_ctrl->obj and frees it on failure, but did not set it to NULL. Later mlx5_rxq_release() could not recognize this object is already freed and attempted to release its resources, resulting in a crash: Configuring Port 0 (socket 0) mlx5_common: Failed to create RQ using DevX mlx5_common: Can't create DevX RQ object. mlx5_net: Port 0 Rx queue 0 RQ creation failure. Segmentation fault Set rxq_ctrl->obj to NULL after it is freed to skip resource release. Fixes: `1260a87b28` ("net/mlx5: share Rx control code") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-10-21 09:31:17 +02:00
Bing Zhao	273b09376c	net/mlx5: fix meter yellow policy with RSS action The RSS configuration in a policy action container was a pointer inside a union, and the pointer area could be used as other fate action. In the current implementation, the RSS of the green color was prior to that of the yellow color. There was a high possibility the pointer was considered as the RSS and result in a error flow expansion when only the yellow color had the RSS action. The check of the fate action type should also be done to get rid of the misjudgment. Fixes: `b38a12272b` ("net/mlx5: split meter color policy handling") Cc: stable@dpdk.org Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-10-21 09:31:15 +02:00
Xueming Li	614966c2fa	net/mlx5: check DevX to support more Verbs ports Verbs API doesn't support device port number larger than 255 by design. To support more VF or SubFunction port representors, forces DevX API check when max Verbs device link ports larger than 255. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:14 +02:00
Xueming Li	686d05b60d	net/mlx5: enable DevX Tx queue creation Verbs API does not support Infiniband device port number larger 255 by design. To support more representors on a single Infiniband device DevX API should be engaged. While creating Send Queue (SQ) object with Verbs API, the PMD assigned IB device port attribute and kernel created the default miss flows in FDB domain, to redirect egress traffic from the queue being created to representor appropriate peer (wire, HPF, VF or SF). With DevX API there is no IB-device port attribute (it is merely kernel one, DevX operates in PRM terms) and PMD must create default miss flows in FDB explicitly. PMD did not provide this and using DevX API for E-Switch configurations was disabled. The default miss FDB flow matches E-Switch manager vport (to make sure the source is some representor) and SQn (Send Queue number - device internal queue index). The root flow table managed by kernel/firmware and it does not support vport redirect action, we have to split the default miss flow into two ones: - flow with lowest priority in the root table that matches E-Switch manager vport ID and jump to group 1. - flow in group 1 that matches E-Switch manager vport ID and SQn and forwards packet to peer vport Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:13 +02:00
Xueming Li	ebe9afedc7	net/mlx5: fix internal root table flow priority When creating internal transfer flow on root table with lowest priority, the flow was created with max UINT32_MAX priority. It is wrong since the flow is created in kernel and max priority supported is 16. This patch fixes this by adding internal flow check. Fixes: `5f8ae44dd4` ("net/mlx5: enlarge maximal flow priority") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:12 +02:00
Xueming Li	d9020f2577	net/mlx5: support flow item of normal Tx queue Extends txq flow pattern to support both hairpin and regular txq. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:11 +02:00
Xueming Li	a564038699	net/mlx5: support E-Switch manager egress traffic match For egress packet on representor, the vport ID in transport domain is E-Switch manager vport ID since representor shares resources of E-Switch manager. E-Switch manager vport ID and Tx queue internal device index are used to match representor egress packet. This patch adds flow item port ID match on E-Switch manager. E-Switch manager vport ID is 0xfffe on BlueField, 0 otherwise. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:10 +02:00
Xueming Li	1d47e9335e	net/mlx5: improve Verbs flow priority discovery To detect the number of Verbs flow priorities, the PMD tries to create Verbs flows with different priorities. However, Verbs is not designed to support ports larger than 255. When DevX is supported by the kernel driver, 16 Verbs priorities must be supported, so there is no need to create Verbs flows. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:09 +02:00
Xueming Li	3fd2961efa	net/mlx5: use Netlink when IB port greater than 255 IB spec doesn't allow 255 ports on a single HCA; a port number of 256 was cast to the u8 value 0, which is invalid for ibv_query_port(). This patch invokes the Netlink API to query port state when the port number is greater than 255. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:08 +02:00
Xueming Li	227813f28a	common/mlx5: get RDMA port state via Netlink Introduce netlink API to get RDMA port state. Port state is retrieved based on RDMA device name and port index. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-21 09:31:03 +02:00
Chandubabu Namburu	11f61054bf	maintainers: update for AMD axgbe Updating AMD axgbe maintainer Signed-off-by: Chandubabu Namburu <chandu@amd.com> Acked-by: Somalapuram Amaranath <asomalap@amd.com>	2021-10-21 14:53:30 +02:00

30435 Commits