numam-dpdk

Author	SHA1	Message	Date
Yongseok Koh	dceb502942	net/mlx5: add control of excessive memory pinning by kernel A new PMD parameter (mr_ext_memseg_en) is added to control extension of memseg when creating a MR. It is enabled by default. If enabled, mlx5_mr_create() tries to maximize the range of MR registration so that the LKey lookup tables on datapath become smaller and get the best performance. However, it may worsen memory utilization because registered memory is pinned by kernel driver. Even if a page in the extended chunk is freed, that doesn't become reusable until the entire memory is freed and the MR is destroyed. To make freed pages available immediately, this parameter has to be turned off but it could drop performance. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Yongseok Koh	2aac5b5d11	net/mlx5: sync stop/start with secondary process Rx/Tx burst function pointers are stored in the rte_eth_dev structure, which is local to a process. Even though primary process replaces the function pointers, secondary will not run the new ones. With rte_mp APIs, primary can easily broadcast a request to stop/start the datapath of secondary processes. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Yongseok Koh	7be600c8d8	net/mlx5: rework PMD global data init There's more need to have PMD global data structure. This should be initialized once per a process regardless of how many PMD instances are probed. mlx5_init_once() is called during probing and make sure all the init functions are called once per a process. Currently, such global data and its initialization functions are even scattered. Rather than 'extern'-ing such variables and calling such functions one by one making sure it is called only once by checking the validity of such variables, it will be better to have a global storage to hold such data and a consolidated function having all the initializations. The existing shared memory gets more extensively used for this purpose. As there could be multiple secondary processes, a static storage (local to process) is also added. As the reserved virtual address for UAR remap is a PMD global resource, this doesn't need to be stored in the device priv structure, but in the PMD global data. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Yongseok Koh	9a8ab29b84	net/mlx5: replace IPC socket with EAL API Socket API is used for IPC in order for secondary process to acquire Verb command file descriptor. The FD is used to remap UAR address. The multi-process APIs (rte_mp) in EAL are newly introduced. mlx5_socket.c is replaced with mlx5_mp.c, which uses the new APIs. As it is PMD global infrastructure, only one IPC channel is established. All the IPC message types may have port_id in the message if there is need to reference a specific device. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Viacheslav Ovsiienko	53e5a82fd1	net/mlx5: update install/uninstall event handlers We are implementing the support for multiport Infiniband device with representors attached to these multiple ports. Asynchronous device event notifications (link status change, removal event, etc.) should be shared between ports. We are going to implement shared event handler and this patch introduces appropriate device structure changes and updated event handler install and uninstall routines. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	f048f3d479	net/mlx5: switch to the shared IB device context The code is updated to use the shared IB device context and device handles. The IB device context is shared between representors created over the single multiport IB device. All Verbs and DevX objects will be created within this shared context. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	d485cdca01	net/mlx5: switch to the shared context IB attributes The code is updated to use the shared IB device attributes, located in the shared IB context. It saves some memory if there are representors created over the single Infiniband device with multiple ports. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	1b782252cb	net/mlx5: switch to the shared protection domain The PMD code is updated to use Protected Domain from the shared IB device context. The Domain is shared between all devices belonging to the same multiport Infiniband device. If IB device has only one port, the PD is not shared, because there is only ethernet device created over IB one. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	9c0a9eed37	net/mlx5: switch to the names in the shared IB context The IB device names are moved from device private data to the shared context, code involving the names is updated. The IB port index treatment is added where it is relevant. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	17e19bc4dd	net/mlx5: add IB shared context alloc/free functions The Mellanox NICs support SR-IOV and have E-Switch feature. When SR-IOV is set up in switchdev mode and E-Switch is enabled we have so called VF representors in the system. All representors belonging to the same E-Switch are created on the basis of the single PCI function and with current implementation each representor has its own dedicated Infiniband device and operates within its own Infiniband context. It is proposed to provide representors as ports of the single Infiniband device and operate on the shared Infiniband context saving various resources. This patch introduces appropriate structures. Also the functions to allocate and free shared IB context for multiport are added. The IB device context, Protection Domain, device attributes, Infiniband names are going to be relocated to the shared structure from the device private one. mlx5_dev_spawn() is updated to support shared context. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	bbfad6427b	net/mlx5: add getting IB ports number for multiport IB There is the routine mlx5_nl_portnum() added to get the number of ports of multiport Infiniband device. It is assumed the Uplink/VF representors are attached on these ports. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	e505508a38	net/mlx5: modify get ifindex routine for multiport IB There is the routine mlx5_nl_ifindex() returning the network interface index associated with Infiniband device. We are going to support multiport IB devices, now function takes the IB port as argument and returns ifindex associated with tuple <IB device, IB port> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko	299d7dc28c	net/mlx5: add representor recognition on Linux 5.x The master device and VF representors were distinguished by presence of port name, master device did not have one. The new Linux kernels starting from 5.0 provide the port name for master device and the implemented representor recognizing method does not work. The new recognizing method is based on querying the VF number, has been created on the base of the device. The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK attribute is specified in the Netlink request message. Also the presence check of device symlink in device sysfs folder is added to distinguish representors with sysfs based method. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-29 17:25:32 +01:00
Dekel Peled	b2f3a38101	net/mlx5: support new representor naming format Kernel update [1] introduce new format of representors names. This patch implements RFC [2], updating MLX5 PMD to support the new format, while maintaining support of the existing format. [1] https://github.com/torvalds/linux/commit/c12ecc2 [2] http://mails.dpdk.org/archives/dev/2019-March/125676.html Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-20 18:15:42 +01:00
Thomas Monjalon	dbeba4cf18	net/mlx: prefix private structure The private structure stored in rte_eth_dev->data->dev_private was named "struct priv". In order to ease code browsing, the structure is renamed "struct mlx[45]_priv". Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2019-03-01 18:17:35 +01:00
Thomas Monjalon	714bf46ebb	net/mlx: support firmware version query The API function rte_eth_dev_fw_version_get() is querying drivers via the operation callback fw_version_get(). The implementation of this operation is added for mlx4 and mlx5. Both functions are copying the same ibverbs field fw_ver which is retrieved when calling ibv_query_device[_ex]() during the port probing. It is tested with command "drvinfo" of examples/ethtool/. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-02-13 12:55:38 +01:00
Moti Haimovsky	f5bf91de73	net/mlx5: support flow counters using devx This commit adds counters support when creating flows via direct verbs. The implementation uses devx interface in order to create query and delete the counters. This support requires MLNX_OFED_LINUX-4.5-0.1.0.1 installation. Signed-off-by: Moti Haimovsky <motih@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-01-14 17:44:29 +01:00
Wisam Jaddo	f0354d8423	net/mlx5: add ConnectX-6 device IDs This commit includes the add of: - ConnectX-6 device ID - ConnectX-6 SRIOV device ID Signed-off-by: Wisam Jaddo <wisamm@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-01-03 13:07:06 +01:00
Dekel Peled	4bb14c83df	net/mlx5: support modify header using Direct Verbs This patch implements the set of actions to support offload of packet header modifications to MLX5 NIC. Implementation is based on RFC [1]. [1] http://mails.dpdk.org/archives/dev/2018-November/119971.html Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-01-03 12:56:43 +01:00
Tom Barbette	ce9494d76c	net/mlx5: report imissed statistics The imissed counters (number of packets dropped because the queues were full) were actually reported through xstats as "rx_out_of_buffer" but was not reported through stats. Following a recent discussion on the ML, as there is no way to tell the user if a counter is implemented or not, this should be considered a bug. For example, user looking at imissed will think the packets are lost before reaching the device. Signed-off-by: Tom Barbette <barbette@kth.se> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-12-13 16:31:06 +00:00
Yongseok Koh	09d8b41699	net/mlx5: make vectorized Tx threshold configurable Add txqs_max_vec parameter to configure the maximum number of Tx queues to enable vectorized Tx. And its default value is set according to the architecture and device type. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-11-05 15:01:25 +01:00
Dekel Peled	c513f05cde	net/mlx5: add caching of encap/decap actions Make flow encap and decap Verbs actions cacheable resources. Store created actions in local database. This enables MLX5 PMD reuse of existing actions. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-11-05 15:01:25 +01:00
Yongseok Koh	bc91e8db12	net/mlx5: add 128B padding of Rx completion entry A PMD parameter (rxq_cqe_pad_en) is added to enable 128B padding of CQE on RX side. The size of CQE is aligned with the size of a cacheline of the core. If cacheline size is 128B, the CQE size is configured to be 128B even though the device writes only 64B data on the cacheline. This is to avoid unnecessary cache invalidation by device's two consecutive writes on to one cacheline. However in some architecture, it is more beneficial to update entire cacheline with padding the rest 64B rather than striding because read-modify-write could drop performance a lot. On the other hand, writing extra data will consume more PCIe bandwidth and could also drop the maximum throughput. It is recommended to empirically set this parameter. Disabled by default. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-11-05 15:01:25 +01:00
Viacheslav Ovsiienko	2dd8b72167	net/mlx5: simplify flow counters support check The redundant runtime check of Flow counters support is removed. The flag flow_counter_en is eliminated from the code. The Verbs create counter function just returns an error if no counter support is present in the system. If none of the Flow counters configuration macros is defined, a log message is emitted indicating the missing counter support. The mlx5_flow_validate_action_count() function is also updated due to the flow_counter_en flag removal. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2018-10-26 22:14:06 +02:00
Moti Haimovsky	d53180afe3	net/mlx5: refactor TC-flow infrastructure This commit refactors tc_flow as a preparation to coming commits that sends different type of messages and expect differ type of replies while still using the same underlying routines. Signed-off-by: Moti Haimovsky <motih@mellanox.com>	2018-10-26 22:14:05 +02:00
Shahaf Shuler	7dd7be29b4	net/mlx5: always use representor ifindex for ioctl In the current code, in some cases the representor ethdev uses the PF interface to query some link status information or pause parameters. It was done because previous kernel versions had no support for the representor info. Using the PF i/f for such ioctl is error prone and does not always work because: * In some cases there is no PF at all, only representors (e.g. Bluefield with host representors) * Querying the up/down status from the representor and the link status from the PF is inconsistent * The PF link being down doesn't necessarily mean the representor is down. * Setting different pause configurations for the PF and the representors will result in undefined behaviour Make the code cleaner and more robust by using only the representor i/f for the ioctl. Whatever the kernel provides on this query will be used. No workaround is needed for missing kernel functionality. Note: 1. Setting pause parameters obviously won't work on representors 2. Old kernels will not report all the possible representor info Fixes: `2b73026388` ("net/mlx5: probe all port representors") Cc: stable@dpdk.org Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>	2018-10-11 18:56:02 +02:00
Shahaf Shuler	1a611fdaf6	net/mlx5: support missing counter in extended statistics The current code would fail if one of the DPDK counters was not found among the device counters. As representors and the PF port have different counters, both cannot work together. Address this issue by making the counter init more flexible: keep all the counters found and skip the error. Cc: stable@dpdk.org Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>	2018-10-11 18:56:02 +02:00
Yongseok Koh	40c9ccf9e9	net/mlx5: remove Netlink flow driver Netlink based E-Switch flow engine will be migrated to the new flow engine. nl_flow will be renamed to flow_tcf as it goes through Linux TC flower interface. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>	2018-10-11 18:53:49 +02:00
Ori Kam	51e72d386c	net/mlx5: add runtime parameter to enable Direct Verbs DV flow API is based on new kernel API and is missing some functionality like counter but add other functionality like encap. In order not to affect current users even if the kernel supports the new DV API it should be enabled only manually. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-10-11 18:53:49 +02:00
Ori Kam	865a0c1567	net/mlx5: add Direct Verbs prepare function This function allocates the Direct Verbs device flow, and introduce the relevant PRM structures. This commit also adds the matcher object. The matcher object acts as a mask and should be shared between flows. For example all rules that should match source IP with full mask should use the same matcher. A flow that should match dest IP or source IP but without full mask should have a new matcher allocated. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-10-11 18:53:49 +02:00
Ori Kam	c322c0e558	net/mlx5: add bluefield VF support Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-09-28 01:41:01 +02:00
Shahaf Shuler	f9de87187b	net/mlx5: disable ConnectX-4 Lx Multi Packet Send by default On ConnectX-4 Lx the Multi Packet Send (MPW) feature is considered insecure, as in some cases where the application provides incorrect mbufs on the Tx burst the host or NIC can get stuck. Hence, the feature is disabled by default for this specific NIC. Users can still enable this feature and enjoy the performance gain (mostly for a low number of cores) by using the txq_mpw_en devarg. This patch will impact the out-of-the-box performance of some applications using ConnectX-4 Lx for the sake of security and robustness. Since different defaults are needed based on the underlying device, the mpw field in the configuration struct was extended to also contain the MLX5_ARG_UNSET option. Cc: stable@dpdk.org Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-08-28 15:27:39 +02:00
Adrien Mazarguil	3f8cb05df5	net/mlx5: fix invalid network interface index Network interface indices being unsigned, an invalid index or error is normally expressed through a zero value (see if_nametoindex()). mlx5_ifindex() has a signed return type for negative values in case of error. Since mlx5_nl.c does not check for errors, these may be fed back as invalid interfaces indices to subsequent system calls. This usage would have been correct if mlx5_ifindex() returned a zero value instead. This patch makes mlx5_ifindex() unsigned for convenience. Fixes: `ccdcba53a3` ("net/mlx5: use Netlink to add/remove MAC addresses") Cc: stable@dpdk.org Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-07-26 14:05:52 +02:00
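The convention described in the commit above (an unsigned interface index, with zero meaning "invalid or error") can be sketched as follows. This is only an illustration built on the POSIX `if_nametoindex()` call the message refers to; the helper names `ifindex_or_zero()` and `ifindex_is_valid()` are hypothetical and are not the PMD's own functions.

```c
#include <assert.h>
#include <stddef.h>
#include <net/if.h>

/*
 * Hypothetical helper: return the interface index for ifname, or 0 when
 * the interface does not exist. if_nametoindex() already follows this
 * convention, so no signed error value is ever produced.
 */
static unsigned int
ifindex_or_zero(const char *ifname)
{
	assert(ifname != NULL);
	return if_nametoindex(ifname);
}

/* Hypothetical helper: an index is usable only when it is non-zero. */
static int
ifindex_is_valid(unsigned int ifindex)
{
	return ifindex != 0;
}
```

With this shape, a caller that forgets to check for errors passes 0 rather than a negative number reinterpreted as a large index to later system calls.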
Nelio Laranjeiro	5366074b01	net/mlx5: fix route Netlink message overflow Route Netlink message socket is wrongly initialized by registering to the route link group. This causes the socket to receive all link messages related to routes whereas the PMD does not expect to receive such information. In some situations it ends up filling the socket to the point that no new message can be exchanged. As the PMD is not expected to process such broadcast messages, the nl_group parameter of the function is also removed. Fixes: `ccdcba53a3` ("net/mlx5: use Netlink to add/remove MAC addresses") Cc: stable@dpdk.org Signed-off-by: Zijie Pan <zijie.pan@6wind.com> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-07-26 14:05:52 +02:00
Nelio Laranjeiro	f872b4b99d	net/mlx5: fix representors detection On systems where the required Netlink commands are not supported but Mellanox OFED is installed, representors information must be retrieved through sysfs. Fixes: `26c08b979d` ("net/mlx5: add port representor awareness") Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-07-26 14:05:52 +02:00
Adrien Mazarguil	8f9059ccee	net/mlx5: add framework for switch flow rules Because mlx5 switch flow rules are configured through Netlink (TC interface) and have little in common with Verbs, this patch adds a separate parser function to handle them. - mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent and stores the result in a buffer. - mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer. - mlx5_nl_flow_create() instantiates a flow rule on the device based on such a buffer. - mlx5_nl_flow_destroy() performs the reverse operation. These functions are called by the existing implementation when encountering flow rules which must be offloaded to the switch (currently relying on the transfer attribute). Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-07-26 14:05:52 +02:00
Adrien Mazarguil	20b71e92ef	net/mlx5: lay groundwork for switch offloads With mlx5, unlike normal flow rules implemented through Verbs for traffic emitted and received by the application, those targeting different logical ports of the device (VF representors for instance) are offloaded at the switch level and must be configured through Netlink (TC interface). This patch adds preliminary support to manage such flow rules through the flow API (rte_flow). Instead of rewriting tons of Netlink helpers and as previously suggested by Stephen [1], this patch introduces a new dependency to libmnl [2] (LGPL-2.1) when compiling mlx5. [1] https://mails.dpdk.org/archives/dev/2018-March/092676.html [2] https://netfilter.org/projects/libmnl/ Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-07-26 14:05:52 +02:00
Moti Haimovsky	6bf10ab69b	net/mlx5: support 32-bit systems This patch adds support for building and running mlx5 PMD on 32bit systems such as i686. The main issue to tackle was handling the 32bit access to the UAR as quoted from the mlx5 PRM: QP and CQ DoorBells require 64-bit writes. For best performance, it is recommended to execute the QP/CQ DoorBell as a single 64-bit write operation. For platforms that do not support 64 bit writes, it is possible to issue the 64 bits DoorBells through two consecutive writes, each write 32 bits, as described below: * The order of writing each of the Dwords is from lower to upper addresses. * No other DoorBell can be rung (or even start ringing) in the midst of an on-going write of a DoorBell over a given UAR page. The last rule implies that in a multi-threaded environment, the access to a UAR page (which can be accessible by all threads in the process) must be synchronized (for example, using a semaphore) unless an atomic write of 64 bits in a single bus operation is guaranteed. Such a synchronization is not required for when ringing DoorBells on different UAR pages. Signed-off-by: Moti Haimovsky <motih@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-07-12 14:34:59 +02:00
Nelio Laranjeiro	60bd8c9747	net/mlx5: add count flow action This is only supported by Mellanox OFED. Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-07-12 12:12:27 +02:00
Nelio Laranjeiro	2815702bae	net/mlx5: replace verbs priorities by flow Previous work introduce verbs priorities, whereas the PMD is making translation between Flow priority into Verbs. Rename this to make more sense on what the PMD has to translate. Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-07-12 12:10:01 +02:00
Nelio Laranjeiro	78be885295	net/mlx5: handle drop queues as regular queues Drop queues are essentially used in flows due to Verbs API, the information if the fate of the flow is a drop or not is already present in the flow. Due to this, drop queues can be fully mapped on regular queues. Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-07-12 12:10:01 +02:00
Adrien Mazarguil	2b73026388	net/mlx5: probe all port representors Probe existing port representors in addition to their master device and associate them automatically. To avoid collision between Ethernet devices, they are named as follows: - "{DBDF}" for master/switch devices. - "{DBDF}_representor_{rep}" with "rep" starting from 0 for port representors. (Patch based on prior work from Yuanhan Liu) Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Reviewed-by: Xueming Li <xuemingl@mellanox.com>	2018-07-11 15:37:19 +02:00
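The naming scheme in the commit above can be illustrated with a small formatter. This is a sketch only: the function `mlx5_example_dev_name()` and the use of a negative `rep` to mean "master device" are assumptions for the example, and the DBDF string is passed in already formatted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the Ethernet device name.
 *  - rep < 0  : master/switch device, name is "{DBDF}"
 *  - rep >= 0 : port representor, name is "{DBDF}_representor_{rep}"
 * Returns the value of snprintf(), so truncation can be detected.
 */
static int
mlx5_example_dev_name(char *buf, size_t len, const char *dbdf, int rep)
{
	assert(buf != NULL && dbdf != NULL);
	if (rep < 0)
		return snprintf(buf, len, "%s", dbdf);
	return snprintf(buf, len, "%s_representor_%d", dbdf, rep);
}
```

Because representor numbering starts from 0, the first representor of a device at `0000:03:00.0` would be named `0000:03:00.0_representor_0` under this scheme.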
Adrien Mazarguil	26c08b979d	net/mlx5: add port representor awareness The current PCI probing method is not aware of Verbs port representors, which appear as standard Verbs devices bound to the same PCI address and cannot be distinguished. Problem is that more often than not, the wrong Verbs device is used, resulting in unexpected traffic. This patch makes the driver discard representors to only use the master device. If unable to identify it (e.g. kernel drivers not recent enough), either: - There is only one matching device which isn't identified as a representor, in that case use it. - Otherwise log an error and do not probe the device. (Patch based on prior work from Yuanhan Liu) Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Reviewed-by: Xueming Li <xuemingl@mellanox.com>	2018-07-11 15:37:14 +02:00
Adrien Mazarguil	9083982ce7	net/mlx5: drop useless support for several Verbs ports Unlike mlx4 from which this capability was inherited, mlx5 devices expose exactly one Verbs port per PCI bus address. Each physical port gets assigned its own bus address with a single Verbs port. While harmless, this code requires an extra loop that would get in the way of subsequent refactoring. No functional impact. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-07-11 15:36:55 +02:00
Matan Azrad	1f106da2bf	net/mlx5: support MPLS-in-GRE and MPLS-in-UDP Add support for MPLS over GRE and MPLS over UDP tunnel types as described in the next RFCs: 1. https://tools.ietf.org/html/rfc4023 2. https://tools.ietf.org/html/rfc7510 3. https://tools.ietf.org/html/rfc4385 Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>	2018-05-17 12:31:42 +02:00
Shahaf Shuler	dd3331c6f1	net/mlx5: add Bluefield device id Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>	2018-05-17 12:31:42 +02:00
Yongseok Koh	7d6bf6b866	net/mlx5: add Multi-Packet Rx support Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe bandwidth by posting a single large buffer for multiple packets. Instead of posting a buffer per a packet, one large buffer is posted in order to receive multiple packets on the buffer. A MPRQ buffer consists of multiple fixed-size strides and each stride receives one packet. Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is comparatively small, or PMD attaches the Rx packet to the mbuf by external buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external buffers will be allocated and managed by PMD. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-05-14 22:31:52 +01:00
Yongseok Koh	974f1e7ef1	net/mlx5: add new memory region support This is the new design of Memory Region (MR) for mlx PMD, in order to: - Accommodate the new memory hotplug model. - Support non-contiguous Mempool. There are multiple layers for MR search. L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized array by linear search. L0/L1 is in an inline function - mlx5_mr_lookup_cache(). If L1 misses, the bottom-half function is called to look up the address from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh() and it is not an inline function. Data structure for L2 is the Binary Tree. If L2 misses, the search falls into the slowest path which takes locks in order to access global device cache (priv->mr.cache) which is also a B-tree and caches the original MR list (priv->mr.mr_list) of the device. Unless the global cache is overflowed, it is all-inclusive of the MR list. This is L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and can't be expanded on the fly due to deadlock. Refer to the comments in the code for the details - mr_lookup_dev(). If L3 is overflowed, the list will have to be searched directly bypassing the cache although it is slower. If L3 misses, a new MR for the address should be created - mlx5_mr_create(). When it creates a new MR, it tries to register adjacent memsegs as much as possible which are virtually contiguous around the address. This must take two locks - memory_hotplug_lock and priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any allocation/free of memory inside. In the free callback of the memory hotplug event, freed space is searched from the MR list and corresponding bits are cleared from the bitmap of MRs. This can fragment a MR and the MR will have multiple search entries in the caches. 
Once there's a change by the event, the global cache must be rebuilt and all the per-queue caches will be flushed as well. If memory is frequently freed in run-time, that may cause jitter on dataplane processing in the worst case by incurring MR cache flush and rebuild. But, it would be the least probable scenario. To guarantee the most optimal performance, it is highly recommended to use an EAL option - '--socket-mem'. Then, the reserved memory will be pinned and won't be freed dynamically. And it is also recommended to configure per-lcore cache of Mempool. Even though there're many MRs for a device or MRs are highly fragmented, the cache of Mempool will be much helpful to reduce misses on per-queue caches anyway. '--legacy-mem' is also supported. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>	2018-05-14 22:31:51 +01:00
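The first two lookup layers described in the commit above (L0: the most recently used entry, L1: linear search of a small fixed-size array) can be sketched roughly as follows. The structure layout, the array size `CACHE_N`, the `LKEY_MISS` sentinel and the function name are assumptions for illustration; the real inline function is mlx5_mr_lookup_cache(), and on a miss the real code falls through to the L2 bottom-half, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_N 8		/* hypothetical size of the L1 array */
#define LKEY_MISS UINT32_MAX	/* hypothetical "not found" value */

struct mr_entry {
	uintptr_t start;	/* first address covered (inclusive) */
	uintptr_t end;		/* end of the range (exclusive) */
	uint32_t lkey;		/* key to return on a hit */
};

struct mr_ctrl {
	uint16_t mru;		/* index of the last-hit entry (L0) */
	uint16_t len;		/* number of valid entries in cache[] */
	struct mr_entry cache[CACHE_N];
};

/* Look up addr in L0, then L1; return LKEY_MISS if both miss. */
static inline uint32_t
mr_lookup_cache_sketch(struct mr_ctrl *ctrl, uintptr_t addr)
{
	const struct mr_entry *e;
	uint16_t i;

	assert(ctrl->len <= CACHE_N);
	if (ctrl->len == 0)
		return LKEY_MISS;
	assert(ctrl->mru < ctrl->len);
	/* L0: check the most recently used entry first. */
	e = &ctrl->cache[ctrl->mru];
	if (addr >= e->start && addr < e->end)
		return e->lkey;
	/* L1: linear search over the small fixed-size array. */
	for (i = 0; i < ctrl->len; i++) {
		e = &ctrl->cache[i];
		if (addr >= e->start && addr < e->end) {
			ctrl->mru = i;	/* remember the hit for next time */
			return e->lkey;
		}
	}
	return LKEY_MISS;	/* caller would go on to L2 */
}
```

Keeping L0 and L1 this small is what lets them live in an inline function on the datapath; the larger per-queue and global caches are only touched when both miss.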
Yongseok Koh	d561b5dc13	net/mlx5: remove memory region support This patch removes current support of Memory Region (MR) in order to accommodate the dynamic memory hotplug patch. This patch can be compiled but traffic can't flow and HW will raise faults. Subsequent patches will add new MR support. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>	2018-05-14 22:31:51 +01:00
Yongseok Koh	df428ceef4	net/mlx5: change device reference for secondary process rte_eth_devices[] is not shared between primary and secondary process, but a static array to each process. The reverse pointer of device (priv->dev) is invalid. Instead, priv has the pointer to shared data of the device, struct rte_eth_dev_data *dev_data; Two macros are added, #define PORT_ID(priv) ((priv)->dev_data->port_id) #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)]) Signed-off-by: Yongseok Koh <yskoh@mellanox.com>	2018-05-14 22:31:51 +01:00
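The two macros from the commit above can be exercised in isolation as shown below. The macros themselves are quoted from the commit message; everything around them (the reduced `rte_eth_dev_data`, `rte_eth_dev` and `priv` structures, the `MAX_PORTS` bound and the `priv_to_dev()` helper) is a simplified stand-in written for this example and does not reflect the real DPDK definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 4	/* hypothetical bound for this example */

/* Simplified stand-ins: the real structures have many more fields. */
struct rte_eth_dev_data {
	uint16_t port_id;	/* lives in memory shared by all processes */
};

struct rte_eth_dev {
	struct rte_eth_dev_data *data;
};

/* Process-local array, as in the commit message. */
static struct rte_eth_dev rte_eth_devices[MAX_PORTS];

struct priv {
	struct rte_eth_dev_data *dev_data;	/* pointer to shared data */
};

/* Macros as given in the commit message. */
#define PORT_ID(priv) ((priv)->dev_data->port_id)
#define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])

/* Hypothetical helper: resolve this process's device from priv. */
static struct rte_eth_dev *
priv_to_dev(const struct priv *p)
{
	assert(PORT_ID(p) < MAX_PORTS);
	return ETH_DEV(p);
}
```

The point of the indirection is that only the port ID is read from shared data; the device pointer is then computed against the calling process's own array, so it stays valid in both primary and secondary processes.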

166 Commits