625 Commits

Adrien Mazarguil
681289345e net/mlx5: re-indent generic probing function
Since commit "net/mlx5: drop useless support for several Verbs ports"
removed an inner loop, mlx5_dev_spawn() is left with an unnecessary indent
level.

This patch eliminates a block, moves its local variables to function scope,
and re-indents its contents (diff best viewed with --ignore-all-space).

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:10 +02:00
Adrien Mazarguil
f38c54571d net/mlx5: split PCI from generic probing
All the generic probing code needs is an IB device. While this device is
currently supplied by a PCI lookup, other methods will be added soon.

This patch divides the original function, which has become huge over time,
as follows:

1. PCI-specific (mlx5_pci_probe()).
2. Verbs device (mlx5_dev_spawn()).

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:03 +02:00
Adrien Mazarguil
9083982ce7 net/mlx5: drop useless support for several Verbs ports
Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.

While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:36:55 +02:00
Adrien Mazarguil
3ff4b0866f net/mlx5: remove redundant objects in probe function
This patch gets rid of redundant calls to open the device and query its
attributes in order to simplify the code.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:52 +02:00
Adrien Mazarguil
6057a10b3b net/mlx5: rename confusing object in probe function
There are several attribute objects in this function:

- IB device attributes (struct ibv_device_attr_ex device_attr).
- Direct Verbs attributes (struct mlx5dv_context attrs_out).
- Port attributes (struct ibv_port_attr).
- IB device attributes again (struct ibv_device_attr_ex device_attr_ex).

"attrs_out" is both odd and initialized using a nonstandard syntax. Rename
it "dv_attr" for consistency.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:46 +02:00
Thomas Monjalon
f8e9989606 remove useless constructor headers
A constructor is usually declared with the RTE_INIT* macros.
As it is a static function, there is no need to declare it before its
definition. The macro is used directly in the function definition.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-12 00:00:35 +02:00
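
As an illustration of the pattern this change removes, here is a minimal
sketch (the function name is hypothetical, not taken from the patch): a
constructor defined directly with RTE_INIT() needs no prior declaration.

  #include <rte_common.h>

  /* Redundant for a static function defined right below; removing such
   * forward declarations is the point of this commit. */
  /* static void example_pmd_init(void); */

  /* RTE_INIT() is used directly as the definition header; the function
   * runs automatically when the shared object is loaded. */
  RTE_INIT(example_pmd_init)
  {
  	/* one-time initialization, e.g. registering a dynamic log type */
  }
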
Matan Azrad
1ff30d182c net/mlx5: activate Verbs cleanup on removal
Starting from rdma-core v19, Mellanox OFED 4.4, the Verbs resources
cleanup is properly activated in plug-out process when setting the
MLX5_DEVICE_FATAL_CLEANUP environment variable to 1.

Set the aforementioned variable to 1.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-04 16:48:53 +02:00
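
A minimal sketch of the mechanism described above, assuming a plain
setenv() call early in driver initialization (the helper name is
illustrative):

  #include <stdio.h>
  #include <stdlib.h>

  /* Ask rdma-core (v19+ / MOFED 4.4+) to clean up Verbs resources on a
   * fatal plug-out by setting the variable before devices are opened. */
  static void
  example_enable_fatal_cleanup(void)
  {
  	if (setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1) != 0)
  		perror("setenv(MLX5_DEVICE_FATAL_CLEANUP)");
  }
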
Ferruh Yigit
70815c9eca ethdev: add new offload flag to keep CRC
DEV_RX_OFFLOAD_KEEP_CRC offload flag is added. PMDs that support
keeping CRC should advertise this offload capability.

The DEV_RX_OFFLOAD_CRC_STRIP flag will remain for one more release;
the default behavior in PMDs is to keep the CRC until this flag is removed.

Until DEV_RX_OFFLOAD_CRC_STRIP flag is removed:
- Setting both KEEP_CRC & CRC_STRIP is INVALID
- Setting only CRC_STRIP: the PMD should strip the CRC
- Setting only KEEP_CRC: the PMD should keep the CRC
- Setting neither: the PMD should keep the CRC

A helper function rte_eth_dev_is_keep_crc() has been added to make it
possible to change the no-flag behavior with minimal changes in PMDs.

PMDs that don't report the DEV_RX_OFFLOAD_KEEP_CRC offload can remove the
rte_eth_dev_is_keep_crc() checks in the next release; the related code is
commented to help with that maintenance task.

DEV_RX_OFFLOAD_CRC_STRIP has also been added to virtual drivers: since
they don't use the CRC at all, virtual PMDs should not return an error
when an application requests this offload.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-03 01:35:58 +02:00
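
A sketch of the decision table above, written as a hypothetical helper
(the real rte_eth_dev_is_keep_crc() lives in the ethdev driver headers and
may differ):

  #include <stdbool.h>
  #include <rte_ethdev.h>

  /* Returns whether the CRC should be kept for a given Rx offload mask;
   * the KEEP_CRC + CRC_STRIP combination is rejected in configure(). */
  static bool
  example_keep_crc(uint64_t rx_offloads)
  {
  	bool keep = (rx_offloads & DEV_RX_OFFLOAD_KEEP_CRC) != 0;
  	bool strip = (rx_offloads & DEV_RX_OFFLOAD_CRC_STRIP) != 0;

  	if (keep && strip)
  		return false;	/* invalid combination, reported elsewhere */
  	if (strip)
  		return false;	/* strip the CRC */
  	return true;		/* only KEEP_CRC, or neither flag set */
  }
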
Adrien Mazarguil
f264c7980c net/mlx5: fix invalid error check
Since its return type is unsigned, if_nametoindex() returns 0 in case of
error, never -1.

Fixes: ccdcba53a3f4 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:58 +02:00
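
A minimal sketch of the corrected check (the wrapper is illustrative):
because the return type is unsigned, failure is signalled by 0.

  #include <net/if.h>
  #include <stdio.h>

  static unsigned int
  example_ifindex(const char *ifname)
  {
  	unsigned int ifindex = if_nametoindex(ifname);

  	if (ifindex == 0)	/* 0 means failure, never -1 */
  		perror("if_nametoindex");
  	return ifindex;
  }
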
Yongseok Koh
f9aaa6ac44 net/mlx5: increase number of strides
If WQE ID is used in CQE for Multi-Packet RQ, the ratio of CQE compression
drops a little bit. In order to reach 100Gbps with 64B traffic, it is
necessary to further save PCIe bandwidth by increasing the number of strides
in a WQE. It is now 64 by default but adjustable by a PMD parameter -
mprq_log_stride_num.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
1787eb7b4d net/mlx5: use stride index in Rx completion entry
Multi-Packet Receive Queue receives multiple packets on a single large
buffer. The number of consumed strides in the CQE is accumulated to keep
track of the current stride index. However, it is safer to directly use the
stride index in the CQE to avoid an out-of-order situation which could be
caused by introducing LRO in the future.

If Rx CQE compression is enabled, HW can be configured to store the stride
index in a mini-CQE, but this needs a newer version of the library/driver.
Therefore, since this change, MPRQ is only supported with the newer
library/driver, and the Rx hash result is not supported if MPRQ is enabled
along with Rx CQE compression.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
5c0e2db619 net/mlx5: add warning message for Multi-Packet RQ
If Multi-Packet RQ is enabled but not supported by device or
kernel/library, print out a warning message.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
a49b7c75de net/mlx5: add new fields in Rx completion entry
A stride index is added to the mlx5_mini_cqe8 structure and a WQE ID is
added to the mlx5_cqe structure.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
2e633f1f6d net/mlx5: change return value of Rx completion poll
mlx5_rx_poll_len() returns the Rx hash result extracted from either the
mini CQE or the regular CQE. As the mini CQE may not have the hash result
if configured otherwise, the function shouldn't assume the first DWORD of
the mini CQE is always the hash result. mlx5_rx_poll_len() is changed to
return a pointer to the mini CQE if compressed.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
e10245a13b net/mlx5: fix Rx buffer replenishment threshold
The threshold of buffer replenishment for the vectorized Rx burst is a
constant value (64). If the size of the Rx queue is comparatively small, the
device could run out of buffers. For example, if the size of the Rx queue is
128, buffers are replenished only twice per wraparound. This can cause
jitter in receiving packets, and the jitter can cause unnecessary
retransmissions for TCP connections.

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
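
A hypothetical sketch of the idea behind the fix, assuming the threshold is
made to scale with the ring size instead of staying at a fixed 64 (names
and the divisor are illustrative, not the PMD's actual macros):

  #define EXAMPLE_RPLNSH_MAX 64U	/* previous fixed threshold */

  /* Replenish more often on small rings, e.g. several times per wraparound
   * on a 128-descriptor queue, to avoid running out of buffers. */
  static inline unsigned int
  example_rplnsh_thresh(unsigned int rxq_size)
  {
  	unsigned int scaled = rxq_size / 4;

  	return scaled < EXAMPLE_RPLNSH_MAX ? scaled : EXAMPLE_RPLNSH_MAX;
  }
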
Shahaf Shuler
e46821e9fc net/mlx5: separate generic tunnel TSO from the standard one
The generic tunnel TSO depended on the regular TSO capabilities in order
to be enabled.

Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
5cffc8b28d net/mlx5: fix error number handling
rte_errno should be saved only if an error has occurred, because otherwise
rte_errno could hold a garbage value.

Fixes: a6d83b6a9209 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:57 +02:00
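
A sketch of the corrected pattern, with a hypothetical sub-initialization
helper standing in for the real calls:

  #include <rte_errno.h>

  /* Stand-in for any setup step that sets rte_errno and returns
   * a non-zero value on failure. */
  static int example_sub_init(void) { return 0; }

  static int
  example_setup(void)
  {
  	int err = 0;

  	if (example_sub_init()) {
  		err = rte_errno;	/* read rte_errno only on failure */
  		/* ... error handling / cleanup ... */
  	}
  	/* on success rte_errno is never touched: it may hold garbage */
  	return -err;
  }
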
Nelio Laranjeiro
c44fbc7cc2 net/mlx5: clean-up developer logs
Split maintainer logs from user logs.

A lot of debug logs are present, exposing to users internal information on
how the PMD works. Such logs should not be available to them and thus
should remain available only when the PMD is compiled in debug mode.

This commit removes some useless debug logs, moves the maintainer ones
under DEBUG and also moves dumps into debug mode only.

Cc: stable@dpdk.org

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:57 +02:00
Stephen Hemminger
3d96644aa3 net/mlx5: fix log initialization
The mlx5 driver had two init functions, which could cause log
initialization to be done after the other initialization. Also, the name of
the function does not match the convention (cut/paste error?).

Fix by initializing the log type first, at the start of pmd_init.
This also gets rid of having two constructor functions.

Fixes: a170a30d22a8 ("net/mlx5: use dynamic logging")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-06-17 10:17:53 +02:00
Xueming Li
a9fc0b0ef0 net/mlx5: fix crash in device probe
This patch initializes the counter descriptor struct before invoking the
Verbs API to avoid a segmentation fault.

Fixes: 9a761de8ea14 ("net/mlx5: flow counter support")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
93068a9d5a net/mlx5: fix error message in probe function
Error values passed to strerror() must be positive.

Fixes: 012ad9944dfc ("net/mlx5: fix probe return value polarity")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
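
A minimal sketch of the rule, assuming the probe path keeps negative errno
return values (helper name and message are illustrative):

  #include <string.h>
  #include <rte_log.h>

  static void
  example_log_probe_failure(int ret)	/* ret: negative errno */
  {
  	/* strerror() needs the positive errno value */
  	RTE_LOG(ERR, PMD, "probe failed: %s\n", strerror(-ret));
  }
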
Adrien Mazarguil
c6ce7e34ad net/mlx5: fix missing errno in probe function
Fixes: b43802b4bdfe ("net/mlx5: support 16 hardware priorities")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
8c3c2372ed net/mlx5: fix errno object in probe function
Fixes: a6d83b6a9209 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
c93adccc97 net/mlx5: remove limitation on number of instances
This artificial limitation was inherited from the mlx4 code base and has no
purpose other than adding unnecessary noise.

This patch is a port of commit f2318196c71a ("net/mlx4: remove limitation
on number of instances").

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
David Marchand
44b1d513d5 net/mlx5: register memory callback only when probing
The callback should be invoked only for memory that has been registered
in a device, hence, no need to track cleanup events if no device is
present.

Bugzilla ID: 56
Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-05-30 21:16:43 +02:00
Xueming Li
0ace586dee net/mlx5: fix memory region cache init
MR cache init takes place during device configuration. When the device is
re-configured multiple times, for example when changing the number of
queues on the fly, a deadlock can happen.

This patch moves MR cache init from the device configuration function to
the probe function to make sure it is done only once.

Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-28 16:28:43 +02:00
Adrien Mazarguil
e89c15b697 net/mlx5: fix crash when configure is not called
Although uncommon, applications may destroy a device immediately after
probing it without going through dev_configure() first.

This patch addresses a crash which occurs when mlx5_dev_close() calls
mlx5_mr_release() due to an uninitialized entry in the private structure.

Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-28 07:50:38 +02:00
Shahaf Shuler
1aa88b5bd9 net/mlx5: fix generic tunnel offload compatibility check
On some distros, the inbox rdma-core tree can contain the Software
Parser enum while the remaining structs are still missing.

Fixes: 5f8ba81c4228 ("net/mlx5: support generic tunnel offloading")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-25 17:07:40 +02:00
Yongseok Koh
52056a99c6 net/mlx5: fix SW parser offset
This is to fix the offloads introduced by commits
5f8ba81 net/mlx5: support generic tunnel offloading
5355f44 ethdev: introduce generic IP/UDP tunnel checksum and TSO

Fixes: 8589e944d075 ("net/mlx5: fix setting offsets for SW parser")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-25 17:07:40 +02:00
Yongseok Koh
8f6d9e13a9 net/mlx5: remove redundant checks
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
2018-05-23 00:35:01 +02:00
Yongseok Koh
8589e944d0 net/mlx5: fix setting offsets for SW parser
Since ConnectX-5, the SW parser just complements the HW parser. The SW
parser starts to engage only if the HW parser can't reach a header. For
older devices, the HW parser will not kick in if any of the SWP offsets is
set. Therefore, all of the L3 offsets should be set regardless of HW
offload. As IPv6 doesn't have a header checksum, the mbuf can't have
PKT_TX_[OUTER_]IP_CKSUM if the outer or inner L3 is IPv6.

And if the inner packet isn't IP, the inner offsets shouldn't be set.

Fixes: 5f8ba81c4228 ("net/mlx5: support generic tunnel offloading")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
2018-05-23 00:35:01 +02:00
David Marchand
5fd4d04969 net/mlx5: fix count in xstats
With commit af4f09f28294 ("net/mlx5: prefix all functions with mlx5"),
mlx5_xstats_get() is no longer compliant with the API.
It always returns the caller's max entries count while it should return how
many entries it wrote/wanted to write.

Fixes: af4f09f28294 ("net/mlx5: prefix all functions with mlx5")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-23 00:35:01 +02:00
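
A hypothetical sketch of the API contract being restored: the function must
report how many entries it wrote (or needs), not echo the caller's maximum.

  #include <rte_ethdev.h>

  static int
  example_xstats_get(struct rte_eth_xstat *stats, unsigned int n)
  {
  	const unsigned int count = 16;	/* stats exposed (example value) */
  	unsigned int i;

  	if (stats == NULL || n < count)
  		return count;	/* tell the caller how much room is needed */
  	for (i = 0; i != count; ++i) {
  		stats[i].id = i;
  		stats[i].value = 0;	/* fill from HW/SW counters here */
  	}
  	return count;	/* entries actually written, not n */
  }
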
Gavin Hu
74572f23cd net/mlx5: fix build with clang on ARM
This patch adds a pair of "()" to wrap the argument passed to the
function-like macro invocation.

drivers/net/mlx5/mlx5_rxtx_vec.c:37:
drivers/net/mlx5/mlx5_rxtx_vec_neon.h:170:24: error: too many arguments
provided to function-like macro invocation
	(uint16x8_t) { 0, 0, cs_flags, rte_cpu_to_be_16(len),

Fixes: 570acdb1da ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-21 00:55:39 +02:00
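
The issue can be reproduced with any function-like macro, as in this ARM
NEON sketch (macro and values are illustrative): the commas inside a
compound literal split it into several macro arguments unless the whole
literal is parenthesized.

  #include <arm_neon.h>

  #define EXAMPLE_STORE(dst, v) vst1q_u16((dst), (v))

  static void
  example(uint16_t *dst, uint16_t cs_flags, uint16_t len)
  {
  	/* clang error: too many arguments to function-like macro
  	 * EXAMPLE_STORE(dst, (uint16x8_t) { 0, 0, cs_flags, len, 0, 0, 0, 0 });
  	 */
  	/* fixed: extra "()" keep the literal as a single argument */
  	EXAMPLE_STORE(dst, ((uint16x8_t) { 0, 0, cs_flags, len, 0, 0, 0, 0 }));
  }
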
Shahaf Shuler
607fc8e4a9 net/mlx5: fix default RSS level
Using inner RSS by default for GRE leads to memory corruption as the
extra flow items added for the inner RSS are not counted in the flow
attributes buffer size.

Fix by enforcing the default RSS level to be outer. This greatly
simplifies the flow engine and makes it more robust.
Future optimization for out-of-the-box RSS can be done in subsequent
commits.

Fixes: d4a405186b73 ("net/mlx5: support tunnel RSS level")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 19:06:29 +02:00
Shahaf Shuler
34511c25d5 net/mlx5: fix build without tunnel RSS support
IBV_RX_HASH_INNER should be referenced only when the Verbs headers have
tunnel support.

Fixes: 80f2d0ed7ff9 ("net/mlx5: add hardware flow debug dump")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-17 12:31:42 +02:00
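
A sketch of the guard, assuming the build system defines a HAVE_* macro
only when the installed Verbs headers provide tunnel support (the macro
name shown here is an assumption):

  #include <stdint.h>
  #include <infiniband/verbs.h>

  static uint64_t
  example_hash_fields(uint64_t hash_fields, int inner_rss)
  {
  #ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
  	if (inner_rss)
  		hash_fields |= IBV_RX_HASH_INNER;
  #else
  	(void)inner_rss;	/* inner RSS unavailable with these headers */
  #endif
  	return hash_fields;
  }
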
Matan Azrad
1f106da2bf net/mlx5: support MPLS-in-GRE and MPLS-in-UDP
Add support for the MPLS over GRE and MPLS over UDP tunnel types as
described in the following RFCs:
1. https://tools.ietf.org/html/rfc4023
2. https://tools.ietf.org/html/rfc7510
3. https://tools.ietf.org/html/rfc4385

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Shahaf Shuler
dd3331c6f1 net/mlx5: add Bluefield device id
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Shahaf Shuler
8fe576ad17 net/mlx5: fix flow director drop rule deletion crash
Drop flow rules are created on the ETH queue even though the parser layer
matches the flow rule layer (L3/L4).

Fixes: 6f2f4948b236 ("net/mlx5: fix flow director rule deletion crash")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-05-17 12:31:42 +02:00
Andy Green
f11a4a7d8a net/mlx5: fix uninitialized variable in probing
Fixes: ccdcba53a3f4 ("net/mlx5: use Netlink to add/remove MAC addresses")

Signed-off-by: Andy Green <andy@warmcat.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-15 22:29:22 +02:00
Yongseok Koh
c9ec2192ff net/mlx5: use correct field in a union structure
This is not a bug, but it is better to use the semantically correct field.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:32:23 +01:00
Yongseok Koh
0cfdc1808d net/mlx5: use coherent I/O memory barrier
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:32:22 +01:00
Yongseok Koh
5f44cfd011 net/mlx5: fix inlining segmented TSO packet
When a multi-segmented packet is inlined, data can be further inlined even
after the first segment. In case of a TSO packet, extra inline data after
the TSO header should be carried by an inline DSEG, which has a 4B inline
header recording the length of the inline data. If more than one segment is
inlined, the length doesn't count the data from the second segment onward.
This causes a fault in HW and the CQE carries an error, which is ignored by
the PMD.

Fixes: f895536be4fa ("net/mlx5: enable inlining data from multiple segments")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:32:22 +01:00
Thomas Monjalon
fbe90cdd77 ethdev: add probing finish function
A new hook function is added and called inside the PMDs at the end
of the device probing:
	- in the primary process, after allocation, init and config
	- in the secondary process, after attaching and local init

This new function is almost empty for now.
It will be used later to add some post-initialization processing.

For the PMDs calling the helpers rte_eth_dev_create() or
rte_eth_dev_pci_generic_probe(), the hook rte_eth_dev_probing_finish()
is called from here, and not in the PMD itself.

Note that the helper rte_eth_dev_create() could be used more,
especially for vdevs, avoiding some code duplication in PMDs.

Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-14 22:31:53 +01:00
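
A sketch of where a PMD that does not use the generic create/probe helpers
would invoke the new hook, at the very end of its probe path (names are
illustrative):

  #include <rte_ethdev_driver.h>

  static int
  example_probe_tail(struct rte_eth_dev *eth_dev)
  {
  	/* ... allocation, init and configuration happen above ... */
  	rte_eth_dev_probing_finish(eth_dev);
  	return 0;
  }
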
Yongseok Koh
7d6bf6b866 net/mlx5: add Multi-Packet Rx support
Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead of
posting a buffer per packet, one large buffer is posted in order to receive
multiple packets on it. An MPRQ buffer consists of multiple fixed-size
strides and each stride receives one packet.

The Rx packet is mem-copied to a user-provided mbuf if its size is
comparatively small; otherwise the PMD attaches the Rx packet to the mbuf
by external buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for
external buffers is allocated and managed by the PMD.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
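
A hypothetical sketch of the Rx-side decision described above; all names
and the copy threshold are illustrative, not the PMD's actual values:

  #include <rte_mbuf.h>
  #include <rte_memcpy.h>

  static void
  example_mprq_fill(struct rte_mbuf *m, void *stride_addr,
  		  rte_iova_t stride_iova, uint16_t len,
  		  struct rte_mbuf_ext_shared_info *shinfo)
  {
  	const uint16_t copy_thresh = 128;	/* example threshold */

  	if (len <= copy_thresh)
  		/* small packet: copy into the mbuf's own data room */
  		rte_memcpy(rte_pktmbuf_mtod(m, void *), stride_addr, len);
  	else
  		/* large packet: attach the stride as an external buffer */
  		rte_pktmbuf_attach_extbuf(m, stride_addr, stride_iova,
  					  len, shinfo);
  	m->data_len = len;
  	m->pkt_len = len;
  }
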
Yongseok Koh
18bee13096 net/mlx5: add a function to rdma-core glue
mlx5dv_create_wq() is added for the Multi-Packet RQ (a.k.a Striding RQ).

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
Yongseok Koh
3e1f82a1f1 net/mlx5: separate filling Rx flags
Filling in the fields of the mbuf becomes a separate inline function so
that it can be reused.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
Yongseok Koh
974f1e7ef1 net/mlx5: add new memory region support
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 is to look up the last-hit entry, which is pointed to by mr_ctrl->mru
(Most Recently Used). If L0 misses, L1 is to look up the address in a
fixed-sized array by linear search. L0/L1 is in an inline function -
mlx5_mr_lookup_cache().

If L1 misses, the bottom-half function is called to look up the address in
the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh() and
it is not an inline function. The data structure for L2 is a binary tree.

If L2 misses, the search falls into the slowest path, which takes locks in
order to access the global device cache (priv->mr.cache), which is also a
B-tree and caches the original MR list (priv->mr.mr_list) of the device.
Unless the global cache overflows, it is all-inclusive of the MR list. This
is L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 overflows, the list has to be
searched directly, bypassing the cache, although it is slower.

If L3 misses, a new MR for the address should be created -
mlx5_mr_create(). When it creates a new MR, it tries to register as many
adjacent memsegs as possible that are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
in the MR list and the corresponding bits are cleared from the bitmap of
MRs. This can fragment an MR, and the MR will then have multiple search
entries in the caches. Once there is a change by the event, the global
cache must be rebuilt and all the per-queue caches are flushed as well. If
memory is frequently freed at run time, that may cause jitter in dataplane
processing in the worst case by incurring MR cache flushes and rebuilds.
But that would be the least probable scenario.

To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. It is also recommended to configure a
per-lcore cache for the Mempool. Even if there are many MRs for a device or
the MRs are highly fragmented, the Mempool cache will help considerably to
reduce misses on the per-queue caches.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
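
A self-contained sketch of the L0/L1 fast path feeding into the bottom
half; field and function names are illustrative, not the PMD's:

  #include <stdint.h>

  typedef uint32_t lkey_t;
  #define EXAMPLE_LKEY_INVALID UINT32_MAX
  #define EXAMPLE_L1_N 8

  struct example_mr_entry { uintptr_t start, end; lkey_t lkey; };

  struct example_mr_ctrl {
  	uint16_t mru;				/* L0: index of the last hit */
  	struct example_mr_entry cache[EXAMPLE_L1_N]; /* L1: linear array */
  };

  /* Bottom half: per-queue b-tree (L2), then the global device cache (L3)
   * under locks, then MR creation on a full miss. Stubbed here. */
  static lkey_t
  example_mr_addr2mr_bh(struct example_mr_ctrl *c, uintptr_t a)
  {
  	(void)c; (void)a;
  	return EXAMPLE_LKEY_INVALID;
  }

  static inline lkey_t
  example_mr_addr2mr(struct example_mr_ctrl *c, uintptr_t addr)
  {
  	struct example_mr_entry *e = &c->cache[c->mru];
  	uint16_t i;

  	if (addr >= e->start && addr < e->end)	/* L0: most recently used */
  		return e->lkey;
  	for (i = 0; i < EXAMPLE_L1_N; i++) {	/* L1: linear search */
  		e = &c->cache[i];
  		if (addr >= e->start && addr < e->end) {
  			c->mru = i;
  			return e->lkey;
  		}
  	}
  	return example_mr_addr2mr_bh(c, addr);	/* L2/L3/create */
  }
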
Yongseok Koh
d561b5dc13 net/mlx5: remove memory region support
This patch removes the current Memory Region (MR) support in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add the new MR support.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Wei Dai
a4996bd89c ethdev: new Rx/Tx offloads API
This patch checks whether a requested offloading is valid or not.
Any requested offloading must be supported in the device capabilities.
Any offloading is disabled by default if it is not set in the parameter
dev_conf->[rt]xmode.offloads to rte_eth_dev_configure() and
[rt]x_conf->offloads to rte_eth_[rt]x_queue_setup().
If any offloading is enabled in rte_eth_dev_configure() by the application,
it is enabled on all queues, no matter whether it is of per-queue or
per-port type and no matter whether it is set or cleared in
[rt]x_conf->offloads to rte_eth_[rt]x_queue_setup().
If a per-queue offloading hasn't been enabled in rte_eth_dev_configure(),
it can be enabled or disabled for an individual queue in
rte_eth_[rt]x_queue_setup().
A newly added offloading is one which hasn't been enabled in
rte_eth_dev_configure() and is requested to be enabled in
rte_eth_[rt]x_queue_setup(); it must be of per-queue type,
otherwise an error log is triggered.
The underlying PMD must be aware that the offloadings requested
through the PMD-specific queue_setup() function only carry those
newly added offloadings of per-queue type.

This patch makes such checking in a common way in the rte_ethdev
layer to avoid the same checking in the underlying PMDs.

This patch assumes that all PMDs in 18.05-rc2 have already been
converted to the offload API defined in 17.11. It also assumes
that all PMDs can return correct offloading capabilities
in rte_eth_dev_infos_get().

At the beginning of [rt]x_queue_setup() of the underlying PMD,
add offloads = [rt]xconf->offloads |
dev->data->dev_conf.[rt]xmode.offloads; to keep the same behavior as the
offload API defined in 17.11, so upper applications are not broken by the
offload API change.
PMDs can use the fact that the input [rt]xconf->offloads only carries
the newly added per-queue offloads to do some optimization or code
changes on top of this patch.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-05-14 22:31:51 +01:00
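
A sketch of the recommended merge at the top of a PMD Rx queue setup
callback, so queue-level behavior matches the 17.11 semantics (the body is
illustrative):

  #include <rte_ethdev_driver.h>

  static int
  example_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
  		       uint16_t desc, unsigned int socket,
  		       const struct rte_eth_rxconf *conf,
  		       struct rte_mempool *mp)
  {
  	/* rxconf->offloads only carries newly added per-queue offloads,
  	 * so merge in the port-level configuration. */
  	uint64_t offloads = conf->offloads |
  			    dev->data->dev_conf.rxmode.offloads;

  	/* ... configure the queue using "offloads" ... */
  	(void)idx; (void)desc; (void)socket; (void)mp; (void)offloads;
  	return 0;
  }
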
Yongseok Koh
df428ceef4 net/mlx5: change device reference for secondary process
rte_eth_devices[] is not shared between the primary and secondary
processes, but is a static array in each process. The reverse pointer to
the device (priv->dev) is invalid. Instead, priv has a pointer to the
shared data of the device,
  struct rte_eth_dev_data *dev_data;

Two macros are added,
  #define PORT_ID(priv) ((priv)->dev_data->port_id)
  #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00