This feature enables the TX burst function to emit up to 5 packets using
only two work queue entries (WQEs) on devices that support it. Saves PCI
bandwidth and improves performance.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Implement send inline feature which copies packet data directly into
work queue entries (WQEs) for improved latency. The maximum packet
size and the minimum number of Tx queues to qualify for inline send
are user-configurable.
This feature is effective when HW causes a performance bottleneck.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Replacing the variable countdown (which depends on the number of
descriptors) with a fixed relative threshold known at compile time improves
performance by reducing the TX queue structure footprint and the amount of
code to manage completions during a burst.
Completions are now requested at most once per burst after threshold is
reached.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Mini (compressed) completion queue entries (CQEs) are returned by the
NIC when PCI back pressure is detected, in which case the first CQE64
contains common packet information followed by a number of CQE8
providing the rest, followed by a matching number of empty CQE64
entries to be used by software for decompression.
Before decompression:
0 1 2 6 7 8
+-------+ +---------+ +-------+ +-------+ +-------+ +-------+
| CQE64 | | CQE64 | | CQE64 | | CQE64 | | CQE64 | | CQE64 |
|-------| |---------| |-------| |-------| |-------| |-------|
| ..... | | cqe8[0] | | | . | | | | | ..... |
| ..... | | cqe8[1] | | | . | | | | | ..... |
| ..... | | ....... | | | . | | | | | ..... |
| ..... | | cqe8[7] | | | | | | | | ..... |
+-------+ +---------+ +-------+ +-------+ +-------+ +-------+
After decompression:
0 1 ... 8
+-------+ +-------+ +-------+
| CQE64 | | CQE64 | | CQE64 |
|-------| |-------| |-------|
| ..... | | ..... | . | ..... |
| ..... | | ..... | . | ..... |
| ..... | | ..... | . | ..... |
| ..... | | ..... | | ..... |
+-------+ +-------+ +-------+
This patch does not perform the entire decompression step as it would be
really expensive, instead the first CQE64 is consumed and an internal
context is maintained to interpret the following CQE8 entries directly.
Intermediate empty CQE64 entries are handed back to HW without further
processing.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
The intent is to replace the remaining compile-time options and environment
variables with a common mean of runtime configuration. This commit only
adds the kvargs handling code, subsequent commits will update the rest.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
These structures and macros extend those exposed by libmlx5 (in mlx5_hw.h)
to let the PMD manage work queue and completion queue elements directly.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The latest version of Mellanox OFED exposes hardware definitions necessary
to implement data path operation bypassing Verbs. Update the minimum
version requirement to MLNX_OFED >= 3.3 and clean up compatibility checks
for previous releases.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure rxq_ctrl.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure txq_ctrl.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Inline TX will be fully managed by the PMD after Verbs is bypassed in the
data path. Remove the current code until then.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
There is no scatter/gather support anymore, CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
has no purpose and can be removed.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This is done in preparation of bypassing Verbs entirely for the data path
as a performance improvement. RX scatter cannot be maintained during the
transition and will be reimplemented later.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This is done in preparation of bypassing Verbs entirely for the data path
as a performance improvement. TX gather cannot be maintained during the
transition and will be reimplemented later.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Except for the first time when memory registration occurs, the lkey is
always cached. Since memory registration is slow and performs system calls,
performance can be improved by moving that code to its own function outside
of the data path so only the lookup code is left in the original inlined
function.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Use RTE_PCI_DEVICE macro to set all fields rather than explicitly setting
them individually in the code. This shortens the code while helping to
future-proof against future changes to the rte_pci_id structure.
Fixes: 701c8d80c820 ("pci: support class id probing")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
SR-IOV mode is currently set when dealing with VF devices. PF devices must
be taken into account as well if they have active VFs.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
A hardware capability check is missing before enabling RX VLAN stripping
during queue setup.
Also, while dev_conf.rxmode.hw_vlan_strip is currently a single bit that
can be stored in priv->hw_vlan_strip directly, it should be interpreted as
a boolean value for safety.
Fixes: f3db9489188a ("mlx5: support Rx VLAN stripping")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
There is no guarantee that the new MTU is effective after writing its value
to sysfs. Retrieve it to be sure.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Memory regions are always local with raw Ethernet queues, drop the remote
property as it adds extra processing on the hardware side.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since _BSD_SOURCE was deprecated in favor of _DEFAULT_SOURCE in Glibc 2.19
and entirely removed in 2.20, various BSD ioctl macros are not exposed
anymore when _XOPEN_SOURCE is defined, and linux/if.h now conflicts with
net/if.h.
Add _DEFAULT_SOURCE and keep _BSD_SOURCE for compatibility with older
versions.
Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
PMDs not advertising the same flags in similar conditions.
Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
and PKT_RX_QINQ_STRIPPED that are better defined:
PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
is enabled in the RX configuration of the PMD.
For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
It should be removed from applications and PMDs in a future revision.
This patch also updates the drivers. For PKT_RX_VLAN_PKT:
- e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
required.
- fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
- ixgbe: modification done (vector and normal), the old flag was set
when a vlan was recognized, even if vlan stripping was disabled.
- the other drivers do not support vlan stripping.
For PKT_RX_QINQ_PKT, it was only supported on i40e, and the behavior was
already correct, so we can reuse the same bit value for
PKT_RX_QINQ_STRIPPED.
[1] http://dpdk.org/ml/archives/dev/2016-April/037837.html,
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some architectures (ex: Power8) have a cache line size of 128 bytes,
so the drivers should not expect that prefetching the second part of
the mbuf with rte_prefetch0(&m->cacheline1) is valid.
This commit add helpers that can be used by drivers to prefetch the
rx or tx part of the mbuf, whatever the cache line size.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Introduce rte_mempool_populate_default() which allocates
mempool objects in several memzones.
The mempool header is now always allocated in a specific memzone
(not with its objects). Thanks to this modification, we can remove
many specific behavior that was required when hugepages are not
enabled in case we are using rte_mempool_xmem_create().
This change requires to update how kni and mellanox drivers lookup for
mbuf memory. For now, this will only work if there is only one memory
chunk (like today), but we could make use of rte_mempool_mem_iter() to
support more memory chunks.
We can also remove RTE_MEMPOOL_OBJ_NAME that is not required anymore for
the lookup, as memory chunks are referenced by the mempool.
Note that rte_mempool_create() is still broken (it was the case before)
when there is no hugepages support (rte_mempool_create_xmem() has to be
used). This is fixed in next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Do not use paddr table to store the mempool memory chunks.
This will allow to have several chunks with different virtual addresses.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Now that the mempool objects are chained into a list, we can use it to
browse them. This implies a rework of rte_mempool_obj_iter() API, that
does not need to take as many arguments as before. The previous function
is kept as a private function, and renamed in this commit. It will be
removed in a next commit of the patch series.
The only internal users of this function are the mellanox drivers. The
code is updated accordingly.
Introducing an API compatibility for this function has been considered,
but it is not easy to do without keeping the old code, as the previous
function could also be used to browse elements that were not added in a
mempool. Moreover, the API is already be broken by other patches in this
version.
The library version was already updated in
commit 213af31e0960 ("mempool: reduce structure size if no cache needed")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This commit removes the const qualifier for the mempool in
rte_mempool_walk() callback prototype.
Indeed, most functions that can be done on a mempool require a non-const
mempool pointer, except the dump and the audit. Therefore, the
mempool_walk() is more useful if the mempool pointer is not const.
This is required by next commit where the mellanox drivers use
rte_mempool_walk() to iterate the mempools, then rte_mempool_obj_iter()
to iterate the objects in each mempool.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Many drivers provide their own implementation of rte_mbuf_raw_alloc(),
duplicating the code. Introduce a new public function in rte_mbuf to
allocate a raw mbuf (uninitialized).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The link speed configuration is now done with bitmaps so 100G speed
requires only a new bit flag.
The actual link speed is a number so its size must be increased from
16-bit to 32-bit.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Tested-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Tested-by: Matej Vido <vido@cesnet.cz>
This patch redesigns the API to set the link speed/s configuration
of an ethernet port. Specifically:
- it allows to define a set of advertised speeds for
auto-negociation.
- it allows to disable link auto-negociation (single fixed speed).
- default: auto-negociate all supported speeds.
A flag autoneg in struct rte_eth_link indicates if link speed was a
result of auto-negociation or was fixed by configuration.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
Tested-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Tested-by: Beilei Xing <beilei.xing@intel.com>
Tested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The speed capabilities of a device can be retrieved with
rte_eth_dev_info_get().
The new field speed_capa is initialized in the drivers without
taking care of device characteristics in this patch.
When the capabilities of a driver are accurate, the table in
overview.rst must be filled.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
When the number of RX queues is not a power of two,
the RETA table is configured to its maximum size for
better balancing.
Testing showed that limiting its size to 256 improves performance
noticeably with little to no impact on balancing results.
Fixes: ebb30ec64a68 ("mlx5: increase RETA table size")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Once freed, completed mbufs pointers are not set to NULL in the TX queue.
Clean up function must take this into account.
Fixes: 2e22920b85d9 ("mlx5: support non-scattered Tx and Rx")
Fixes: 7fae69eeff13 ("mlx4: new poll mode driver")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Update function can be called with no key to enable or disable a RSS
protocol, or with a key to be applied to the desired protocols.
Fixes: 2f97422e7759 ("mlx5: support RSS hash update and get")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
RSS configuration provided by the application should not be used as storage
by the PMD.
Fixes: 2f97422e7759 ("mlx5: support RSS hash update and get")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
VLAN insertion can be done in hardware when supported in Verbs. A software
fallback is provided otherwise. The software implementation is also used
when multi-packet send is enabled on a queue, as both features are mutually
exclusive.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Environment variable MLX5_PMD_ENABLE_PADDING enables HW packet padding
in PCI bus transactions.
When packet size is cache aligned and CRC stripping is enabled, 4 fewer
bytes are written to the PCI bus. Enabling padding makes such packets
aligned again.
In cases where PCI bandwidth is the bottleneck, padding can improve
performance by 10%.
This is disabled by default since this can also decrease performance for
unaligned packet sizes.
Signed-off-by: Olga Shern <olgas@mellanox.com>
fix packet padding macro check
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Secondary processes are expected to use queues and other resources
allocated by the primary, however Verbs resources can only be shared
between processes when inherited through fork().
This limitation can be worked around for TX by configuring separate queues
from secondary processes.
Signed-off-by: Or Ami <ora@mellanox.com>
Add driver functions to set link state up or down.
Burst functions are updated to make sure applications cannot attempt to
send/receive after link is brought down.
Signed-off-by: Or Ami <ora@mellanox.com>
Add a new API rte_eth_dev_get_supported_ptypes to query what packet types
can be filled by a given device. The device should be already started or
its PMD RX burst function already decided, since the packet types supported
may vary depending on RX function.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Change rxq_cq_to_ol_flags() to set checksum flags according to packet type,
so for non L3/L4 packets the mbuf chksum_bad flags will not be set.
Fixes: 67fa62bc672d ("mlx5: support checksum offload")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>