Retrieving link status information through the link update callback should
be quick and non-blocking.
Mellanox PMDs retrieve this information through ioctl() calls on the
related kernel netdevice. This appears to take a long time to
complete and may cause significant slowdowns in applications.
While these system calls cannot be accelerated, removing the lock on the
private structure allows applications to perform other control operations
from separate threads in the meantime. This function remains safe without
locking as it does not write to the private structure, which is only used to
retrieve the name of the netdevice.
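As an illustration, a link query that needs nothing but the netdevice name
could look like the sketch below. It assumes a Linux host and uses the
SIOCGIFFLAGS request as one example of such an ioctl(); the function name
netdev_link_running() is hypothetical and is not taken from the driver.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Query the running state of a kernel netdevice by name (hypothetical
 * helper). Returns 1 if the link is running, 0 if not, -1 on error.
 * Only the caller-provided name is read, so no driver lock is required.
 */
static int
netdev_link_running(const char *ifname)
{
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int ret;

	if (fd == -1)
		return -1;
	memset(&ifr, 0, sizeof(ifr));
	snprintf(ifr.ifr_name, sizeof(ifr.ifr_name), "%s", ifname);
	/* This system call may take a while; nothing is locked meanwhile. */
	ret = ioctl(fd, SIOCGIFFLAGS, &ifr);
	close(fd);
	if (ret == -1)
		return -1;
	return !!(ifr.ifr_flags & IFF_RUNNING);
}
```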
Signed-off-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Add PCI device ID for ConnectX-5 and enable multi-packet send for PF and VF.
Update the documentation and release notes accordingly.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This makes struct rte_eth_dev independent of struct rte_pci_device by
replacing it with a pointer to the generic struct rte_device.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This moves the non-PCI related initialization of the link state interrupt
callback list and the setting of the default MTU to rte_eth_dev_allocate()
so that drivers only need to set non-default values.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Clear eth_dev->data when allocating a new rte_eth_dev so that
drivers only need to set non-zero values.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add a new macro RTE_PMD_REGISTER_KMOD_DEP() that allows a driver to
declare the list of kernel modules required to run properly.
Today, most PCI drivers require uio/vfio.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit fixes the link status report at device startup when the
LSC (link status change) callback is configured.
Fixes: 62072098b5 ("mlx5: support setting link up or down")
Signed-off-by: Olga Shern <olgas@mellanox.com>
All macros related to driver registration are renamed from DRIVER_*
to RTE_PMD_*.
This includes:
DRIVER_REGISTER_PCI -> RTE_PMD_REGISTER_PCI
DRIVER_REGISTER_PCI_TABLE -> RTE_PMD_REGISTER_PCI_TABLE
DRIVER_REGISTER_VDEV -> RTE_PMD_REGISTER_VDEV
DRIVER_REGISTER_PARAM_STRING -> RTE_PMD_REGISTER_PARAM_STRING
DRIVER_EXPORT_* -> RTE_PMD_EXPORT_*
Fix PMDINFOGEN tool to look for matches of RTE_PMD_REGISTER_*.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
With recent gcc versions, e.g. gcc 6.1, compilation of mlx drivers with
debug enabled produces lots of errors complaining that "pedantic" is
not a warning level that can be ignored.
error: ‘-pedantic’ is not an option that controls warnings [-Werror=pragmas]
#pragma GCC diagnostic ignored "-pedantic"
^~~~~~~~~~~
These errors can be removed by changing the "-pedantic" to "-Wpedantic".
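A minimal sketch of the corrected pragma is shown below. It relaxes
-Wpedantic only around code that relies on a GNU extension; the push/pop
pair and the min_int()/clamp_to_limit() helpers are assumptions made for
the example and are not copied from the drivers.

```c
/* Relax -Wpedantic only around code relying on GNU extensions. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"

/* Hypothetical helper using a GNU statement expression. */
#define min_int(a, b) ({ int a_ = (a); int b_ = (b); a_ < b_ ? a_ : b_; })

static int
clamp_to_limit(int value, int limit)
{
	return min_int(value, limit);
}

/* Restore the previous diagnostic settings for the rest of the file. */
#pragma GCC diagnostic pop
```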
Fixes: 7fae69eeff ("mlx4: new poll mode driver")
Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Now that rte_device is available, drivers can start using its members
(numa, name) as well as link themselves into another rte_device list.
As of now no one is using this list, but it can be used for iterating over
all devices (pdev/vdev/Xdev) and performing bulk actions (like cleanup).
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Reword commit log for extra rte_device list]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Remove the 'name' member from rte_pci_driver and move to generic
rte_driver.
Most of the PMD drivers were initially using DRIVER_REGISTER_PCI(<name>..)
as well as assigning a name to eth_driver.pci_drv.name member.
In this patch, only the original DRIVER_REGISTER_PCI(<name>..) name has
been populated into the rte_driver.name member; assignments through
eth_driver have been removed.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Rebase and expand changes to newly added files]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Now that hotplug has been moved to eal, there is no reason to keep the
device type in this layer.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Simplify crypto and ethdev pci drivers init by using newly introduced
init macros and helpers.
Those drivers then don't need to register as "rte_driver"s anymore.
Exceptions:
- virtio and mlx* use RTE_INIT directly as they have custom initialization
steps.
- VDEV devices are not modified - they continue to use PMD_REGISTER_DRIVER.
Update documentation for replacing an example referring to
PMD_REGISTER_DRIVER.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Probe and Remove are more appropriate names for PCI init and uninit
operations. This is a cosmetic change.
Only MLX* uses the PCI direct registration, bypassing PMD_* macro.
The callbacks for this too have been updated.
VDEV are left out. For them, init/uninit are more appropriate.
Suggested-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
As discussed in the past release, driver names are modified
to be more consistent, and future drivers should follow
this new convention.
Driver names consist of:
"driver category"_"driver folder name"_"optional extra name".
For example:
- Crypto null driver -> "crypto_null"
- Network IXGBE VF driver -> "net_ixgbe_vf"
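The convention can be sketched as a small formatting helper. The function
pmd_build_name() is hypothetical and only illustrates how the three parts
are joined; drivers declare their names statically.

```c
#include <stdio.h>

/*
 * Build "<category>_<folder>[_<extra>]" into buf (hypothetical helper).
 * Returns the value of snprintf(), i.e. the length that was needed.
 */
static int
pmd_build_name(char *buf, size_t len, const char *category,
	       const char *folder, const char *extra)
{
	if (extra != NULL && extra[0] != '\0')
		return snprintf(buf, len, "%s_%s_%s", category, folder, extra);
	return snprintf(buf, len, "%s_%s", category, folder);
}
```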
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Since now the PMD_REGISTER_DRIVER macro sets the driver names,
there is no need to have the rte_driver structure setting it
statically, as it will get overridden.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Compilation fails because of some typos.
Fixes: cb6696d220 ("drivers: update registration macro usage")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Modify the PMD_REGISTER_DRIVER macro, adding a name argument to it. The
addition of a name argument creates a token that can be used for subsequent
macros in the creation of unique symbol names to export additional bits of
information for use by the pmdinfogen tool. For example:
PMD_REGISTER_DRIVER(ena_driver, ena);
registers the ena_driver struct as it always did, and creates a symbol
const char this_pmd_name0[] __attribute__((used)) = "ena";
which pmdinfogen can search for and extract. The subsequent macro
DRIVER_REGISTER_PCI_TABLE(ena, ena_pci_id_map);
creates a symbol
const char ena_pci_tbl_export[] __attribute__((used)) = "ena_pci_id_map";
which allows pmdinfogen to find the PCI table of this driver.
Using this pattern, we can export arbitrary bits of information.
pmdinfo uses this information to extract hardware support from an object
file and create a json string to make hardware support info discoverable
later.
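The token-pasting pattern described above can be sketched with stand-in
macros. EXPORT_PMD_NAME() and EXPORT_PCI_TABLE() are hypothetical names
chosen for this example; they are not the real registration macros and
only show how a name argument yields unique, searchable symbols.

```c
/* Hypothetical stand-ins for the registration macros. */
#define EXPORT_PMD_NAME(nm) \
	const char pmd_name_##nm[] __attribute__((used)) = #nm

#define EXPORT_PCI_TABLE(nm, table) \
	const char nm##_pci_tbl_export[] __attribute__((used)) = #table

/* Creates the symbol pmd_name_ena holding the string "ena". */
EXPORT_PMD_NAME(ena);

/* Creates the symbol ena_pci_tbl_export holding "ena_pci_id_map". */
EXPORT_PCI_TABLE(ena, ena_pci_id_map);
```

The used attribute keeps the strings in the object file even though no
code references them, so that an external tool can find them.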
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Remy Horton <remy.horton@intel.com>
This feature enables the TX burst function to emit up to 5 packets using
only two work queue entries (WQEs) on devices that support it. This saves
PCI bandwidth and improves performance.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Implement send inline feature which copies packet data directly into
work queue entries (WQEs) for improved latency. The maximum packet
size and the minimum number of Tx queues to qualify for inline send
are user-configurable.
This feature is effective when HW causes a performance bottleneck.
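The two user-configurable thresholds could be applied as in the sketch
below. The structure, its field names and tx_should_inline() are
hypothetical, and treating a zero maximum size as "disabled" is an
assumption of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two user-configurable values. */
struct inline_conf {
	uint32_t max_inline_len; /* largest packet size eligible for inline */
	uint32_t min_txqs;       /* minimum number of Tx queues required */
};

/* Decide whether a packet qualifies for inline send. */
static bool
tx_should_inline(const struct inline_conf *conf, uint32_t txqs_n,
		 uint32_t pkt_len)
{
	if (conf->max_inline_len == 0)
		return false; /* assumption: 0 disables the feature */
	if (txqs_n < conf->min_txqs)
		return false;
	return pkt_len <= conf->max_inline_len;
}
```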
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Mini (compressed) completion queue entries (CQEs) are returned by the
NIC when PCI back pressure is detected, in which case the first CQE64
contains common packet information followed by a number of CQE8
providing the rest, followed by a matching number of empty CQE64
entries to be used by software for decompression.
Before decompression:
      0          1          2           6         7         8
  +-------+ +---------+ +-------+   +-------+ +-------+ +-------+
  | CQE64 | |  CQE64  | | CQE64 |   | CQE64 | | CQE64 | | CQE64 |
  |-------| |---------| |-------|   |-------| |-------| |-------|
  | ..... | | cqe8[0] | |       | . |       | |       | | ..... |
  | ..... | | cqe8[1] | |       | . |       | |       | | ..... |
  | ..... | | ....... | |       | . |       | |       | | ..... |
  | ..... | | cqe8[7] | |       |   |       | |       | | ..... |
  +-------+ +---------+ +-------+   +-------+ +-------+ +-------+
After decompression:
      0         1     ...     8
  +-------+ +-------+     +-------+
  | CQE64 | | CQE64 |     | CQE64 |
  |-------| |-------|     |-------|
  | ..... | | ..... |  .  | ..... |
  | ..... | | ..... |  .  | ..... |
  | ..... | | ..... |  .  | ..... |
  | ..... | | ..... |     | ..... |
  +-------+ +-------+     +-------+
This patch does not perform the entire decompression step as it would be
really expensive. Instead, the first CQE64 is consumed and an internal
context is maintained to interpret the following CQE8 entries directly.
Intermediate empty CQE64 entries are handed back to HW without further
processing.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
The intent is to replace the remaining compile-time options and environment
variables with a common means of runtime configuration. This commit only
adds the kvargs handling code; subsequent commits will update the rest.
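To illustrate the kind of "key=value,key=value" handling involved, here is
a self-contained sketch in plain C. It does not use the DPDK kvargs
library; parse_devargs(), example_handler() and the "example_burst" key
are all hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Callback invoked for each key/value pair; nonzero aborts parsing. */
typedef int (*kv_handler_t)(const char *key, const char *value, void *opaque);

/* Split "k1=v1,k2=v2" and hand each pair to the handler. */
static int
parse_devargs(const char *args, kv_handler_t handler, void *opaque)
{
	char *copy, *pair;
	int ret = 0;

	if (args == NULL)
		return 0;
	copy = malloc(strlen(args) + 1);
	if (copy == NULL)
		return -1;
	strcpy(copy, args);
	pair = copy;
	while (pair != NULL && *pair != '\0') {
		char *next = strchr(pair, ',');
		char *eq;

		if (next != NULL)
			*next++ = '\0';
		eq = strchr(pair, '=');
		if (eq == NULL) {
			ret = -1; /* malformed pair without '=' */
			break;
		}
		*eq = '\0';
		ret = handler(pair, eq + 1, opaque);
		if (ret != 0)
			break;
		pair = next;
	}
	free(copy);
	return ret;
}

/* Hypothetical runtime configuration filled in by the handler. */
struct example_conf {
	unsigned int burst;
};

static int
example_handler(const char *key, const char *value, void *opaque)
{
	struct example_conf *conf = opaque;

	if (strcmp(key, "example_burst") == 0) {
		conf->burst = (unsigned int)strtoul(value, NULL, 0);
		return 0;
	}
	return -1; /* unknown key */
}
```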
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The latest version of Mellanox OFED exposes hardware definitions necessary
to implement data path operation bypassing Verbs. Update the minimum
version requirement to MLNX_OFED >= 3.3 and clean up compatibility checks
for previous releases.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure rxq_ctrl.
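A rough sketch of such a split is given below. The field names and the
RXQ_CTRL() macro are hypothetical; the point is only that the data path
structure stays small while control code can still get back to the
enclosing structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Data path fields only (field names are hypothetical). */
struct rxq {
	uint16_t elts_n; /* number of elements in the ring */
	uint16_t ci;     /* consumer index */
};

/* Control path fields, with the data path structure embedded. */
struct rxq_ctrl {
	unsigned int socket; /* NUMA socket used at setup time */
	unsigned int mtu;    /* configuration not needed per packet */
	struct rxq rxq;      /* what burst functions actually touch */
};

/* Get the control structure back from a data path pointer. */
#define RXQ_CTRL(ptr) \
	((struct rxq_ctrl *)((char *)(ptr) - offsetof(struct rxq_ctrl, rxq)))
```

Burst functions would then receive a struct rxq pointer, and control
operations recover the enclosing object with RXQ_CTRL() when needed.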
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure txq_ctrl.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Use the RTE_PCI_DEVICE macro to set all fields rather than explicitly setting
them individually in the code. This shortens the code and helps to
future-proof it against changes to the rte_pci_id structure.
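The benefit can be sketched with stand-in definitions. The ex_pci_id
structure, the EX_* constants and EX_PCI_DEVICE() below are hypothetical
and merely imitate the idea; the IDs in the table are for illustration.

```c
#include <stdint.h>

#define EX_ANY_ID 0xffffu
#define EX_CLASS_ANY_ID 0xffffffu

/* Hypothetical stand-in for a PCI ID structure. */
struct ex_pci_id {
	uint32_t class_id;
	uint16_t vendor_id;
	uint16_t device_id;
	uint16_t subsystem_vendor_id;
	uint16_t subsystem_device_id;
};

/* One macro sets every field, including ones added later. */
#define EX_PCI_DEVICE(vend, dev) \
	.class_id = EX_CLASS_ANY_ID, \
	.vendor_id = (vend), \
	.device_id = (dev), \
	.subsystem_vendor_id = EX_ANY_ID, \
	.subsystem_device_id = EX_ANY_ID

static const struct ex_pci_id ex_id_map[] = {
	{ EX_PCI_DEVICE(0x15b3, 0x1013) }, /* illustrative IDs */
	{ .vendor_id = 0 },                /* sentinel */
};
```

If a field such as class_id is added to the structure, only the macro
needs updating and every table entry picks up a sane default.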
Fixes: 701c8d80c8 ("pci: support class id probing")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
SR-IOV mode is currently set when dealing with VF devices. PF devices must
be taken into account as well if they have active VFs.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
VLAN insertion can be done in hardware when supported in Verbs. A software
fallback is provided otherwise. The software implementation is also used
when multi-packet send is enabled on a queue, as both features are mutually
exclusive.
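A software fallback of this kind boils down to opening a 4-byte gap after
the two MAC addresses and writing the 802.1Q tag there. The sketch below
works on a flat buffer rather than an mbuf; vlan_insert_sw() and its
parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12 /* destination + source MAC addresses */
#define VLAN_TAG_LEN 4   /* TPID (0x8100) + TCI */

/*
 * Insert an 802.1Q tag into an Ethernet frame stored in a flat buffer.
 * Returns the new frame length, or 0 if the frame is too short or the
 * buffer cannot hold the extra 4 bytes.
 */
static size_t
vlan_insert_sw(uint8_t *frame, size_t len, size_t cap, uint16_t tci)
{
	if (len < ETH_ADDRS_LEN + 2 || cap < len + VLAN_TAG_LEN)
		return 0;
	/* Shift EtherType and payload to make room for the tag. */
	memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN,
		frame + ETH_ADDRS_LEN, len - ETH_ADDRS_LEN);
	frame[ETH_ADDRS_LEN + 0] = 0x81; /* TPID, network byte order */
	frame[ETH_ADDRS_LEN + 1] = 0x00;
	frame[ETH_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
	frame[ETH_ADDRS_LEN + 3] = (uint8_t)(tci & 0xff);
	return len + VLAN_TAG_LEN;
}
```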
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Environment variable MLX5_PMD_ENABLE_PADDING enables HW packet padding
in PCI bus transactions.
When packet size is cache aligned and CRC stripping is enabled, 4 fewer
bytes are written to the PCI bus. Enabling padding makes such packets
aligned again.
In cases where PCI bandwidth is the bottleneck, padding can improve
performance by 10%.
This is disabled by default since this can also decrease performance for
unaligned packet sizes.
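The arithmetic behind this trade-off can be sketched as follows, assuming
a 64-byte cache line, a 4-byte CRC included in the frame size, and padding
that rounds up to the next cache line. pci_write_len() is a hypothetical
helper, not driver code.

```c
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache line size */
#define CRC_LEN 4u     /* Ethernet CRC length */

/* Number of bytes written over PCI for one received frame. */
static uint32_t
pci_write_len(uint32_t wire_len, int crc_strip, int padding)
{
	uint32_t len = wire_len;

	if (crc_strip && len >= CRC_LEN)
		len -= CRC_LEN;
	if (padding)
		len = (len + CACHE_LINE - 1) & ~(CACHE_LINE - 1);
	return len;
}
```

Under these assumptions a 64-byte frame is written as 60 bytes without
padding and as 64 bytes with it, while a 100-byte frame grows from 96 to
128 bytes, which shows why padding can hurt unaligned sizes.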
Signed-off-by: Olga Shern <olgas@mellanox.com>
fix packet padding macro check
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Secondary processes are expected to use queues and other resources
allocated by the primary, however Verbs resources can only be shared
between processes when inherited through fork().
This limitation can be worked around for TX by configuring separate queues
from secondary processes.
Signed-off-by: Or Ami <ora@mellanox.com>
Add driver functions to set link state up or down.
Burst functions are updated to make sure applications cannot attempt to
send/receive after link is brought down.
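One possible way to guard the burst path, shown here only as a sketch, is
to swap the burst function pointer for a dummy that handles no packets
while the link is down. All names below are hypothetical and the approach
is an assumption, not a description of the driver's code.

```c
#include <stdint.h>

typedef uint16_t (*tx_burst_t)(void *txq, void **pkts, uint16_t n);

/* Stand-in for the regular burst function: pretend all packets went out. */
static uint16_t
real_tx_burst(void *txq, void **pkts, uint16_t n)
{
	(void)txq;
	(void)pkts;
	return n;
}

/* Used while the link is down: never sends anything. */
static uint16_t
dummy_tx_burst(void *txq, void **pkts, uint16_t n)
{
	(void)txq;
	(void)pkts;
	(void)n;
	return 0;
}

/* Hypothetical device structure holding the active burst function. */
struct ex_dev {
	tx_burst_t tx_burst;
};

static void
ex_set_link(struct ex_dev *dev, int up)
{
	dev->tx_burst = up ? real_tx_burst : dummy_tx_burst;
}
```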
Signed-off-by: Or Ami <ora@mellanox.com>
Add a new API rte_eth_dev_get_supported_ptypes to query what packet types
can be filled by a given device. The device should be already started or
its PMD RX burst function already decided, since the packet types supported
may vary depending on RX function.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
RSS configuration should not be freed when priv is NULL.
Fixes: 2f97422e77 ("mlx5: support RSS hash update and get")
Signed-off-by: Or Ami <ora@mellanox.com>
Allows HW to strip the 802.1Q header from incoming frames and report it
through the mbuf structure.
This feature requires MLNX_OFED >= 3.2.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Until now, broadcast frames were handled like unicast. Moving the related
flow to the special flows table frees up the related unicast MAC entry.
The same method is used to handle IPv6 multicast frames.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Merge redundant code by adding a static initialization table to manage
promiscuous and allmulticast (special) flows.
New function priv_rehash_flows() implements the logic to enable/disable
relevant flows in one place from any context.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The following error occurs when CONFIG_RTE_LIBRTE_MLX5_DEBUG=y:
drivers/net/mlx5/mlx5.c:381:4: error: ISO C forbids braced-groups within expressions
RTE_MIN() uses the non-standard ({ ... }) syntax to declare variables within
parentheses, which is rejected by -pedantic.
Since the RSS_INDIRECTION_TABLE_SIZE check is meant to go away as soon as
DPDK supports larger/variable indirection tables, put it in a separate
condition.
Fixes: 634efbc2c8 ("mlx5: support RETA query and update")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Use new function rte_eth_copy_pci_info.
Copy device info for the following pdevs:
bnx2x
cxgbe
e1000
enic
fm10k
i40e
ixgbe
mlx4
mlx5
virtio
vmxnet3
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
ConnectX-4 adapters do not have a constant indirection table size, which is
set at runtime from the number of RX queues. The maximum size is retrieved
using a hardware query and is normally 512.
Since the current RETA API cannot handle a variable size, any query/update
command causes it to be silently updated to RSS_INDIRECTION_TABLE_SIZE
entries regardless of the original size.
Also due to the underlying type of the configuration structure, the maximum
size is limited to RSS_INDIRECTION_TABLE_SIZE (currently 128, at most 256
entries).
A port stop/start must be done to apply the new RETA configuration.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Add interrupts handler for port status notification.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
DPDK expects to have an RSS hash key per flow type (IPv4, IPv6, UDPv4,
etc.). To handle this, the PMD must keep a table of hash keys to be able
to reconfigure the queues at each start/stop call.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
First implementation of rss_hash_update and rss_hash_conf_get. These
functions still lack some functionality but can be used to change the RSS
hash key. For now, the PMD does not handle an indirection table for
each kind of flow (IPv4, IPv6, etc.); the same RSS hash key is used
for all protocols. This explains why rss_hash_conf_get returns the RSS
hash key for all DPDK supported protocols and why the hash key is set
for all of them too.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Use the maximum size of the indirection table when the number of requested
RX queues is not a power of two, which helps to improve RSS balancing.
A message informs users that balancing is not optimal in such cases.
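The effect on balancing can be sketched as below. Both helpers are
hypothetical, and using a table with exactly one entry per queue in the
power-of-two case is an assumption of this example.

```c
#include <stdint.h>

/* Pick the table size: the queue count if a power of two, else the max. */
static unsigned int
reta_size(unsigned int rxqs_n, unsigned int max_size)
{
	if (rxqs_n != 0 && (rxqs_n & (rxqs_n - 1)) == 0 && rxqs_n <= max_size)
		return rxqs_n;
	return max_size;
}

/* Spread RX queues over the table in round-robin order. */
static void
reta_fill(uint16_t *reta, unsigned int reta_n, unsigned int rxqs_n)
{
	unsigned int i;

	if (rxqs_n == 0)
		return;
	for (i = 0; i < reta_n; i++)
		reta[i] = (uint16_t)(i % rxqs_n);
}
```

With 3 queues and a 512-entry table, two queues own 171 entries and one
owns 170. A 4-entry table would instead give one queue twice the share of
the others, hence the preference for the largest table.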
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The new Verbs RSS API is lower-level than the previous one and much more
flexible but requires RX queues to use Work Queues (WQs) internally instead
of Queue Pairs (QPs), which are grouped in an indirection table used by a
new kind of hash RX QPs.
Hash RX QPs and the indirection table together replace the parent RSS QP
while WQs are mostly similar to child QPs.
RSS hash key is not configurable yet.
Summary of changes:
- Individual DPDK RX queues do not store flow properties anymore, this info
is now part of the hash RX queues.
- All functions affecting the parent queue when RSS is enabled or the basic
queues otherwise are modified to affect hash RX queues instead.
- Hash RX queues are also used when a single DPDK RX queue is configured (no
RSS) to remove that special case.
- Hash RX queues and indirection table are created/destroyed when device
is started/stopped in addition to create/destroy flows.
- Contrary to QPs, WQs are moved to the "ready" state before posting RX
buffers, otherwise they are ignored.
- Resource domain information is added to WQs for better performance.
- CQs are not resized anymore when switching between non-SG and SG modes as
it does not work correctly with WQs. Use the largest possible size
instead, since CQ size does not have to be the same as the number of
elements in the RX queue. This also applies to the maximum number of
outstanding WRs in a WQ (max_recv_wr).
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
All MAC RX flows must be updated with VLAN information when configuring a
VLAN filter.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>