This commit convert jump resource to indexed.
The table data struct is allocated from indexed memory. As it is add in
the hash list, the pointer is still used for hash list search. The index
is added to the table struct, and the pointer in flow handle is decrease
to uint32_t type. For flow without jump flows, it saves 4 bytes memory.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit converts port id action to indexed.
Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit convert tag resource to indexed.
As tag resources are add in the hash list, to avoid introduce
performance issue and keep the hash list, only the tag resource memory
is allocated from indexed memory. The resources is still added to the
hash list. Add four bytes index in the tag resource struct and change
the tag resources in the flow handle from pointer to uint32_t seems be
no benefit for tag resource, but it saves memory for flows without tag
action. And also for sub flows share one tag action resource.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit converts the push VLAN resource to indexed.
Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit converts the flow encap/decap resource to indexed.
Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Refactor common memory btree and cache management to common driver.
Replace some input parameters of MR APIs to more common data structure
like PD, port_id, share_cache,... so that multiple PMD drivers can
use those MR APIs.
Modify mlx5 net pmd driver to use MR management APIs from common driver.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Refactor common multi-process handling codes from net PMD to common
driver. Using tuple mp_id{name, port_id} as standard input parameter
for all multi-process IPC APIs instead of using rte_eth_dev.
Modify net PMD to use multi-process APIs from common driver.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Currently, the meter suffix table is created and saved in the mlx5
shared struct. It causes the suffix table will never be released
even without any meter rules.
Move the suffix table to meter domain struct to help the suffix table
be released when all the meter rules are destroyed.
Fixes: 46a5e6bc6a ("net/mlx5: prepare meter flow tables")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Define a device parameter to configure log 2 of a stride size for MPRQ
- mprq_log_stride_size. User is able to specify a stride size in a range
allowed by an underlying hardware. The default stride size is defined as
2048 bytes to encompass most commonly used packet sizes in the Internet
(MTU 1518 and less) and will be used in case a maximum configured packet
size cannot fit into the largest possible stride size. Otherwise a
stride size is set to a large enough value to encompass a whole packet.
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
When creating a flow, usually the creating routine is called in
serial. No parallel execution is supported right now. The same
function will be called only once for a single flow creation.
But there is a special case that the creating routine will be called
nested. If the xmeta feature is enabled and there is FLAG / MARK in
the actions list, some metadata reg copy flow needs to be created
before the original flow is applied to the hardware.
In the flow non-cached mode, resources only for flow creation will
not be saved anymore. The memory space is pre-allocated and reused
for each flow. A global index for each device is used to indicate
the memory address of the resources. If the function is called in a
nested mode, then the index will be reset and make everything get
corrupted.
To solve this, a nested index is introduced to save the position for
the original flow creation. Currently, only one level nested call
of the flow creating routine is supported.
Fixes: e7bfa3596a ("net/mlx5: separate the flow handle resource")
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Currently, the fallback counter is also allocated from the pool, the
specify fallback function code becomes a bit duplicate.
Reorganize the fallback counter code to make it reuse from the normal
counter code.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Currently, the counter struct saves both the members used by batch
counters and none batch counters. The members which are only used
by none batch counters cost 16 bytes extra memory for batch counters.
As normally there will be limited none batch counters, mix the none
batch counter and batch counter members becomes quite expensive for
batch counter. If 1 million batch counters are created, it means 16 MB
memory which will not be used by the batch counters are allocated.
Split the mlx5_flow_counter struct for batch and none batch counters
helps save the memory.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Currently, DV and verbs counters are both changed to indexed. It means
while creating the flow with counter, flow can save the indexed value to
address the counter.
Save the 4 bytes indexed value in the rte_flow instead of 8 bytes
pointer helps to save memory with millions of flows.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This part of the counter optimize change the DV counter to indexed as
what have already done in verbs. In this case, all the mlx5 flow counter
can be addressed by index.
The counter index is composed of pool index and the counter offset in
the pool counter array. The batch and none batch counter dcs ID offset
0x800000 is used to avoid the mix up for the index. As batch counter dcs
ID starts from 0x800000 and none batch counter dcs starts from 0, the
0x800000 offset is added to the batch counter index to indicate the
index of batch counter.
The counter pointer in rte_flow struct will be aligned to index instead
of pointer. It will save 4 bytes memory for every rte_flow. With
millions of rte_flow, it will save MBytes memory.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This is part of the counter optimize which will save the indexed counter
id instead of the counter pointer in the rte_flow.
Place the verbs counter into the container pool helps the counter to be
indexed correctly independent with the raw counter.
The counter pointer in rte_flow will be changed to indexed value after
the DV counter is also changed to indexed.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Query generation was introduced to avoid counter to be reallocated
before the counter statistics be fully updated. Since the counters
be released between query trigger and query handler may miss the
packets arrived in the trigger and handler gap period. In this case,
user can only allocate the counter while pool query_gen is greater
than the counter query_gen + 1 which indicates a new round of query
finished, the statistic is fully updated.
Split the pool query_gen to start_query_gen and end_query_gen helps
to have a better identify for the counter released in the gap period.
And it helps the counter released before query trigger or after query
handler can be reallocated more efficiently.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The Hw counters is defined as 32bit unsigned value and read from
the sysfs. Firstly read the base value while application start,
then fetch the new value while do query and minus the base value.
If the new value is less than base value, will result in a
negative value and convert to the big value as unsigned 64bit.
PMD add xstats field to store the last successfully read counter,
use it if failed to read hw counter from sysfs.
PMD also record the last output value to handle the wrap around case,
if overflow happened, increase the wrap count by 1 and save into the
higher 32bit, and update the new value into lower 32bit, finally
return the 64bit counter value.
Fixes: ce9494d76c ("net/mlx5: report imissed statistics")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
When creating a hairpin queue, the total data size and the maximal
number of packets are interrelated. The differ is the stride size.
Larger buffer size means big packet like jumbo could be supported,
but in the meanwhile, it will introduce more cache misses and have a
side effect on the performance.
Now a new device parameter "hp_buf_log_sz" is introduced for
applications to set the total data buffer size (the logarithm value).
Then the maximal number of packets will also be calculated
automatically by this value.
Applications could also change this value to a larger one in order
to support larger packets in hairpin case. A smaller value will be
beneficial for memory consumption.
If it is not set, the default value will be used.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Only the members of flow handle structure will be used when trying
to destroy a flow. Other members of mlx5 device flow resource will
only be used for flow creating, and they could be reused for different
flows.
So only the device flow handle structure needs to be saved for further
usage. This could be separated from the whole mlx5 device flow and
stored with a list for each rte flow.
Other members will be pre-allocated with an array, and an index will
be used to help to apply each device flow to the hardware.
The flow handle sizes of Verbs and DV mode will be different, and
some calculation could be done before allocating a verbs handle.
Then the total memory consumption will less for Verbs when there is
no inbox driver being used.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
When stopping a mlx5 device, all the flows inserted will be flushed
since they are with non-cached mode. And no more action will be done
for these flows in the device closing stage.
If the device restarts after stopped, no flow with non-cached mode
will be re-inserted.
The flush operation through rte interface will remain the same, and
all the flows will be flushed actively.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
There is a macro RTE_FINI for destructors,
which is now used where appropriate for consistency.
The destructor function mlx5_pmd_socket_uninit does not need
to be declared separately in mlx5.h.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
There are RDMA-CORE versions which are not supported multi-table for
some Mellanox mlx5 devices.
Hence, the optimization added in commit [1] which forwards all the FDB
traffic to table 1 cannot be configured.
Make the above optimization optional:
Do not fail when either table 1 cannot be created or the jump rule
(all =>jump to table 1) is not configured successfully.
In this case, all the flows will be configured to table 0.
[1] commit b67b4ecbde ("net/mlx5: skip table zero to improve
insertion rate")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Move Netlink mechanism and its dependencies from net/mlx5 to
common/mlx5 in order to be ready to use by other mlx5 drivers.
The dependencies are BITFIELD defines, the ppc64 compilation workaround
for bool type and the function mlx5_translate_port_name.
Update build mechanism accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
As an arrangment for Netlink command moving to the common library,
reduce the net/mlx5 dependencies.
Replace ethdev class command parameters.
Improve Netlink sequence number mechanism to be controlled by the
mlx5 Netlink mechanism.
Move mlx5_nl_check_switch_info to mlx5_nl.c since it is the only one
which uses it.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The Netlink commands interfaces is included in the mlx5.h file with a
lot of other PMD interfaces.
As an arrangement to make the Netlink commands shared with different
PMDs, this patch moves the Netlink interface to a new file called
mlx5_nl.h.
Move non Netlink pure vlan commands from mlx5_nl.c to the
mlx5_vlan.c.
Rename Netlink commands and structures to use prefix mlx5_nl.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Move the vendor information, vendor ID and device IDs from net/mlx5 PMD
to the common mlx5 file.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Move PCI detection by IB device from mlx5 PMD to the common code.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
A new Mellanox vdpa PMD will be added to support vdpa operations by
Mellanox adapters.
This vdpa PMD design includes mlx5_glue and mlx5_devx operations and
large parts of them are shared with the net/mlx5 PMD.
Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.
The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.
Share the log mechanism macros.
Separate also the log mechanism to allow different log level control to
the common library.
Build files and version files are adjusted accordingly.
Include lines are adjusted accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The DevX commands interface is included in the mlx5.h file with a lot
of other PMD interfaces.
As an arrangement to make the DevX commands shared with different PMDs,
this patch moves the DevX interface to a new file called mlx5_devx_cmds.h.
Also remove shared device structure dependency on DevX commands.
Replace the DevX commands log mechanism from the mlx5 driver log
mechanism to the EAL log mechanism.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Flow with meter will split to three subflows, the prefix subflow with
meter action do the color, the meter subflow filter the packets, the
suffix subflow do all the left actions for packets pass the filter.
Both the color and the subflow match between prefix and suffix use the
register to store the tag.
For some of the NICs with meter color register share capability, it
only uses 8 LSB of the register for color, the left 24 MSB can be used
for flow id match between meter prefix subflow and suffix subflow.
Currently, one entire register is allocated for flow matching which
causes the NICs with limited registers don't have enough register for
other matching.
Add the meter color share capability checking to fix lacking of
registers issue.
Fixes: 9ea9b049a9 ("net/mlx5: split meter flow")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The id allocated is for the register unique id match. Some registers may
not use the full 32 bits. Add the maximum id to avoid allocate id over
the register restriction.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This patch adds to MLX5 PMD support of matching on GTP item,
fields msg_type and teid, according to RFC [1].
GTP item validation and translation functions are added and called.
GTP tunnel type is added to supported tunnels.
[1] http://mails.dpdk.org/archives/dev/2019-December/152799.html
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Add pmd unix socket server to enable external tool applications to
trigger flow dump.
Socket path:
/var/tmp/dpdk_mlx5_<pid>
Socket format:
io_raw: port_id of uint16
file: file descriptor of int
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Dump fdb/nic_rx/nic_tx raw flow data into specified file.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Maximal size of coalesced LRO segment is set in TIR attributes as
number of chunks of size 256 bytes each.
Current implementation uses the hardcoded value 256 in several places.
This patch adds a definition for this value, and uses this definition
in all relevant places.
A debug message is added to clearly notify the actual configured size.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This patch implements use of the API for LRO aggregated packet
max size.
Rx queue create is updated to use the relevant configuration.
Documentation is updated accordingly.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Allow to configure the default MAC address of a VF
via its representor port in the host.
An API was proposed to specify explicitly the VF as a
target: https://patches.dpdk.org/patch/62176/
It has been rejected by the technical board in order to
keep compatibility with behavior in Intel PMDs.
http://mails.dpdk.org/archives/dev/2019-November/150588.html
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Tag action for flow mark/flag could be reused by different flows.
When creating a new flow with mark, the existing tag resources will
be traversed in order to confirm if the action is already created.
If only one linked list is used, the searching rate will drop
significantly with the number of tag actions increasing.
By using a hash lists table, it will speed up the searching process
and in the meanwhile, the memory consumption won't be large if only
a small number tag action resources are created(compared to other
hash table implementations). The list heads array size could be
optimized with some extendable hash table in the future.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Matchers are created on the specific table. If a single linked list
is used to store these, then the finding process might be the
bottleneck when there are a lot of different flow matchers on a
huge amount of tables. The matchers could be move into the table
data resource structure in order to reduce the comparison times
when finding.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Jump object is associated with table object, so there is no need to
use a single linked list to store it. All the jump objects could be
put together with related flow tables.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
In the current flow tables organization, arrays are used. This is
fast for searching, creating related object that will be used in
flow creation. But it introduces some limitation to the table index.
Then we can reorganize the flow tables information with hash list.
When using hash list, there is no need to maintain three arrays for
NIC TX, RX and FDB tables object information.
This attribute could be used together with the table ID to generate
a 64-bits key that is unique for the hash list insertion, lookup and
deletion.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The rdma core library can map doorbell register in two ways,
depending on the environment variable "MLX5_SHUT_UP_BF":
- as regular cached memory, the variable is either missing or
set to zero. This type of mapping may cause the significant
doorbell register writing latency and requires explicit
memory write barrier to mitigate this issue and prevent
write combining.
- as non-cached memory, the variable is present and set to
not "0" value. This type of mapping may cause performance
impact under heavy loading conditions but the explicit write
memory barrier is not required and it may improve core
performance.
The new devarg is introduced "tx_db_nc", if this parameter is
set to zero, the doorbell register is forced to be mapped to
cached memory and requires explicit memory barrier after
writing to. If "tx_db_nc" is set to non-zero value the doorbell
will be mapped as non-cached memory, not requiring the memory
barrier. If "tx_db_nc" is missing the behaviour will be defined
by presence of "MLX5_SHUT_UP_BF" in environment. If variable
is missed the default value zero will be set for ARM64 hosts
and one for others.
In run time the code checks the mapping type and provides the
memory barrier after writing to tx doorbell register if it is
needed. The mapping type is extracted directly from the
uar_mmap_offset field in the queue properties.
Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Add the meter attach and detach for the flow create.
When create the flow with meter, first try to find any created meter
action matches the flow meter id. If the meter action is already
created, just attach to it and increase the ref_cnt. If not, create
one.
For the dettach, decrease the ref_cnt, destroy the meter action while
the ref_cnt decreased to zero.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Expose the flow counter management mechanism for other components to
use.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This commit add the basic meter operations for meter create and destroy.
New internal functions in rte_mtr_ops callback:
1. create()
2. destroy()
The create() callback will create the corresponding flow rules on the
meter table.
The destroy() callback destroys the flow rules on the meter table.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This commit prepare the meter table and suffix table.
A flow with meter will be split to three flows. The three flows are
created on differnet tables. The packets transfer between the flows
on the tables as below:
Prefix flow -> Meter flow -> Suffix flow
Prefix flow does the user defined match and the meter action. The meter
action colors the packet and set its destination to meter table to be
processed by the meter flow.
The meter flow judges if the packet can be passed or not. If packet can
be passed, it will be transferred to the suffix table.
The suffix flow on the suffix table will apply the left user defined
actions to the packet.
The ingress egress and transfer all have the independent meter and
suffix tables.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This commit add the support of meter profile add and delete operations.
New internal functions in rte_mtr_ops callback:
1. meter_profile_add()
2. meter_profile_delete()
Only RTE_MTR_SRTCM_RFC2697 algorithm is supported and can be added. To
add other algorithm will report an error.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Meter need the metadata REG_C to have the color match between the prefix
flow and the meter flow.
As the user define or metadata feature will both use the REG_C in the
suffix flow, the color match register meter uses will not impact the
register use in the later sub flow.
Another case is that tag is add before meter flow. In this case, meter
should not touch the register the tag action is using. To avoid that
case, meter should reserve the REG_C's used by user defined MLX5_APP_TAG.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This commit add the support of fill and get the meter capabilities
from DevX.
Support items:
1. The srTCM color bind mode.
2. Meter share with multiple flows.
3. Action drop.
The color aware mode and multiple meter chaining in a flow are not
supported.
New internal function in rte_mtr_ops callback:
1. capabilities_get()
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>