Commit Graph

6892 Commits

Author SHA1 Message Date
David Marchand
56ea803e87 build: remove Windows export symbol list
Rather than have two files that keeps getting out of sync, let's
annotate the version.map to generate the Windows export file.

Some mlx5 symbols (haswell_broadwell_cpu, mlx5_glue, mlx5_os_*) were
only exported for Windows.
All of them are available and used by Linux too, so this patch adds
them in version.map.

Note: Existing version.map annotation achieved with:
$ for dir in lib/librte_eal drivers/common/mlx5; do
    ./buildtools/map-list-symbol.sh $dir/*.map |
    while read file version sym; do
      ! git grep -qw $sym $dir/*.def || continue;
      sed -i -e "s/$sym;/$sym; # WINDOWS_NO_EXPORT/" $dir/*.map;
    done;
  done

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-08 17:57:33 +02:00
David Marchand
60e0e75b61 service: clean references to removed symbol
rte_service_get_id() was removed in v17.11 but the API description
still referenced it and a version node was still present in EAL map.

Fixes: 8edc9aaaf2 ("service: use id in get by name function")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-08 17:47:18 +02:00
Renata Saiakhova
2e761ce184 eal: add synchronous interrupt unregister
Avoid race with unregister interrupt handler if interrupt
source has some active callbacks at the moment, use wrapper
around rte_intr_callback_unregister() to check for -EAGAIN
return value and to loop until rte_intr_callback_unregister()
succeeds.

Signed-off-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
2021-04-07 11:16:11 +02:00
Roy Shterman
edf20bd8a5 mem: fix freeing segments in --huge-unlink mode
When using huge_unlink we unlink the segment right
after allocation. Although we unlink the file we keep
the fd in fd_list so file still exist just the path deleted.
When freeing the hugepage we need to close the fd and assign
it with (-1) in fd_list for the page to be released.

The current flow fails rte_malloc in the following flow when working
with --huge-unlink option:
1. alloc_seg() for segment A -
    We allocate a segment, unlink the path to the segment
    and keep the file descriptor in fd_list.
2. free_seg() for segment A -
    We clear the segment metadata and return - without closing fd
    or assigning (-1) in fd list.
3. alloc_seg() for segment A again -
    We find segment A as available, try to allocate it,
    find the old fd in fd_list try to unlink it
    as part of alloc_seg() but failed because path doesn't exist.

The impact of such error is falsely failing rte_malloc()
although we have hugepages available.

Fixes: d435aad37d ("mem: support --huge-unlink mode")
Cc: stable@dpdk.org

Signed-off-by: Roy Shterman <roy.shterman@vastdata.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-04-07 11:13:45 +02:00
Thomas Monjalon
4d509afa7b pci: rename catch-all ID
The name of the constant PCI_ANY_ID was missing RTE_ prefix.
It is renamed, and the old name becomes a deprecated alias.

While renaming, the duplicate definitions in rte_bus_pci.h
are removed to keep only those in rte_pci.h.
Note: rte_pci.h is included in rte_bus_pci.h

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-04-06 14:52:49 +02:00
Anatoly Burakov
190f38773a power: do not skip saving original P-state governor
Currently, when we set the pstate governor to "performance", we check if
it is already set to this value, and if it is, we skip setting it.

However, we never save this value anywhere, so that next time we come
back and request the governor to be set to its original value, the
original value is empty.

Fix it by saving the original pstate governor first. While we're at it,
replace `strlcpy` with `rte_strscpy`.

Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
2021-04-06 10:36:49 +02:00
Anatoly Burakov
8a5febaac4 power: fix P-state base frequency handling
Previous fix for base frequency handling in pstate mode introduced a
couple of issues:

- When base_frequency file does not exist, it simply bails out because
  of what appears to be accidental addition of FOPEN_OR_ERR_RET. This is
  incorrect, as absence of this file is not fatal and is in fact
  expected on kernel versions earlier than 5.3
- When base_frequency file does exist, it gets opened, but never gets
  closed, resulting in a resource leak

Both issues also manifest themselves as Coverity defects (dead code, and
a resource leak), so this fix addresses both.

Coverity issue: 369693, 369694
Bugzilla ID: 668
Fixes: 4db9587bbf ("power: check sysfs base frequency")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
2021-04-06 10:36:42 +02:00
Marvin Liu
af584d21bf vhost: fix batch dequeue potential buffer overflow
Similar as single dequeue, the multiple accesses of descriptor length
will lead to potential risk. One-time access of descriptor length can
eliminate this risk.

Fixes: 75ed516978 ("vhost: add packed ring batch dequeue")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-03-31 09:34:17 +02:00
Marvin Liu
93ed2f49de vhost: fix packed ring potential buffer overflow
Similar as split ring, the multiple accesses of descriptor length will
lead to potential risk. One-time access of descriptor length can
eliminate this risk.

Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-03-31 09:34:17 +02:00
Marvin Liu
134228ca39 vhost: fix split ring potential buffer overflow
In vhost datapath, descriptor's length are mostly used in two coherent
operations. First step is used for address translation, second step is
used for memory transaction from guest to host. But the interval between
two steps will give a window for malicious guest, in which can change
descriptor length after vhost calculated buffer size. Thus may lead to
buffer overflow in vhost side. This potential risk can be eliminated by
accessing the descriptor length once.

Fixes: 1be4ebb1c4 ("vhost: support indirect descriptor in mergeable Rx")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-03-31 09:34:17 +02:00
Keiichi Watanabe
790b1c3171 vhost: get negotiated protocol features
Add rte_vhost_get_negotiated_protocol_features, which returns a set of
enabled protocol features.

Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-03-31 08:15:14 +02:00
Maxime Coquelin
af4844503e vhost: optimize virtqueue structure
This patch moves vhost_virtqueue struct fields in order
to both optimize packing and move hot fields on the first
cachelines.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
2021-03-31 07:48:32 +02:00
Maxime Coquelin
1818a63147 vhost: move dirty logging cache out of virtqueue
This patch moves the per-virtqueue's dirty logging cache
out of the virtqueue struct, by allocating it dynamically
only when live-migration is enabled.

It saves 8 cachelines in vhost_virtqueue struct.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
2021-03-31 07:48:32 +02:00
Maxime Coquelin
2453bbf7e1 vhost: remove unused virtqueue field
This patch removes the "backend" field of the
vhost_virtqueue struct, which is not used by the
library.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
2021-03-31 07:48:32 +02:00
Tyler Retzlaff
19cc526d6c ethdev: install driver headers
Introduce a meson option 'enable_driver_sdk', when true installs internal
driver headers for ethdev. This allows drivers that do not depend on
stable api/abi to be built external to the dpdk source tree.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-30 14:46:33 +02:00
Thomas Monjalon
fb7ad441d4 ethdev: replace callback getting filter operations
Since rte_flow is the only API for filtering operations,
the legacy driver interface filter_ctrl was too much complicated
for the simple task of getting the struct rte_flow_ops.

The filter type RTE_ETH_FILTER_GENERIC and
the filter operarion RTE_ETH_FILTER_GET are removed.
The new driver callback flow_ops_get replaces filter_ctrl.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-26 18:37:13 +01:00
Ivan Malov
43af98e687 ethdev: reuse VXLAN header definition in flow item
One ought to reuse existing header structs in flow items.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-22 17:19:16 +01:00
Ivan Malov
694d6ad392 net: clarify endianness of 32-bit fields in VXLAN headers
These fields have network byte order. Highlight it using dedicated type.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-22 17:19:16 +01:00
Ivan Malov
a56a262e34 ethdev: reuse VLAN header definition in flow item
One ought to reuse existing header structs in flow items.
This particular item contains non-header fields, so it's
important to keep the header fields in a separate struct.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-22 17:19:16 +01:00
Ivan Malov
6f2168b69a ethdev: reuse ethernet header definition in flow item
One ought to reuse existing header structs in flow items.
This particular item contains non-header fields, so it's
important to keep the header fields in a separate struct.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-22 17:19:16 +01:00
David Marchand
6545d6c52b telemetry: cleanup internal header
The experimental banner can be removed.
Every in-tree file is compiled with _GNU_SOURCE, so RTE_HAS_CPUSET is
unneeded for an internal header.

Fixes: 0e64ae618e ("telemetry: move init function to internal header")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-03-26 17:17:45 +01:00
Dmitry Kozlyuk
b2f24588b5 mem: fix cleanup when multi-process is disabled
rte_eal_memory_detach() did not account for cases where multi-process
mode is disabled: --in-memory and --no-shconf. This resulted
in unmapping memory that had not been mapped, which caused errors:

    EAL: Could not unmap memory: No error   (Windows)
    EAL: Cannot munmap(0x1d47f40, 0x7000): Invalid argument  (Linux)

Confusing "No error" was caused by using errno instead of rte_errno
set by rte_mem_unmap().

Skip detaching memory altogether when --in-memory is specified.
Skip unmapping configuration when it's not shared.
Fix and add error handling to produce proper log messages.

Fixes: dfbc61a2f9 ("mem: detach memsegs on cleanup")

Reported-by: Jie Zhou <jizh@microsoft.com>
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-03-26 17:17:45 +01:00
Tal Shnaiderman
1325a1ffd9 eal: rename thread TLS API
Rename the key opaque pointer from rte_tls_key to
rte_thread_key to avoid confusion with transport layer security.

Also rename and remove the "_tls" term from the following
functions to avoid redundancy:

rte_thread_tls_key_create
rte_thread_tls_key_delete
rte_thread_tls_value_set
rte_thread_tls_value_get

Suggested-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Suggested-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2021-03-26 09:22:39 +01:00
Tal Shnaiderman
3d2913c67c eal: add error numbers in thread TLS API
Add error number reporting to rte_errno in all
functions in the rte_thread_tls_* API.

Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-03-26 09:21:05 +01:00
Bruce Richardson
0e64ae618e telemetry: move init function to internal header
The rte_telemetry_init() function is for EAL use only, so can be moved to
the internal header rather than being in the public one.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-03-25 17:35:10 +01:00
Bruce Richardson
d700251c7a telemetry: rename internal-only header file
The header file containing the legacy telemetry function prototypes was all
internal-only, so we rename the file to be an internal-only one to make it
clearer it's not for installation.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-03-25 17:35:10 +01:00
Bruce Richardson
e34e2f55a7 telemetry: make the legacy registration function internal
The function for registration of callbacks for legacy telemetry was
documented as internal-only in the API documents, but marked as
experimental in the version.map file. Since this is an internal-only
function, for consistency we update the version mapping to have it as
internal.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-03-25 17:35:10 +01:00
Bruce Richardson
37b881a961 telemetry: use log function from pointer
Rather than passing back an error string to the caller, take as input the
rte_log function to use, and just use regular logging.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-03-25 17:35:10 +01:00
David Hunt
4db9587bbf power: check sysfs base frequency
Some kernels may show in incorrect value for base frequency in
sysfs (e.g. 15 GHz). This throws off the SST-BF algorithm for
high and low priority cores. So if base_frequency is greater
than max turbo frequency, ignore, and handle it as a normal
core.

Known Kernel version with issue: Linux 5.8.7

Signed-off-by: David Hunt <david.hunt@intel.com>
2021-03-24 22:04:48 +01:00
Cristian Dumitrescu
f38913b7fb pipeline: add meter array to SWX
Meter arrays are stateful objects that are updated by the data plane
and configured & monitored by the control plane. The meters implement
the RFC 2698 Two Rate Three Color Marker (trTCM) algorithm.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-24 19:18:45 +01:00
Cristian Dumitrescu
64cfcebd68 pipeline: add register array to SWX
Register arrays are stateful objects that can be read & modified by
both the data plane and the control plane, as opposed to tables, which
are read-only for data plane. One key use-case is the implementation
of stats counters.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-24 19:18:40 +01:00
Cristian Dumitrescu
d366a48f46 pipeline: fix instruction translation
The SWX pipeline instructions work with operands of different types:
header fields (h.header.field), packet meta-data (m.field), extern
object mailbox field (e.obj.field), extern function (f.field), action
data read from table entries (t.field), or immediate values; hence the
HMEFTI acronym.

For some pipeline instructions (add/sub, srl/shr, jmplt/jmpgt), only
the H, M and I cases were handled, while the E, F and T cases were
disregarded. This is what we fix here.

Fixes: baf7999303 ("pipeline: introduce SWX add instruction")
Fixes: c88c629438 ("pipeline: introduce SWX subtract instruction")
Fixes: b09ba6d0a3 ("pipeline: introduce SWX SHL instruction")
Fixes: e0f51638b7 ("pipeline: introduce SWX SHR instruction")
Fixes: b3947e25be ("pipeline: introduce SWX jump and return instructions")
Cc: stable@dpdk.org

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-24 18:54:04 +01:00
Venkata Suresh Kumar P
e2b8dc5256 port: add file descriptor SWX port
Add the file descriptor input/output port type for the SWX pipeline.
File descriptor port type provides interface with the kernel network
stack. Example file descriptor port is TAP device.

Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-23 19:50:44 +01:00
Cristian Dumitrescu
66440b7b22 table: add wildcard match table type
Add the widlcard match/ACL table type for the SWX pipeline, which is
used under the hood by the table instruction.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
2021-03-23 19:47:20 +01:00
Cristian Dumitrescu
ddcce12249 table: add table entry priority
Add support for table entry priority, which is required for the
wildcard match/ACL table type.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-23 18:49:41 +01:00
Cristian Dumitrescu
924d3ba019 pipeline: support non-incremental table updates
Some table types (e.g. exact match/hash) allow for incremental table
updates, while others (e.g. wildcard match/ACL) do not. The former is
already supported, the latter is enabled by this patch.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-23 18:45:44 +01:00
Cristian Dumitrescu
cff9a7178e pipeline: improve table entry parsing
Improve the table entry parsing: better code structure, enable parsing
for the key field masks, allow comments and empty lines in the table
entry files.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
2021-03-23 18:41:47 +01:00
Cristian Dumitrescu
d526f6e2d0 pipeline: improve table entry helpers
Improve the internal table entry helper routines for key comparison,
entry duplication and checks.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-23 18:38:58 +01:00
Cristian Dumitrescu
75a09af1b4 table: fix actions with different data size
The table layer provisions an action_id and action_data_size data
bytes for each table key. This action_data_size is a maximal amount,
as some actions (depending on action_id) can require zero or less data
bytes than the maximal action_data_size. This fix allows for actions
with different data sizes to co-exist within the same table.

Fixes: d0a0096661 ("table: add exact match SWX table")
Cc: stable@dpdk.org

Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-23 17:24:52 +01:00
Cristian Dumitrescu
77a413017c port: add ring SWX port
Add the ring input/output port type for the SWX pipeline.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-03-23 17:22:47 +01:00
Thomas Monjalon
e0473c6d5b eal: fix build with musl
In musl libc, cpu_set_t is defined only if _GNU_SOURCE is defined.
In case _GNU_SOURCE is undefined, as in eal_common_errno.c,
it was not possible to include rte_os.h which uses cpu_set_t.

This limitation is removed: if CPU_SETSIZE is not defined,
cpu_set_t related definitions and functions are skipped.
Note: such definitions are unneeded in eal_common_errno.c.

Applications which do not define _GNU_SOURCE may miss cpu_set_t related
features on musl. Such case is detected by RTE_HAS_CPUSET being undefined,
so functions which depend on rte_cpuset_t will be unavailable.

A missing include of fcntl.h is also added.

Bugzilla ID: 35
Fixes: 11b57c6980 ("eal: fix error string function")
Fixes: 176bb37ca6 ("eal: introduce internal wrappers for file operations")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-03-23 08:41:05 +01:00
Thomas Monjalon
bfb42c3777 eal: fix comment of OS-specific header files
The same comment is on top of each rte_os.h file.
It is reworded to remove the mention of "future releases".

Fixes: 428eb983f5 ("eal: add OS specific header file")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-03-23 08:25:16 +01:00
Xueming Li
df7547a6a2 ethdev: add helper function to get representor ID
The NIC can have multiple PCIe links and can be attached to multiple
hosts, for example the same single NIC can be shared for multiple server
units in the rack. On each PCIe link NIC can provide multiple PFs and
VFs/SFs based on these ones. The full representor identifier consists of
three indices - controller index, PF index, and VF or SF index (if any).

SR-IOV and SubFunction are created on top of PF. PF index is introduced
because there might be multiple PFs in the bonding configuration and
only bonding device is probed.

In eth representor comparator callback, ethdev representor ID was
compared with devarg. Since controller index and PF index not compared,
callback returned representor from other PF or controller.

This patch adds new API to get representor ID from controller, pf and
vf/sf index. Representor comparer callback get representor ID then
compare with device representor ID.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-17 19:12:09 +01:00
Xueming Li
85e1588ca7 ethdev: add API to get representor info
The NIC can have multiple PCIe links and can be attached to multiple
hosts, for example the same single NIC can be shared for multiple server
units in the rack. On each PCIe link NIC can provide multiple PFs and
VFs/SFs based on these ones. The full representor identifier consists of
three indices - controller index, PF index, and VF or SF index (if any).

This patch introduces a new API rte_eth_representor_info_get() to
retrieve representor corresponding info mapping:
 - caller controller index and pf index.
 - supported representor ID ranges.
 - type, controller, pf and start vf/sf ID of each range.
The API is useful to calculate representor from devargs to representor
ID.

New ethdev callback representor_info_get() is added to retrieve info
from PMD driver, optional for PMD that doesn't support new devargs
representor syntax.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-17 19:11:56 +01:00
Xueming Li
66e0ea2c98 ethdev: support multi-host in representor
The NIC can have multiple PCIe links and can be attached to the multiple
hosts, for example the same single NIC can be shared for multiple server
units in the rack. On each PCIe link NIC can provide multiple PFs and
VFs/SFs based on these ones. To provide the unambiguous identification
of the PCIe function the controller index is added. The full representor
identifier consists of three indices - controller index, PF index, and
VF or SF index (if any).

This patch introduces controller index to ethdev representor syntax,
examples:

[[c#]pf#]vf#: VF port representor/s, example: pf0vf1
[[c#]pf#]sf#: SF port representor/s, example: c1pf1sf[0-3]

c# is controller(host) ID/range in case of multi-host, optional.

For user application (e.g. OVS), PMD is responsible to interpret and
locate representor device based on controller ID, PF ID and VF/SF ID in
representor syntax.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
da97592635 ethdev: support PF index in representor
With Kernel bonding, multiple underlying PFs are bonded, VFs come
from different PF, need to identify representor of VFs unambiguously by
adding PF index.

This patch introduces optional 'pf' section to representor devargs
syntax, examples:
 representor=pf0vf0             - single VF representor
 representor=pf[0-1]sf[0-1023]  - SF representors from 2 PFs

PF type representor is supported by using standalone 'pf' section:
 representor=pf1                - PF representor

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
9be46b4308 kvargs: support multiple lists
This patch updates kvargs parser to support value of multiple lists or
ranges:
  k1=v[1,2]v[3-5]

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-03-16 20:15:29 +01:00
Xueming Li
fa4f3fecb9 ethdev: support sub-function representor
SubFunction is a portion of the PCI device, created on demand, a SF
netdev has its own dedicated queues(txq, rxq). A SF netdev supports
eswitch representation offload similar to existing PF and VF
representors.

To support SF representor, this patch introduces new devargs syntax,
examples:
 representor=sf0               - single SubFunction representor
 representor=sf[1,3,5]         - single list
 representor=sf[0-3],          - single range
 representor=sf[0,2-6,8,10-12] - list with singles and ranges

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
cebf7f1715 ethdev: support new VF representor syntax
Current VF representor syntax:
 representor=2          - single representor
 representor=[0-3]      - single range

To prepare for more representor types, this patch adds compatible VF
representor devargs syntax:

vf#:
 representor=vf2          - single representor
 representor=vf[1,3,5]    - single list
 representor=vf[0-3]      - single range
 representor=vf[0,1,4-7]  - list with singles and range

For backwards compatibility, representor "#" is interpreted as "vf#".

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
83a675177f ethdev: refactor representor port list parsing
To the extended representor syntax which need to reuse the value parsing
function for controller and PF section, this patch refactors the port
list parsing.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00