Commit Graph

7782 Commits

Author SHA1 Message Date
Stephen Hemminger
6e97b5fc1a eal: move Unix filesystem functions into one file
Both Linux and FreeBSD have same code for creating runtime
directory and reading sysfs files. Put them in the new lib/eal/unix
subdirectory.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-09 19:12:53 +01:00
Stephen Hemminger
1835a22f34 support systemd service convention for runtime directory
Systemd.exec supports configuring the runtime directory of a service
via RuntimeDirectory=. This creates the directory with the necessary
permissions which actual service may not have if running in container.

The change to DPDK is to look for the environment RUNTIME_DIRECTORY
first and use that in preference to the fallback alternatives.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
2022-02-09 19:12:40 +01:00
Stephen Hemminger
36514d8dfa eal: remove size for setting runtime directory
The size argument to eal_set_runtime_dir is useless and was
being used incorrectly in strlcpy. It worked only because
all callers passed PATH_MAX which is same as sizeof the destination
runtime_dir.

Note: this is an internal API so no user exposed change.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-02-09 16:42:31 +01:00
Srikanth Yalavarthi
f3ca33bb20 eal: add internal function to get base address
Added an internal helper to get OS-specific EAL mapping base address

This helper can be used by the drivers to program offload / accelerator
devices, where the base address can be used as a reference address by
the accelerator to access the host memory

An address can also be represented as an offset relative to the base
address using smaller data types

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 23:59:10 +01:00
Dmitry Kozlyuk
0dff3f26d6 eal: extend --huge-unlink for hugepage file reuse
Expose Linux EAL ability to reuse existing hugepage files
via --huge-unlink=never switch.
Default behavior is unchanged, it can also be specified
using --huge-unlink=existing for consistency.
Old --huge-unlink switch is kept,
it is an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
32b4771cd8 eal/linux: allow hugepage file reuse
Linux EAL ensured that mapped hugepages are clean
by always mapping from newly created files:
existing hugepage backing files were always removed.
In this case, the kernel clears the page to prevent data leaks,
because the mapped memory may contain leftover data
from the previous process that was using this memory.
Clearing takes the bulk of the time spent in mmap(2),
increasing EAL initialization time.

Introduce a mode to keep existing files and reuse them
in order to speed up initial memory allocation in EAL.
Hugepages mapped from such files may contain data
left by the previous process that used this memory,
so RTE_MEMSEG_FLAG_DIRTY is set for their segments.
If multiple hugepages are mapped from the same file:
1. When fallocate(2) is used, all memory mapped from this file
   is considered dirty, because it is unknown
   which parts of the file are holes.
2. When ftruncate(3) is used, memory mapped from this file
   is considered dirty unless the file is extended
   to create a new mapping, which implies clean memory.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
52d7d91ed4 eal: refactor --huge-unlink storage
In preparation to extend --huge-unlink option semantics
refactor how it is stored in the internal configuration.
It makes future changes more isolated.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
2edd037c09 mem: add dirty malloc element support
EAL malloc layer assumed all free elements content
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:
1. EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before returning into the heap.

Clearing the memory can be as slow as around 14 GiB/s.
To save doing so, memalloc layer is allowed to return dirty memory.
Such segments being marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using the new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:

a) If the joint element contains dirty parts, it is dirty:

    dirty + freed + dirty = dirty  =>  no need to clean
            freed + dirty = dirty      the freed memory

   Dirty parts may be large (e.g. initial allocation),
   so clearing them could create unpredictable slowdown.

b) If the only dirty part of the joint element
   is the freed memory, the joint element can be made clean:

    clean + freed + clean = clean  =>  freed memory
    clean + freed         = clean      must be cleared
            freed + clean = clean
            freed         = clean

   This logic naturally reproduces the old behavior
   and always applies in modes when EAL memalloc layer
   returns only clean segments.

As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Weiguo Li
7750099d2c eventdev: remove useless C++ include guard
This private header contains an incomplete cplusplus guard,
just remove it.

Fixes: d35e61322d ("eventdev: move inline APIs into separate structure")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
2022-02-08 17:15:47 +01:00
Weiguo Li
0ae7844fcd eal/windows: remove useless C++ include guard
Remove the incomplete cplusplus guard in internal header.

Fixes: 6e1ed4cbbe ("eal/windows: add dirent implementation")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2022-02-08 17:13:50 +01:00
Jie Zhou
e14f1744d6 eal: differentiate strerror message on Windows
On Windows, strerror returns just "Unknown error" for errnum greater
than MAX_ERRNO, while linux and freebsd returns "Unknown error <num>",
which is the current expectation for errno_autotest. Differentiate
the error string on Windows to remove a "duplicate error code" failure.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2022-02-08 14:19:40 +01:00
Jie Zhou
7e71c4dce3 eal/windows: fix error code for not supported API
UT memory_autotest on Windows has 2 failed cases on EAL APIs
eal_memalloc_get_seg_fd and eal_memalloc_get_seg_fd_offset. These 2
APIs are not supported on Windows yet. Should return ENOTSUP such that
in test_memory.c these 2 ENOTSUP cases will not be marked as failures,
same as other ENOTSUP cases.

Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2022-02-08 14:19:40 +01:00
Zhihong Wang
0e4dc6af06 ring: fix overflow in memory size calculation
Parameters count and esize are both unsigned int, and their product can
legaly exceed unsigned int and lead to runtime access violation.

Fixes: cc4b218790 ("ring: support configurable element size")
Cc: stable@dpdk.org

Signed-off-by: Zhihong Wang <wangzhihong.wzh@bytedance.com>
Reviewed-by: Liang Ma <liangma@liangbit.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2022-02-05 18:15:33 +01:00
Robert Sanford
436e82b78c ring: update ring size doxygen comments
- Add RING_F_EXACT_SZ description to rte_ring_init and
  rte_ring_create param comments.
- Fix ring size comments.

Signed-off-by: Robert Sanford <rsanford@akamai.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2022-02-05 18:03:07 +01:00
Yunjian Wang
074717be3e ring: fix error code when creating ring
The error value returned by rte_ring_create_elem() should be positive
integers. However, if the rte_ring_get_memsize_elem() function fails,
a negative number is returned and is directly used as the return value.
As a result, this will cause the external call to check the return
value to fail(like called by rte_mempool_create()).

Fixes: a182620042 ("ring: get size in memory")
Cc: stable@dpdk.org

Reported-by: Nan Zhou <zhounan14@huawei.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2022-02-05 17:53:27 +01:00
Andrzej Ostruszka
97ed4cb6fb ring: optimize corner case for enqueue/dequeue
When enqueueing/dequeueing to/from the ring we try to optimize by manual
loop unrolling.  The check for this optimization looks like:

	if (likely(idx + n < size)) {

where 'idx' points to the first usable element (empty slot for enqueue,
data for dequeue).  The correct comparison here should be '<=' instead
of '<'.

This is not a functional error since we fall back to the loop with
correct checks on indexes.  Just a minor suboptimal behaviour for the
case when we want to enqueue/dequeue exactly the number of elements that
we have in the ring before wrapping to its beginning.

Fixes: cc4b218790 ("ring: support configurable element size")
Fixes: 286bd05bf7 ("ring: optimisations")

Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
2022-02-04 15:19:06 +01:00
Pallavi Kadam
416c1bef9d eal/windows: set worker thread affinity at init
Sometimes OS tries to switch the core. So, bind the lcore thread
to a fixed core.
Implement affinity call on Windows similar to Linux.

Signed-off-by: Qiao Liu <qiao.liu@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2022-02-02 23:44:05 +01:00
Martijn Bakker
44f44d8298 pflock: fix header file installation
The generic header file was missing
in the list of files to install.

Fixes: 9667d97c25 ("pflock: add phase-fair reader writer locks")
Cc: stable@dpdk.org

Signed-off-by: Martijn Bakker <gladdyu@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-02-02 14:34:11 +01:00
Maxime Coquelin
afd857cba9 vhost: use proper logging type for data path
This patch changes type from config to data for functions
called in the datapath.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:46:02 +01:00
Maxime Coquelin
d84723029c vhost: differentiate IOTLB logs
Same logging messages were used for both IOTLB cache
insertion failure and IOTLB pending insertion failure.

This patch differentiate them to ease logs analysis.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:45:55 +01:00
Maxime Coquelin
5b03016509 vhost: remove multi-line logs
This patch replaces multi-lines logs in multiple single-
line logs in order to ease logs filtering based on their
socket path.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:45:49 +01:00
Maxime Coquelin
02798b0735 vhost: improve virtio-net layer logs
This patch standardizes logging done in Virtio-net, so that
the Vhost-user socket path is always prepended to the logs.
It will ease log analysis when multiple Vhost-user ports
are in use.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:45:43 +01:00
Maxime Coquelin
c85c35b1d4 vhost: improve socket layer logs
This patch adds the Vhost socket path whenever possible in
order to make debugging possible when multiple Vhost
devices are in use. Some vhost-user layer functions are
modified to pass the device path down to the socket layer.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:45:36 +01:00
Maxime Coquelin
b53c9d20fa vhost: improve vhost-user layer logs
This patch adds the Vhost-user socket path to Vhost-user
layer logs in order to ease logs filtering.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:45:29 +01:00
Maxime Coquelin
ccf1ebd611 vhost: improve vhost layer logs
This patch prepends Vhost logs with the Vhost-user socket
path when available to ease filtering logs for a given port.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:45:11 +01:00
Maxime Coquelin
843d8e3fde vhost: improve vDPA registration failure log
This patch adds name of the device failing vDPA registration.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:45:01 +01:00
Maxime Coquelin
2d1c05c9ac vhost: improve IOTLB logs
This patch adds IOTLB mempool name when logging debug
or error messages, and also prepends the socket path.
to all the logs.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-01-27 09:44:31 +01:00
Andy Pei
8974b6c730 vhost: add log when setting vring base
This patch adds log for vring related info in handling of vhost message
VHOST_USER_SET_VRING_BASE, which will be useful in live migration case.

Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2022-01-27 09:39:08 +01:00
Yunjian Wang
52b49ea06f ethdev: fix Rx queue telemetry memory leak on failure
In eth_dev_handle_port_info() allocated memory for rxq_state,
we should free it when error happens, otherwise it will lead
to memory leak.

Fixes: 58b43c1ddf ("ethdev: add telemetry endpoint for device info")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-01-25 10:58:32 +01:00
Ferruh Yigit
b1cb30352d ethdev: mark old macros as deprecated
Old macros kept for backward compatibility, but this cause old macro
usage to sneak in silently.

Marking old macros as deprecated. Downside is this will cause some noise
for applications that are using old macros.

Fixes: 295968d174 ("ethdev: add namespace")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2022-01-13 17:56:20 +01:00
Anatoly Burakov
4042dc2037 mem: quiet base address hint warning if not requested
Any EAL memory allocation often goes through eal_get_virtual_area()
function, which will print a warning whenever the resulting allocation
didn't match the specified address requirements. This is useful for
when we have requested a specific base virtual address, to let the user
know that the mapping has deviated from that address.

However, on Linux, we also have a default base address that's there to
ensure better chances of successful secondary process initialization,
as well as higher likelihood of the virtual areas to fit inside the
IOMMU address width. Because of this default base address, there are
warnings printed even when no base address was explicitly requested,
which can be confusing to the user.

Emit this warning with debug level unless base address was explicitly
requested by the user.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-01-28 12:06:26 +01:00
Naga Harish K S V
6ff2363130 eventdev/eth_rx: add event port get API
This patch introduces new api for retrieving event port id
of eth rx adapter.

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2022-01-23 16:29:43 +01:00
Pavan Nikhilesh
de3c3a2f20 eventdev/eth_rx: fix missing internal port checks
When event delivery is through internal port, stats are maintained
by HW and we should avoid reading SW data structures for stats.
Fix missing internal port checks.

Fixes: 995b150c1a ("eventdev/eth_rx: add queue stats API")
Cc: stable@dpdk.org

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2022-01-20 14:25:05 +01:00
Stephen Hemminger
8a5a91401d eal/linux: log hugepage create errors with filename
While debugging running DPDK service in a container, it is
useful to see which file creation failed.  Don't hide this
failure with DEBUG.

Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-01-21 15:40:58 +01:00
Bruce Richardson
9412af9106 build: make cfgfile optional
Allow disabling of the cfgfile library in builds.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2022-01-21 06:09:00 -05:00
Bruce Richardson
a7d53af2ec build: make "packet framework" optional
Add port, table and pipeline libraries - collectively often known as
the "packet framework" -  to the list of optional libraries, and
ensure tests can build with them disabled.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2022-01-21 06:09:00 -05:00
Bruce Richardson
6a9145ca4b build: make flow classification optional
Add the flow_classify library to the list of optional libraries, and
ensure tests can build with it disabled.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2022-01-21 12:05:14 +01:00
Bruce Richardson
e46cf9ffe0 build: make node optional
Allow the 'node' library to be disabled in builds.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2022-01-21 12:05:14 +01:00
Bruce Richardson
e71626995e build: allow recursive disabling of libraries
Align the code in lib/meson.build with that in drivers/meson.build to
enable recursive disabling of libraries, i.e. if library b depends on
library a, disable library b if a is disabled (either explicitly or
implicitly). This allows libraries to be optional even if other DPDK
libs depend on them, something that was not previously possible.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2022-01-21 12:05:14 +01:00
Elena Agostini
c8557ed434 gpudev: add alignment for memory allocation
Similarly to rte_malloc, rte_gpu_mem_alloc accepts as
input the memory alignment size.

GPU driver should return GPU memory address aligned
with the input value.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2022-01-21 11:33:25 +01:00
Bruce Richardson
cadb255e25 eal: add OS defines for C conditional checks
Define a set of macros in the build configuration to allow C runtime
code to check the current OS environment. This saves the user having to
use ifdefs for e.g. disabling particular tests on Windows.
See included documentation changes for usage examples.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2022-01-17 19:26:42 +01:00
Josh Soref
7be78d0279 fix spelling in comments and strings
The tool comes from https://github.com/jsoref

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-01-11 12:16:53 +01:00
Viacheslav Ovsiienko
bef7c9ff28 ethdev: announce migration to generic flow modify action
The generic RTE_FLOW_ACTION_TYPE_MODIFY_FIELD action was
introduced by [1]. This action provides an unified way
to perform various arithmetic and transfer operations over
packet network header fields and packet metadata.

[1] 73b68f4c54 ("ethdev: introduce generic modify flow action")

On other side there are a bunch of multiple legacy actions,
that can be superseded by the generic MODIFY_FIELD action:

RTE_FLOW_ACTION_TYPE_OF_SET_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_DEC_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_SET_NW_TTL
RTE_FLOW_ACTION_TYPE_OF_DEC_NW_TTL      sfc
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_OUT
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_IN
RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC       bnxt, cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV4_DST       bnxt, cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC       cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV6_DST       cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_TP_SRC         cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_TP_DST         cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_DEC_TTL            mlx5, sfc
RTE_FLOW_ACTION_TYPE_SET_TTL            mlx5
RTE_FLOW_ACTION_TYPE_SET_MAC_SRC        cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_MAC_DST        cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ        mlx5
RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ        mlx5
RTE_FLOW_ACTION_TYPE_INC_TCP_ACK        mlx5
RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK        mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP      mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP      mlx5
RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID    bnxt, cnxk, cxgbe, enic,
                                        mlx5, octeontx2, sfc
RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP    bnxt, cnxk, cxgbe, enic,
                                        mlx5, octeontx2, sfc
RTE_FLOW_ACTION_TYPE_SET_TAG            mlx5
RTE_FLOW_ACTION_TYPE_SET_META           mlx5

This note deprecates the following RTE Flow actions,
as not supported by any of PMDs:

RTE_FLOW_ACTION_TYPE_OF_SET_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_DEC_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_SET_NW_TTL
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_OUT
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_IN

The following actions are supposed to be deprecated in 22.07
and replaced by generic field modify action:

RTE_FLOW_ACTION_TYPE_OF_DEC_NW_TTL
RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC
RTE_FLOW_ACTION_TYPE_SET_IPV4_DST
RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC
RTE_FLOW_ACTION_TYPE_SET_IPV6_DST
RTE_FLOW_ACTION_TYPE_SET_TP_SRC
RTE_FLOW_ACTION_TYPE_SET_TP_DST
RTE_FLOW_ACTION_TYPE_DEC_TTL
RTE_FLOW_ACTION_TYPE_SET_TTL
RTE_FLOW_ACTION_TYPE_SET_MAC_SRC
RTE_FLOW_ACTION_TYPE_SET_MAC_DST
RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ
RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ
RTE_FLOW_ACTION_TYPE_INC_TCP_ACK
RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK
RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP
RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP
RTE_FLOW_ACTION_TYPE_SET_TAG
RTE_FLOW_ACTION_TYPE_SET_META

The VLAN set actions are interrelated to VLAN header insertion/removal
and supported by multiple PMDs and widely used by applications and
not supposed to be deprecated due to potential large impact on
drivers and applications.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-11-26 18:02:49 +01:00
Jerin Jacob
44f44b803b doc: add traffic metering API walk-through
Added a diagram to document meter library components
and added text for steps performed by the application to
configure the traffic meter and policing library.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-11-26 15:09:15 +01:00
Elena Agostini
579147d7aa gpudev: remove unnecessary memory barrier
Remove unnecessary rte_gpu_wmb from rte_gpu_comm_populate_list_pkts.
It causes a performance degradation in case of NVIDIA GPU V100.

This change doesn't affect any functionality as the status resides
in CPU registered memory.

Fixes: c7ebd65c13 ("gpudev: add communication list")

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-26 12:26:46 +01:00
Sean Morrissey
f8dbaebbf1 fix PMD wording
Removing the use of driver following PMD as its unnecessary.

Cc: stable@dpdk.org

Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-26 11:28:34 +01:00
Sean Morrissey
b53d106d34 remove repeated 'the' in the code
Remove the use of double "the" as it does not make sense.

Cc: stable@dpdk.org

Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-26 11:28:34 +01:00
Alexander Bechikov
c0b48da45c mbuf: fix dump of dynamic fields and flags
The dump of dynamic fields and flags fails if the shm is already
allocated. Add a check to fix the issue.

Fixes: d4902ed31c ("mbuf: check shared memory before dumping dynamic space")
Cc: stable@dpdk.org

Signed-off-by: Alexander Bechikov <asb.tyum@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-11-24 15:11:21 +01:00
Elena Agostini
1674c56dc3 gpudev: manage null parameters in memory functions
The gpudev functions free, register and unregister
return gracefully if input pointer is NULL or size 0,
as API doc was indicating no-op accepted values.

CUDA driver checks are removed because redundant
with the checks added in gpudev library.

Fixes: e818c4e2bf ("gpudev: add memory API")

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-24 09:38:43 +01:00
Ferruh Yigit
25cf263074 net: add macro for VLAN header length
Multiple drivers are defining macros for VLAN header length, to remove
the redundancy defining macro in the ether header.
And updated drivers to use the new macro.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Jiawen Wu <jiawenwu@trustnetic.com>
2021-11-17 20:17:04 +01:00
Vladimir Medvedkin
9da077adb3 hash: clarify comment for bucket entries number
This patch adds a comment for RTE_HASH_BUCKET_ENTRIES
explaining why a particular value was chosen.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2021-11-17 18:33:33 +01:00
Elena Agostini
f64b299cb3 build: make gpudev optional
This library can be made optional.
drivers/gpu and app/test-gpudev depend on this library,
so they are automatically disabled if the lib is disabled.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-17 18:16:57 +01:00
Jiayu Hu
af4d7ad5d9 vhost: fix packed ring descriptor update in async enqueue
If the packet uses multiple descriptors and its descriptor indices are
wrapped, the first descriptor flag is not updated last, which may cause
virtio read the incomplete packet. For example, given a packet uses 64
descriptors, and virtio ring size is 256, and its descriptor indices are
224~255 and 0~31, current implementation will update 224~255 descriptor
flags earlier than 0~31 descriptor flags.

This patch fixes this issue by updating descriptor flags in one loop,
so that the first descriptor flag is always updated last.

Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-11-16 11:21:48 +01:00
David Marchand
cbff4d8dc9 build: make pdump optional
This library can be made optional.
dumpcap and pdump applications depend on this library, check for
dependencies like what we have for examples.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-17 12:49:19 +01:00
David Marchand
bb9be9a45e build: make metrics libraries optional
metrics, bitratestats, jobstats and latencystats libraries can be made
optional as they provide standalone features.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-17 12:48:33 +01:00
David Marchand
6970401e97 build: make GRO/GSO libraries optional
GRO and GSO integration in testpmd is relatively self contained and easy
to extract.
Those libraries can be made optional as they provide standalone
features.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-17 12:48:22 +01:00
Konstantin Ananyev
b7fc82ecb0 ip_frag: add namespace
Update public macros to have RTE_IP_FRAG_ prefix.
Update DPDK components to use new names.
Keep obsolete macro for compatibility reasons.
Renamed experimental function ``rte_frag_table_del_expired_entries``to
``rte_ip_frag_table_del_expired_entries`` to comply with other public
API naming convention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-17 10:29:14 +01:00
Vladimir Medvedkin
fba335b4b2 hash: fix Toeplitz hash implementation
This patch fixes various issues:
- replace _mm512_set_epi8 with _mm512_set_epi32 due to the lack
  of support by some compilers (at least, gcc 8),
- check if AVX512F is supported along with GFNI, this is done if the code
  is built on a platform that supports GFNI, but does not support AVX512,
- fix compilation problems on 32bit arch due to lack of support for
  _mm_extract_epi64() by implementing XOR folding with
  _mm_extract_epi32() on 32-bit arch,

Fixes: 4fd8c4cb0d ("hash: add new Toeplitz hash implementation")

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Kai Ji <kai.ji@intel.com>
2021-11-17 10:23:01 +01:00
Dmitry Kozlyuk
04f9fac660 config/x86: skip GNU binutils bug check for LLVM
AVX512 was disabled when GNU binutils were missing or had a known bug,
even if LLVM binutils were used for the build,
because binutils-avx512-check.sh was invoked regardless and failed.
In particular, this was the case for FreeBSD with clang (default).
Run the check only when GNU binutils are used.

Fixes: 68b1f1cda5 ("build: check AVX512 rather than binutils version")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-17 09:41:13 +01:00
Stephen Hemminger
4a6672c2d3 fix spelling in comments and doxygen
Fix spelling errors in comments including doxygen found using codespell.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-16 17:57:09 +01:00
Mattias Rönnblom
bd99189724 eventdev: negate maintenance capability flag
Replace RTE_EVENT_DEV_CAP_REQUIRES_MAINT, which signaled the need
for the application to call rte_event_maintain(), with
RTE_EVENT_DEV_CAP_MAINTENANCE_FREE, which does the opposite (i.e.,
signifies that the event device does not require maintenance).

This approach is more in line with how other eventdev hardware and/or
software limitations are handled in the Eventdev API.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-15 08:22:38 +01:00
Mattias Rönnblom
572dce2bf9 eventdev/eth_rx: fix stalls on event device backpressure
In the Eventdev Ethernet RX Adapter, correctly handle the case where
the circular enqueue buffer head and last index point to the same
element.

This bug may be triggered in case there is backpressure from the event
device to the RX adapter.

Fixes: 8113fd15e2 ("eventdev/eth_rx: make enqueue buffer circular")

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-11-15 08:22:38 +01:00
Naga Harish K S V
741b499e64 eventdev/eth_tx: fix queue delete logic
This patch fixes heap-use-after-free reported by ASan.

The application can use the queue_id as `-1` to delete all
the queues of the eth_device that are added to tx_adapter
instance.
In above case, the queue_del API is trying to use number of
queues from adapter level instead of eth_device queues.
When there are queues added from multiple eth devices,
it will result in heap-use-after-free as reported by ASAN.

This patch fixes the queue_del API to use correct number of
queues.

Bugzilla ID: 869
Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Tested-by: David Marchand <david.marchand@redhat.com>
2021-11-15 08:22:38 +01:00
David Christensen
f2a66612ee eal/ppc: support ASan
Add support for Address Sanitizer (ASan) for PPC/POWER architecture.

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-16 11:24:22 +01:00
Ferruh Yigit
5139502783 ethdev: fix typos
Fixes: 9039c81257 ("ethdev: change promiscuous callbacks to return status")
Fixes: 12e6e3e78f ("ethdev: add API to dump device internal flow info")
Fixes: 44bf3c796b ("ethdev: support flow aging")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-11-15 17:41:21 +01:00
Dmitry Kozlyuk
64be0e779f ethdev: fix device capability to string translation
Add support for RTE_ETH_DEV_CAPA_FLOW_{RULE,SHARED_OBJECT}_KEEP
to rte_eth_dev_capability_name(), missed when adding the capabilities.

Fixes: 1d5a3d68c0 ("ethdev: add capability to keep flow rules on restart")
Fixes: 2c9cd45de7 ("ethdev: add capability to keep shared objects on restart")

Reported-by: Ali Alnubani <alialnu@nvidia.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-09 19:26:19 +01:00
Jim Harris
d4fb4eb087 power: remove unused poll counter
Following the previous fix, there is nothing using the ppi counter.

We can remove the related ppi_av array in struct priority_worker.
This allows us to also remove num_dequeue_pkts_prev and pc from
struct priority_worker since they are only used in conjunction
with the ppi_av array.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-12 15:43:49 +01:00
Jim Harris
0353121c33 power: fix build with clang 13
clang-13 rightfully complains that the tot_ppi variable in update_stats
is set but not used, since the final accumulated tot_ppi results isn't
used anywhere.

Fixes: 450f079131 ("power: add traffic pattern aware power control")
Cc: stable@dpdk.org

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-12 15:33:35 +01:00
Volodymyr Fialko
001d402c89 eal/arm64: support ASan
This patch defines ASAN_SHADOW_OFFSET for arm64 according to the ASan
documentation. This offset should cover all arm64 VMAs supported by
ASan.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-11-12 15:30:00 +01:00
David Marchand
19d024003d build: factorize jansson availability check
Since two components wants to know if the jansson library is available,
move it to config/.

Signed-off-by: David Marchand <david.marchand@redhat.com>
2021-11-10 16:23:05 +01:00
David Marchand
d6024c0a67 build: cleanup libpcap dependent components
The RTE_PORT_PCAP variable is used to signal libpcap availability,
though its name seems to refer to pcap support in the port library.
Prefer a generic name and add explicit link dependencies where needed.

Fixes: 7a944656b3 ("test/pcapng: test pcapng library")
Fixes: 2eccf6afbe ("bpf: add function to convert classic BPF to DPDK BPF")
Fixes: cbb44143be ("app/dumpcap: add new packet capture application")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-10 11:42:34 +01:00
Huichao Cai
013bb504c8 ip_frag: revert fix fragmenting IPv4 fragment
The patch ("ip_frag: fix fragmenting IPv4 fragment") introduces
a bug and needs to be rolled back. This is because the patch
and variables "flag_offset" conflict with each other.

Bugzilla ID: 835
Fixes: 567473433b ("ip_frag: fix fragmenting IPv4 fragment")
Cc: stable@dpdk.org

Signed-off-by: Huichao Cai <chcchc88@163.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-08 23:32:38 +01:00
Konstantin Ananyev
060ef29dc0 ip_frag: hide internal structures
Move internal reassembly structures into new private
header 'ip_reassembly.h'.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-08 22:37:47 +01:00
Maciej Szwed
aeed570a21 interrupt: fix request notifier interrupt processing
We should call read() on RTE_INTR_HANDLE_VFIO_REQ event
to confirm that event.

Fixes: 0eb8a1c4c7 ("vfio: add request notifier interrupt")
Cc: stable@dpdk.org

Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
2021-11-08 18:26:07 +01:00
Harman Kalra
7e2083e462 eal/linux: check interrupt file descriptor validity
This patch fixes coverity issue by adding a check for negative event fd
value.

Coverity issue: 373711, 373694
Fixes: c2bd9367e1 ("lib: remove direct access to interrupt handle")

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-08 17:32:42 +01:00
Harman Kalra
3fcca9fac6 interrupts: check file descriptor validity
This patch fixes coverity issues by adding a check for negative event
fd value.

Coverity issue: 373716, 373699, 373693, 373688
Fixes: bbbac4cd6e ("interrupts: remove direct access to interrupt handle")

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-08 17:32:42 +01:00
Elena Agostini
c7ebd65c13 gpudev: add communication list
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.

The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the GPU.

A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- in a loop:
    - receive a number of packets
    - provide packets info to the GPU

GPU:
- Do some pre-processing
- Wait to receive a new set of packet to be processed

Layout of a communication list would be:

     -------
    |   0    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   1    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   2    | => pkt_list
    | status |
    | #pkts  |
     -------
    |  ....  | => pkt_list
     -------

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-08 17:20:53 +01:00
Elena Agostini
f56160a255 gpudev: add communication flag
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag

GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-08 17:20:53 +01:00
Elena Agostini
2d61b429cf gpudev: add memory barrier
Add a function for the application to ensure the coherency
of the writes executed by another device into the GPU memory.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-08 17:20:53 +01:00
Elena Agostini
e818c4e2bf gpudev: add memory API
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.

As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:53 +01:00
Thomas Monjalon
a9af048aba gpudev: support multi-process
The device data shared between processes are moved in a struct
allocated in a shared memory (a new memzone for all GPUs).
The main struct rte_gpu references the shared memory
via the pointer mpshared.

The API function rte_gpu_attach() is added to attach a device
from the secondary process.
The function rte_gpu_allocate() can be used only by primary process.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:53 +01:00
Thomas Monjalon
82e5f6b658 gpudev: add child device representing a device context
The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_gpu_add_child().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:52 +01:00
Thomas Monjalon
18cb075631 gpudev: add event notification
Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.

The events RTE_GPU_EVENT_NEW and RTE_GPU_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:52 +01:00
Elena Agostini
8b8036a66e gpudev: introduce GPU device class library
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.

The infrastructure is prepared to welcome drivers in drivers/gpu/.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:52 +01:00
Anatoly Burakov
4fd15c6af0 vfio: set errno on unsupported OS
Currently, when code is running on FreeBSD or Windows, there is no way
to distinguish between a geniune error and a "VFIO is unsupported"
error. Fix the dummy implementations to also set the rte_errno flag.

Fixes: 279b581c89 ("vfio: expose functions")
Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Fixes: 964b2f3bfb ("vfio: export some internal functions")
Fixes: ea2dc10668 ("vfio: add multi container support")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-08 16:45:28 +01:00
Anatoly Burakov
da6e4cdca1 vfio: fix FreeBSD documentation
On FreeBSD, `rte_vfio_is_enabled()` and `rte_vfio_noiommu_is_enabled()`
API calls will not return error, and will instead return 0. This is
intentional, because the caller of this API does not care whether VFIO
is supported at all, and will instead be interested in whether VFIO is
enabled or not. However, the doxygen comments for these functions state
that they will return an error on FreeBSD, which is incorrect.

Fix the doxygen comment to call out the fact that these
functions are only relevant on Linux, but remove the reference to
returning errors.

Fixes: 279b581c89 ("vfio: expose functions")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-08 16:42:55 +01:00
Anatoly Burakov
bf8b792f3b vfio: fix FreeBSD clear group stub
On FreeBSD, `rte_vfio_clear_group()` was returning 0 even though this
function is not valid for FreeBSD, and is called out to return error in
doxygen comments.
Fix the return value to match documentation.

Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-11-08 16:42:44 +01:00
Anatoly Burakov
84e03bde1c vfio: drop fallback Linux implementation
Currently, VFIO support for Linux is compiled unconditionally, and
supported kernel versions start with 4.4, so VFIO is assumed to always
be enabled. There is no way of disabling VFIO support at compile time
anyway, so just drop the "VFIO not available" fallback code altogether.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-08 16:27:15 +01:00
Chengwen Feng
b1f4933ef3 kni: check error code of allmulticast mode switch
Some drivers may return errcode when switch allmulticast mode,
so it's necessary to check the return code.

Fixes: b34801d1aa ("kni: support allmulticast mode set")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-08 11:56:13 +01:00
Ali Alnubani
948be25cdb sched: fix debug build
Compare pkt_len to 0 instead of NULL to avoid the following build
failure with debug mode enabled:
../lib/sched/rte_pie.h: In function 'rte_pie_enqueue_empty':
../lib/sched/rte_pie.h:125:21: error: comparison between pointer
    and integer [-Werror]
  RTE_ASSERT(pkt_len != NULL);

Bugzilla ID: 878
Fixes: 44c730b0e3 ("sched: add PIE based congestion management")

Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
2021-11-07 18:52:51 +01:00
Ferruh Yigit
b7ade5d31a ethdev: fix crash on owner delete
'eth_dev->data' can be null before ethdev allocated. The API walks
through all eth devices, at least for some data can be null.

Adding 'eth_dev->data' null check before accessing it.

Fixes: 33c73aae32 ("ethdev: allow ownership operations on unused port")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-11-05 15:35:57 +01:00
Mattias Rönnblom
578402f2d5 eventdev: support device maintenance in adapters
Introduce support for event devices requiring calls to
rte_event_maintain() in the Ethernet RX, Timer and Crypto Eventdev
adapters.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Tested-by: Richard Eklycke <richard.eklycke@ericsson.com>
2021-11-04 13:28:09 +01:00
Mattias Rönnblom
54f17843a8 eventdev: add port maintenance API
Extend Eventdev API to allow for event devices which require various
forms of internal processing to happen, even when events are not
enqueued to or dequeued from a port.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Richard Eklycke <richard.eklycke@ericsson.com>
Tested-by: Liron Himi <lironh@marvell.com>
2021-11-04 13:27:54 +01:00
Naga Harish K S V
9e58318531 eventdev/eth_rx: support telemetry
Added telemetry support for rxa_queue_stats and
rxa_queue_stats_reset to get and reset rx queue
stats respectively.

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-11-04 08:41:25 +01:00
Naga Harish K S V
995b150c1a eventdev/eth_rx: add queue stats API
This patch adds new api ``rte_event_eth_rx_adapter_queue_stats_get`` to
retrieve queue stats. The queue stats are in the format
``struct rte_event_eth_rx_adapter_queue_stats``.

For resetting the queue stats,
``rte_event_eth_rx_adapter_queue_stats_reset`` api is added.

The adapter stats_get and stats_reset apis are also updated to
handle queue level event buffer use case.

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-11-04 08:41:25 +01:00
Gowrishankar Muthukrishnan
259ca6d161 security: add telemetry endpoint for capabilities
Add telemetry endpoint for cryptodev security capabilities.
Details of endpoints added in documentation.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-11-04 19:46:27 +01:00
Raja Zidane
9ad776442d crypto/mlx5: support 1MB data-unit
Add 1MB data-unit length to the capability's bitmap.
Handle 1MB data-unit length in the mlx5 session create operation,
and expose its capability in the mlx5 capabilities.

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-04 19:46:27 +01:00
Radu Nicolau
ff4a29d167 ipsec: support TSO
Add support for transmit segmentation offload to inline crypto processing
mode. This offload is not supported by other offload modes, as at a
minimum it requires inline crypto for IPsec to be supported on the
network interface.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-11-04 19:46:27 +01:00
Nicolas Chautru
ba2469cddf bbdev: promote API as stable
This promotes the bbdev interface to stable.
Overdue for some time as bbdev interface has been stable.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-11-04 19:43:14 +01:00
Gowrishankar Muthukrishnan
1c559ee846 cryptodev: add telemetry endpoint for capabilities
Add telemetry endpoint for getting cryptodev capabilities.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-11-04 19:43:14 +01:00
Rebecca Troy
d3d98f5ce9 cryptodev: support telemetry
The cryptodev library now registers commands with telemetry, and
implements the corresponding callback functions. These commands
allow a list of cryptodevs to be queried, as well as info and stats
for the corresponding cryptodev.

An example usage can be seen below:

Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 1135019, "max_output_len": 16384}
--> /
{"/": ["/", "/cryptodev/info", "/cryptodev/list", "/cryptodev/stats", ...]}
--> /cryptodev/list
{"/cryptodev/list": [0,1,2,3]}
--> /cryptodev/info,0
{"/cryptodev/info": {"device_name": "0000:1c:01.0_qat_sym", \
	 "max_nb_queue_pairs": 2}}
--> /cryptodev/stats,0
{"/cryptodev/stats": {"enqueued_count": 0, "dequeued_count": 0, \
	"enqueue_err_count": 0, "dequeue_err_count": 0}}

Signed-off-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-11-04 19:43:14 +01:00
Gregory Etelson
de39080bd6 ethdev: fix variable length flow elements support
RTE flow API defines two flow elements types - common and PMD private.
Common RTE flow types are defined in rte_flow.h while PMD private
types exists inside specific PMD only. Application can create a flow
rule with PMD private items or actions. RTE flow API restricts
private PMD types to negative values.

Current implementation tried to use negative PMD private item type
value as index in the rte_flow_desc_item[] array.

The patch allows access to rte_flow_desc_item[] and
rte_flow_desc_action[] arrays to non-private PMD types only.

Fixes: 6cf7204733 ("ethdev: support flow elements with variable length")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-04 13:23:29 +01:00
Thomas Monjalon
285725d93b ethdev: promote device removal check function as stable
The function rte_eth_dev_is_removed() was introduced in DPDK 18.02,
and is integrated in error checks of ethdev library.

It is promoted as stable ABI.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-04 11:11:28 +01:00
Maxime Coquelin
ab4bb42406 vhost: rename driver callbacks struct
As previously announced, this patch renames struct
vhost_device_ops to struct rte_vhost_device_ops.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-03 11:59:27 +01:00
Maxime Coquelin
94c16e89d7 vhost: mark vDPA driver API as internal
This patch marks the vDPA driver APIs as internal and
rename the corresponding header file to vdpa_driver.h.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-03 09:11:34 +01:00
Dmitry Kozlyuk
2c9cd45de7 ethdev: add capability to keep shared objects on restart
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped and started again.
It is natural for some indirect actions, like counter, to be persistent.
Keeping others at least saves application time and complexity.
However, not all PMDs can support it, or the support may be limited
by particular action kinds, that is, combinations of action type
and the value of the transfer bit in its configuration.

Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, and application is required to destroy
the indirect actions before stopping the device.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.

Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can to create an indirect action
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.

Indirect action persistence across a reconfigurations is not required.
In case a PMD cannot keep the indirect actions across reconfiguration,
it is allowed just to report an error.
Application must then flush the indirect actions before attempting it.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-11-02 18:59:17 +01:00
Dmitry Kozlyuk
1d5a3d68c0 ethdev: add capability to keep flow rules on restart
Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and its attributes transfer bit (a combination of an item/action type
and a value of the transfer bit is called a rule feature).

Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can to create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.

Rule persistence across a reconfigurations is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
Application must then flush the rules before attempting it.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-11-02 18:59:17 +01:00
Maxime Coquelin
541053891a vhost: increase number of async IO vectors
This patch increases the number of IO vectors for the
asynchronous data path from 512 to 2048. It has been
reported during testing the starvation of IO vectors
during iperf benchmark with 64KB packet size.

As there are no direct relationship between
VHOST_MAX_ASYNC_VEC and BUF_VECTOR_MAX, this patch also
assign VHOST_MAX_ASYNC_VEC value directly instead of being
a multiple of BUF_VECTOR_MAX.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
816a565bf3 vhost: merge sync and async mbuf to descriptor filling
This patches merges copy_mbuf_to_desc() used by the sync
path with async_mbuf_to_desc() used by the async path.

Most of these complex functions are identical, so merging
them will make the maintenance easier.

In order not to degrade performance, the patch introduces
a boolean function parameter to specify whether it is called
in async context. This boolean is statically passed to this
always-inlined function, so the compiler will optimize this
out.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
b84e85e3b1 vhost: prepare sync for mbuf to descriptor refactoring
This patch extracts the descriptors buffers filling
from copy_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mubf_to_desc() and
async_mbuf_to_desc().

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
dbfa4c0bbd vhost: prepare async for mbuf to descriptor refactoring
This patch extracts the IO vectors filling from
async_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mubf_to_desc() and
async_mbuf_to_desc().

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
c75987487d vhost: simplify getting first in-flight index
This patch reworks the function getting the index
for the first packet in-flight.

When this index turns out to be zero, let's use the simple
path. Doing that avoid having to do a modulo with the
virtqueue size.

The patch also rename the function for better clarification,
and only pass the virtqueue metadata pointer, as all the
needed information are stored there.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
a3cfa8081f vhost: simplify async enqueue completion
vhost_poll_enqueue_completed() assumes some inflight
packets could have been completed in a previous call but
not returned to the application. But this is not the case,
since check_completed_copies callback is never called with
more than the current count as argument.

In other words, async->last_pkts_n is always 0. Removing it
greatly simplifies the function.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
2cbe826e26 vhost: remove notion of async descriptor
Now that IO vectors iterator have been simplified, the
rte_vhost_async_desc struct only contains a pointer on
the iterator array stored in the async metadata.

This patch removes it, and pass directly the iterators
array pointer to the transfer_data callback. Doing that,
we avoid declaring the descriptor array in the stack, and
also avoid the cost of filling it.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
d5d25cfd85 vhost: improve IO vector logic
IO vectors and their iterators arrays were part of the
async metadata but not their indexes.

In order to makes this more consistent, the patch adds the
indexes to the async metadata. Doing that, we can avoid
triggering DMA transfer within the loop as it IO vector
index overflow is now prevented in the async_mbuf_to_desc()
function.

Note that previous detection mechanism was broken
since the overflow already happened when detected, so OOB
memory access would already have happened.

With this changes done, virtio_dev_rx_async_submit_split()
and virtio_dev_rx_async_submit_packed() can be further
simplified.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
0af9f99221 vhost: remove useless fields in async iterator struct
Offset and count fields are unused and so can be removed.
The offset field was actually in the Vhost example, but
in a way that does not make sense.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
6171bfbfb2 vhost: introduce specific iovec structure
This patch introduces rte_vhost_iovec struct that contains
both source and destination addresses since we always have
a 1:1 mapping between source and destination. While using
the standard iovec struct might have seemed better, having
to duplicate IO vectors and its iterators is memory
inefficient and make the implementation more complex.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
8b3fc5a213 vhost: remove async batch threshold
Reaching the async batch threshold was one of the condition
to trigger the DMA transfer. However, this condition was
never met since the threshold value is 32, same as the
MAX_PKT_BURST value.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
3fe629547e vhost: simplify async IO vectors iterators
This patch splits the iterator arrays in two, one for
source and one for destination. The goal is make the code
easier to understand.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
97064162d4 vhost: simplify async IO vectors
IO vectors implementation is unnecessarily complex, mixing
source and destinations vectors in the same array.

This patch declares two arrays, one for the source and one
for the destination. It also gets rid of seg_awaits variable
in both packed and split implementation, which is the same
as iovec_idx.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
5f89c5e1e9 vhost: hide in-flight async structure
This patch moves async_inflight_info struct to internal
header since it should not be part of the API.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Maxime Coquelin
ee8024b3d4 vhost: move async data in dedicated structure
This patch moves async-related metadata from vhost_virtqueue
to a dedicated struct. It makes it clear which fields are
async related, and also saves some memory when async feature
is not in use.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-10-29 12:32:30 +02:00
Miao Li
c6e305141a power: support missing Rx queue info
Since some vdevs like virtio and vhost do not support rxq_info_get and
queue state inquiry, the error return value -ENOTSUP need to be ignored
when queue_stopped cannot get rx queue information and rx queue state.
This patch changes the return value of queue_stopped when
rte_eth_rx_queue_info_get return -ENOTSUP to support vdevs which cannot
provide rx queue information and rx queue state enable power management.

Fixes: 209fd58545 ("power: make ethdev power management thread unsafe")
Cc: stable@dpdk.org

Signed-off-by: Miao Li <miao.li@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-10-29 12:32:29 +02:00
Miao Li
34fd4373ce vhost: add power monitor API
This commit defines rte_vhost_power_monitor_cond which is used to pass
some information to vhost driver. The information is including the
address to monitor, the expected value, the mask to extract value read
from 'addr', the value size of monitor address, the match flag used to
distinguish the value used to match something or not match something.

Vhost driver can use these information to fill rte_power_monitor_cond.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-10-29 12:32:29 +02:00
Xuan Ding
5fd6e93b7e vhost: remove async DMA map status
Async DMA map status flag was added to prevent the unnecessary unmap
when DMA devices bound to kernel driver. This brings maintenance cost
for a lot of code. This patch removes the DMA map status by using
rte_errno instead.

This patch relies on the following patch to fix a partial
unmap check in vfio unmapping API.
[1] https://www.mail-archive.com/dev@dpdk.org/msg226464.html

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-10-29 12:32:22 +02:00
David Marchand
e7c727c307 net: fix build with sparse on L2TPv2 bitfields
An external project that wants to do additional checks on fields
endianness can remap rte_beXX types to instrumented types and use
sparse.

The current code breaks OVS build with sparse:
../../lib/ofp-packet.c: note: in included file (through
  .../ovs/dpdk-dir/build/include/rte_flow.h, ../../lib/netdev-dpdk.h,
  ../../lib/dp-packet.h):
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:92:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:93:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:94:40:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:95:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:96:40:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:97:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:98:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:99:40:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h💯39:
  error: invalid bitfield specifier for type restricted ovs_be16.
make[3]: *** [lib/ofp-packet.lo] Error 1

Use simple uint16_t types for bitfields in L2TPv2 struct.

Fixes: 3a929df1f2 ("ethdev: support L2TPv2 and PPP procotol")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-28 20:28:01 +02:00
David Marchand
41f2f05574 ethdev: warn once when using port not ready
Warning continuously is a pain when developping or if a unit test
is/gets broken.

It could also be a problem if application behaves badly only in some
corner cases and a DoS results of those logs being continuously displayed.

Let's warn once per port and per rx/tx.

Getting such a log is scary, but let's make it more eye catching by
dumping a backtrace with it.

Tested by introducing a bug in testpmd:
 static int
 eth_dev_start_mp(uint16_t port_id)
 {
-       if (is_proc_primary())
+       if (!is_proc_primary())
                return rte_eth_dev_start(port_id);

        return 0;

Then, running a basic null test:
$ ./devtools/test-null.sh
...
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support
  enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

lcore 0 called rx_pkt_burst for not ready port 0
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
lcore 0 called rx_pkt_burst for not ready port 1
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0

Fixes: c87d435a4d ("ethdev: copy fast-path API into separate structure")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-10-27 19:28:45 +02:00
Olivier Matz
9bffc92850 mem: fix dynamic hugepage mapping in container
Since its introduction in 2018, the SIGBUS handler was never registered,
and all related functions were unused.

A SIGBUS can be received by the application when accessing to hugepages
even if mmap() was successful, This happens especially when running
inside containers when there is not enough hugepages. In this case, we
need to recover. A similar scheme can be found in eal_memory.c.

Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-05 15:28:55 +01:00
Ilyes Ben Hamouda
770d41bf33 malloc: fix allocation with unknown socket ID
When using rte_malloc() from a thread which is not bound to a numa
socket (the typical case is a control thread, but it can also happen
on a dataplane thread if its cpu affinity is on cores attached to
several sockets), the used heap is the one from numa socket 0, which
may not have available memory.

Fix this by selecting the first socket which has available memory.

Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.

Fixes: b94580d688 ("malloc: avoid unknown socket id")
Cc: stable@dpdk.org

Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-05 15:28:49 +01:00
David Hunt
bb0bd346d5 eal: suggest using --lcores option
If the user requests to use an lcore above 128 using -l,
the eal will exit with "EAL: invalid core list syntax" and
very little else useful information.

This patch adds some extra information suggesting to use --lcores
so that physical cores above RTE_MAX_LCORE (default 128) can be
used. This is achieved by using the --lcores option by mapping
the logical cores in the application to physical cores.

For example, if "-l 12-16,130,132" is used, we see the following
additional output on the command line:

EAL: lcore 132 >= RTE_MAX_LCORE (128)
EAL: lcore 133 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133

The same is added to -c option parsing.

For example, if "-c 0x300000000000000000000000000000000" is
used, we see the following additional output on the command line:

EAL: lcore 128 >= RTE_MAX_LCORE (128)
EAL: lcore 129 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@128,1@129

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-05 14:39:37 +01:00
David Marchand
f5fa0e110f eal: promote non-EAL lcore API as stable
This API has been around for more than a year (and is in LTS 20.11).
It did not receive negative feedback and will be used in a next OVS
release.
Mark it stable.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-04 22:57:58 +01:00
Konstantin Ananyev
65d9b7c664 bpf: fix convert API when libpcap missing
rte_bpf_convert() implementation depends on libpcap.
Right now it is defined only when this library is installed and
RTE_PORT_PCAP is defined.
Fix that by providing for such case stub rte_bpf_convert()
implementation that will always return an error.
To draw user attention, if proper implementation is disabled,
warning will be thrown at meson configure stage.
Also move stub for another function (rte_bpf_elf_load) into
the same place (bpf_stub.c).

Fixes: 2eccf6afbe ("bpf: add function to convert classic BPF to DPDK BPF")

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 19:56:20 +01:00
Konstantin Ananyev
7b0a120157 bpf: fix doxygen comment
Fix typo in doxygen comments for rte_bpf_convert().

Fixes: 2eccf6afbe ("bpf: add function to convert classic BPF to DPDK BPF")

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 19:56:14 +01:00
David Marchand
54abd300d5 pipeline: remove unreachable branch
A previous change blamed it on compiler/ASan, while this is a real
(yet minor) issue.

This return -EINVAL is never reached since we test all combinations of
fidx and fcin booleans.
All branches end up with a return 0, factorize them.

Fixes: 84f5ac9418 ("pipeline: fix build with ASan")
Fixes: f38913b7fb ("pipeline: add meter array to SWX")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-11-04 18:11:08 +01:00
Yogesh Jangra
2ce3ccbe44 pipeline: fix dead code
Fix minor dead code issue reported by Coverity.

Coverity issue: 373653
Fixes: e9d870 ("pipeline: add SWX pipeline tables")

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-11-04 16:43:27 +01:00
Wojciech Liguzinski
44c730b0e3 sched: add PIE based congestion management
Implement PIE based congestion management based on rfc8033.

The Proportional Integral Controller Enhanced (PIE) algorithm works
by proactively dropping packets randomly.
PIE is implemented as more advanced queue management is required to
address the bufferbloat problem and provide desirable quality of
service to users.

Tests for PIE code added to test application.
Added PIE related information to documentation.

Signed-off-by: Wojciech Liguzinski <wojciechx.liguzinski@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2021-11-04 15:41:49 +01:00
David Marchand
5633173341 eal/linux: fix device hotplug
The device event interrupt handler was always freed.

Bugzilla ID: 845
Fixes: c2bd9367e1 ("lib: remove direct access to interrupt handle")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yan Xia <yanx.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-11-04 15:13:41 +01:00
David Marchand
4847122aab eal/linux: fix uevent message parsing
Caught with ASan:
==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address
  0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578
READ of size 1 at 0x7f0daa2fc0d0 thread T1
    #0 0x7f0daeefacb1  (/lib64/libasan.so.5+0xbacb1)
    #1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167
    #2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248
    #3 0x1169b91 in eal_intr_process_interrupts
  ../lib/eal/linux/eal_interrupts.c:1026
    #4 0x116a3a2 in eal_intr_handle_interrupts
  ../lib/eal/linux/eal_interrupts.c:1100
    #5 0x116a7f0 in eal_intr_thread_main
  ../lib/eal/linux/eal_interrupts.c:1172
    #6 0x112640a in ctrl_thread_init
  ../lib/eal/common/eal_common_thread.c:202
    #7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159)
    #8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72)

Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192
  in frame
    #0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226

  This frame has 2 object(s):
    [32, 48) 'uevent'
    [96, 4192) 'buf' <== Memory access at offset 4192 overflows this
  variable
HINT: this may be a false positive if your program uses some custom
  stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
Thread T1 created by T0 here:
    #0 0x7f0daee92ea3 in __interceptor_pthread_create
  (/lib64/libasan.so.5+0x52ea3)
    #1 0x1126542 in rte_ctrl_thread_create
  ../lib/eal/common/eal_common_thread.c:228
    #2 0x116a8b5 in rte_eal_intr_init
  ../lib/eal/linux/eal_interrupts.c:1200
    #3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044
    #4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105
    #5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802)

Bugzilla ID: 792
Fixes: 0d0f478d04 ("eal/linux: add uevent parse and process")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yan Xia <yanx.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-11-04 15:13:41 +01:00
Jim Harris
628bac7df1 eal/linux: remove unused variable for socket memory
clang-13 rightfully complains that the total_mem variable in
eal_parse_socket_arg is set but not used, since the final
accumulated total_mem result isn't used anywhere.
So just remove the total_mem variable.

Fixes: 0a703f0f36 ("eal/linux: fix parsing zero socket memory and limits")

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-04 13:27:18 +01:00
Vladimir Medvedkin
11c5b9b51a fib: add RIB extension size parameter
This patch adds a new parameter to the FIB configuration to specify
the size of the extension for internal RIB structure.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
2021-11-04 12:38:03 +01:00
Xueming Li
fc382022c6 eal: fix device iterator when no bus is selected
Devargs used in device iterator initialization wasn't set to zero, random
data like bus string lead to invalid address access.

This patch initializes devargs.

Bugzilla ID: 862
Fixes: c99a2d4c6b ("eal: implement device iteration initialization")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2021-11-04 11:44:49 +01:00
Vladimir Medvedkin
adeca6685f hash: fix use after free in Toeplitz hash
This patch fixes use after free in thash library, reported by ASAN.

Bugzilla ID: 868
Fixes: 28ebff11c2 ("hash: add predictable RSS")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-04 11:43:20 +01:00
Vladimir Medvedkin
d27e2b7e9c hash: enable GFNI Toeplitz hash implementation
This patch enables new GFNI Toeplitz hash in
predictable RSS library.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 11:19:10 +01:00
Vladimir Medvedkin
31d7c06947 hash: add bulk Toeplitz hash implementation
This patch adds a bulk version for the Toeplitz hash implemented
with Galios Fields New Instructions (GFNI).

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 11:19:10 +01:00
Vladimir Medvedkin
4fd8c4cb0d hash: add new Toeplitz hash implementation
This patch add a new Toeplitz hash implementation using
Galios Fields New Instructions (GFNI).

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 11:19:10 +01:00
Dmitry Kozlyuk
9790fc2149 eal/freebsd: fix IOVA mode selection
FreeBSD EAL selected IOVA mode PA even in --no-huge mode
where PA are not available. Memory zones were created with IOVA
equal to RTE_BAD_IOVA with no indication this field is not usable.

Change IOVA mode detection:
1. Always allow to force --iova-mode=va.
2. In --no-huge mode, disallow forcing --iova-mode=pa, and select VA.
3. Otherwise select IOVA mode according to bus requests, default to PA.
In case contigmem is inaccessible, memory initialization will fail
with a message indicating the cause.

Fixes: c2361bab70 ("eal: compute IOVA mode based on PA availability")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-03 18:32:19 +01:00
Feifei Wang
6b70c6b31f distributor: use wait until scheme
Instead of polling for bufptr64 to be updated, use
wait until scheme for this case.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
388bee69a5 bpf: use wait until scheme for Rx/Tx iteration
Instead of polling for cbi->use to be updated, use wait until scheme.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
4ed4e554ac mcslock: use wait until scheme for unlock
Instead of polling for mcslock to be updated, use wait until scheme
for this case.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
41902d2468 pflock: use wait until scheme for read lock
Instead of polling for read pflock update, use wait until scheme for
this case.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
875f350924 eal: add a new helper for wait until scheme
Add a new generic helper which is a macro for wait until scheme.

Furthermore, to prevent compilation warning in arm:
----------------------------------------------
'warning: implicit declaration of function ...'
----------------------------------------------
Delete 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and '__WFE'.
And add ‘__RTE_ARM’ for these macros to fix the namespace.
This is because original macros are undefine at the end of the file.
If the new macro calls them in other files, they will be seen as
'not defined'.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Konstantin Ananyev
53caecb844 pdump: fix freeing statistics memzone
rte_pdump_init() always allocates new memzone for pdump_stats.
Though rte_pdump_uninit() never frees it.
So the following combination will always fail:
rte_pdump_init(); rte_pdump_uninit(); rte_pdump_init();
The issue was caught by pdump_autotest UT.
While first test run successful, any consecutive runs
of this test-case will fail.
Fix the issue by calling rte_memzone_free() for statistics memzone.

Fixes: 10f726efe2 ("pdump: support pcapng and filtering")

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-03 12:53:03 +01:00
Stephen Hemminger
b2be63b55a pdump: fix packet snapshot length initialization
If packet dump was enabled via pdump_enable_by_deviceid
the packet snapshot length was not being set.

Bugzilla ID: 840
Fixes: 10f726efe2 ("pdump: support pcapng and filtering")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-01 00:36:29 +01:00
Stephen Hemminger
ae1702fffe pcapng: use new ethdev namespace
RTE_ prefix was added by
commit 295968d174 ("ethdev: add namespace")

Fixes: 8d23ce8f5e ("pcapng: add new library for writing pcapng files")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-31 23:25:02 +01:00
Zhihong Peng
6cc51b1293 mem: instrument allocator for ASan
This patch adds necessary hooks in the memory allocator for ASan.

This feature is currently available in DPDK only on Linux x86_64.
If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be
defined and RTE_MALLOC_ASAN must be set accordingly in meson.

Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-10-29 16:25:03 +02:00
Zhihong Peng
84f5ac9418 pipeline: fix build with ASan
Code changes to avoid the following build error:
"Control reaches end of non-void function".

Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-10-29 15:25:34 +02:00
Anatoly Burakov
ab910a8068 vfio: fix partial unmap
Partial unmap support was introduced in commit c13ca4e81c
("vfio: fix DMA mapping granularity for IOVA as VA"), and with it
was added a check that dereferenced the IOMMU type to determine whether
partial ummapping is supported for currently configured IOMMU type. In
certain circumstances (such as when VFIO is supported, but no devices
were bound to the VFIO driver), the IOMMU type pointer can be NULL.

However, dereferencing of IOMMU type was guarded by access to the user
maps list - that is, we were always checking the user map list first,
and then, if we found a memory region that encloses the one we're trying
to unmap, we would have performed the IOMMU type check.

This ensured that the IOMMU type check will not cause any NULL pointer
dereferences, because in order for an IOMMU type check to have been
performed, there necessarily must have been at least one memory region
that was previously mapped successfully, and that implies having a
defined IOMMU type.

When commit 56259f7fc0 ("vfio: allow partially unmapping adjacent
memory") was introduced, the IOMMU type check was moved to
before we were traversing the user mem maps list, thereby introducing a
potential NULL dereference, because the IOMMU type access was no longer
guarded by the user mem maps list traversal.

Fix the issue by moving the IOMMU type check to after the user mem maps
traversal, thereby ensuring that by the time the check happens, the
IOMMU type is always valid.

Fixes: 56259f7fc0 ("vfio: allow partially unmapping adjacent memory")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Xuan Ding <xuan.ding@intel.com>
2021-10-28 09:51:55 +02:00
Honnappa Nagarahalli
705356f081 eal: simplify control thread creation
Remove the usage of pthread barrier and replace it with
synchronization using atomic variable.
This also removes the use of reference count required to synchronize
freeing the memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-25 21:43:10 +02:00
Harman Kalra
8cb5d08db9 interrupts: extend event list
Dynamically allocating the efds and elist array of intr_handle
structure, based on size provided by user. Eg size can be
MSIX interrupts supported by a PCI device.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
99e6c7e316 interrupts: rename device specific file descriptor
VFIO/UIO are mutually exclusive, storing file descriptor in a single
field is enough.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
73d844fd08 interrupts: make interrupt handle structure opaque
Moving interrupt handle structure definition inside a EAL private
header to make its fields totally opaque to the outside world.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
d61138d4f0 drivers: remove direct access to interrupt handle
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the drivers access the interrupt handle fields.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
c2bd9367e1 lib: remove direct access to interrupt handle
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the libraries access the interrupt handle fields.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
90b13ab8d4 alarm: remove direct access to interrupt handle
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the libraries access the interrupt handle fields.

Implementing alarm cleanup routine, where the memory allocated
for interrupt instance can be freed.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
bbbac4cd6e interrupts: remove direct access to interrupt handle
Making changes to the interrupt framework to use interrupt handle
APIs to get/set any field.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
b7c9842916 interrupts: add allocator and accessors
Prototype/Implement get set APIs for interrupt handle fields.
User won't be able to access any of the interrupt handle fields
directly while should use these get/set APIs to access/manipulate
them.

Internal interrupt header i.e. rte_eal_interrupt.h is rearranged,
as APIs defined are moved to rte_interrupts.h and epoll specific
definitions are moved to a new header rte_epoll.h.
Later in the series rte_eal_interrupt.h will be removed.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Dmitry Kozlyuk
0c8fc83a71 eal/windows: fix IOVA mode detection and handling
Windows EAL did not detect IOVA mode and worked incorrectly
if physical addresses could not be obtained
(if virt2phys driver was missing or inaccessible).
In this case, rte_mem_virt2iova() reported RTE_BAD_IOVA for any address.
Inability to obtain IOVA, be it PA or VA, should cause a failure
for the DPDK allocator, but it was hidden by the implementation,
so allocations did not fail when they should.
The mode when DPDK cannot obtain PA but can work is IOVA-as-VA mode.
However, rte_eal_iova_mode() always returned RTE_IOVA_DC
(while it should only ever return RTE_IOVA_PA or RTE_IOVA_VA),
because IOVA mode detection was not implemented.

Implement IOVA mode detection:
1. Always allow to force --iova-mode=va.
2. Allow to force --iova-mode=pa only if virt2phys is available.
3. If no mode is forced and virt2phys is available,
   select the mode according to bus requests, default to PA.
4. If no mode is forced but virt2phys is unavailable, default to VA.
Fix rte_mem_virt2iova() by returning VA when using IOVA-as-VA.
Fix rte_eal_iova_mode() by returning the selected mode.

Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org

Reported-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2021-10-25 20:59:40 +02:00
Harman Kalra
e6732d0d6e mem: add telemetry infos
Registering new telemetry callbacks to list named (memzones)
and unnamed (malloc) memory reserved and return information
based on arguments provided by user.

Example:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384}
Connected to application: "dpdk-testpmd"
-->
--> /eal/memzone_list
{"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}
-->
-->
--> /eal/memzone_info,0
{"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data",    \
"Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \
"Hugepage_size": 536870912, "Hugepage_base": "0x120000000",   \
"Hugepage_used": 1}}
-->
-->
--> /eal/memzone_info,6
{"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0",  \
"Length": 669918336, "Address": "0x15811db80", "Socket": 0,  \
"Flags": 0, "Hugepage_size": 536870912, "Hugepage_base": "0x140000000", \
"Hugepage_used": 2}}
-->
-->
--> /eal/memzone_info,14
{"/eal/memzone_info": null}
-->
-->
--> /eal/heap_list
{"/eal/heap_list": [0]}
-->
-->
--> /eal/heap_info,0
{"/eal/heap_info": {"Head id": 0, "Name": "socket_0",     \
"Heap_size": 1610612736, "Free_size": 927645952,          \
"Alloc_size": 682966784, "Greatest_free_size": 529153152, \
"Alloc_count": 482, "Free_count": 2}}

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-25 19:39:54 +02:00
Vladimir Medvedkin
97e2ae4c58 rib: fix IPv6 depth mask
Fixes: 03b8372a9a ("rib: fix max depth IPv6 lookup")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-10-25 19:13:12 +02:00
Vladimir Medvedkin
b16ac53657 lpm6: fix buffer overflow
This patch fixes buffer overflow reported by ASAN,
please reference https://bugs.dpdk.org/show_bug.cgi?id=819

The rte_lpm6 keeps routing information for control plane purpose
inside the rte_hash table which uses rte_jhash() as a hash function.
From the rte_jhash() documentation: If input key is not aligned to
four byte boundaries or a multiple of four bytes in length,
the memory region just after may be read (but not used in the
computation).
rte_lpm6 uses 17 bytes keys consisting of IPv6 address (16 bytes) +
depth (1 byte).

This patch increases the size of the depth field up to uint32_t
and sets the alignment to 4 bytes.

Bugzilla ID: 819
Fixes: 86b3b21952 ("lpm6: store rules in hash table")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-25 19:08:16 +02:00
Vladimir Medvedkin
45523f494c hash: fix Doxygen comment of Toeplitz file
Fixes: 7574c3ef74 ("hash: add toeplitz algorithm used by RSS")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2021-10-25 19:06:07 +02:00
Honnappa Nagarahalli
3596537005 eal: fix memory ordering around lcore task accesses
Ensure that the memory operations before the call to
rte_eal_remote_launch are visible to the worker thread.
Use the function pointer to execute in worker thread
as the guard variable.

Ensure that the memory operations in worker thread, that happen
before it returns the status of the assigned function, are
visible to the main thread. Use the variable containing the
lcore's state as the guard variable.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli
f6c6c686f1 eal: remove FINISHED lcore state
FINISHED state seems to be used to indicate that the worker's update
of the 'state' is not visible to other threads. There seems to be no
requirement to have such a state.

Since the FINISHED state is removed, the API rte_eal_wait_lcore
is updated to always return the status of the last function that
ran in the worker core.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli
33969e9c61 eal: reset lcore task callback and argument
In the rte_eal_remote_launch function, the lcore function
pointer is checked for NULL. However, the pointer is never
reset to NULL. Reset the lcore function pointer and argument
after the worker has completed executing the lcore function.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Eli Britstein
6de430b707 eal/x86: avoid cast-align warning in memcpy functions
Functions and macros in x86 rte_memcpy.h may cause cast-align warnings,
when using strict cast align flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

For example:
In file included from main.c:24:
/dpdk/build/include/rte_memcpy.h: In function 'rte_mov16':
/dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases
required alignment of target type [-Wcast-align]
  306 |  xmm0 = _mm_loadu_si128((const __m128i *)src);
      |                         ^

As the code assumes correct alignment, add first a (void *) or (const
void *) castings, to avoid the warnings.

Fixes: 9484092baa ("eal/x86: optimize memcpy for AVX512 platforms")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
2021-10-25 17:28:12 +02:00
Eli Britstein
da0333c879 mbuf: avoid cast-align warning in data offset macro
In rte_pktmbuf_mtod_offset macro, there is a casting from char * to type
't', which may cause cast-align warning when using strict cast align
flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

main.c: In function 'l2fwd_mac_updating':
/dpdk/build/include/rte_mbuf_core.h:719:3: warning: cast increases
required alignment of target type [-Wcast-align]
  719 |  ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
      |   ^
/dpdk/build/include/rte_mbuf_core.h:733:32: note: in expansion of macro
'rte_pktmbuf_mtod_offset'
  733 | #define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
      |                                ^~~~~~~~~~~~~~~~~~~~~~~

As the code assumes correct alignment, add first a (void *) casting, to
avoid the warning.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-25 17:27:48 +02:00
Eli Britstein
a3f8d05871 net: avoid cast-align warning in VLAN insert function
In rte_vlan_insert there is a casting of rte_pktmbuf_prepend returned
value to (struct rte_ether_hdr *), which causes cast-align warning when
using strict cast align flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

In file included from main.c:35:
/dpdk/build/include/rte_ether.h:370:7: warning: cast increases required
alignment of target type [-Wcast-align]
  370 |  nh = (struct rte_ether_hdr *)
      |       ^

As the code assumes correct alignment, add first a (void *) casting, to
avoid the warning.

Fixes: c974021a59 ("ether: add soft vlan encap/decap")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-25 17:27:17 +02:00
Dmitry Kozlyuk
6fda3ff6f0 mempool: fix non-IO flag inference
When mempool had been created with RTE_MEMPOOL_F_NO_IOVA_CONTIG flag
but later populated with valid IOVA, RTE_MEMPOOL_F_NON_IO was unset,
while it should be kept. The unit test did not catch this
because rte_mempool_populate_default() it used was populating
with RTE_BAD_IOVA.

Keep setting RTE_MEMPOOL_NON_IO at an empty mempool creation
and add an assert for it in the unit test (remove the separate case).
Do not reset the flag if RTE_MEMPOOL_F_ON_IOVA_CONTIG is set.

Fixes: 11541c5c81 ("mempool: add non-IO flag")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-25 16:52:56 +02:00
Jasvinder Singh
fd9e07a1f4 sched: promote a function as stable
This API was introduced in 18.05, therefore removing
experimental tag to promote it to stable state

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-25 15:14:22 +02:00
Yogesh Jangra
cd79e02058 pipeline: support action annotations
Enable restricting the scope of an action to regular table entries or
to the table default entry in order to support the P4 language
tableonly or defaultonly annotations.

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-10-25 14:53:28 +02:00
Yogesh Jangra
0317c4521d port: configure loop count for source port
Add support for configurable number of loops through the input PCAP
file for the source port. Added an additional parameter to source
port CLI command.

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-10-25 14:30:32 +02:00
Yogesh Jangra
55095ccb7f pipeline: fix instruction label check
The instruction_data array was incorrectly indexed, which resulted in
the array index getting out of bounds and sometimes segfault.

Fixes: a1711f (“pipeline: add SWX Rx and extract instructions“)
Cc: stable@dpdk.org

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-10-25 14:06:02 +02:00
David Marchand
e0d3a74d92 net: fix build with pedantic for L2TPv2 definitions
Build is broken on RHEL7 following introduction of this new protocol.

Fixes: 3a929df1f2 ("ethdev: support L2TPv2 and PPP procotol")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 09:33:15 +02:00
Olivier Matz
daa02b5cdd mbuf: add namespace to offload flags
Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-24 13:37:43 +02:00
Olivier Matz
5b63493241 mbuf: mark old VLAN offload flags as deprecated
The flags PKT_TX_VLAN_PKT and PKT_TX_QINQ_PKT are
marked as deprecated since commit 380a7aab1a ("mbuf: rename deprecated
VLAN flags") (2017). But they were not using the RTE_DEPRECATED
macro, because it did not exist at this time. Add it, and replace
usage of these flags.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-10-24 13:30:40 +02:00
Olivier Matz
0c03660db1 mbuf: remove duplicate definition of cksum offload flags
The flags PKT_RX_L4_CKSUM_BAD and PKT_RX_IP_CKSUM_BAD are defined
twice with the same value. Remove one of the occurrence, which was
marked as "deprecated".

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-24 13:30:40 +02:00
Radu Nicolau
74176aec37 ipsec: fix telemetry text
Set correct tunnel type telemetry text - tunnel type
was wrongly set as IPv4-UDP for all types.

Fixes: bf5b65a8e781 ("ipsec: support SA telemetry")

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 15:55:37 +02:00
Akhil Goyal
92cb130919 cryptodev: move device-specific structures
The device specific structures - rte_cryptodev
and rte_cryptodev_data are moved to cryptodev_pmd.h
to hide it from the applications.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-20 15:33:16 +02:00
Akhil Goyal
f6849cdcc6 cryptodev: use new flat array in fast path API
Rework fast-path cryptodev functions to use rte_crypto_fp_ops[].
While it is an API/ABI breakage, this change is intended to be
transparent for both users (no changes in user app is required) and
PMD developers (no changes in PMD is required).

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-20 15:33:16 +02:00
Akhil Goyal
33cd3fd52f cryptodev: add device probing finish function
Added a rte_cryptodev_pmd_probing_finish API which
need to be called by the PMD after the device is initialized
completely. This will set the fast path function pointers
in the flat array for secondary process. For primary process,
these are set in rte_cryptodev_start.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
2021-10-20 15:33:16 +02:00
Akhil Goyal
2fd66f758f cryptodev: move inline APIs into separate structure
Move fastpath inline function pointers from rte_cryptodev into a
separate structure accessed via a flat array.
The intention is to make rte_cryptodev and related structures private
to avoid future API/ABI breakages.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-20 15:33:16 +02:00
Akhil Goyal
7f3876ad54 cryptodev: allocate max space for internal queue array
At queue_pair config stage, allocate memory for maximum
number of queue pair pointers that a device can support.

This will allow fast path APIs(enqueue_burst/dequeue_burst) to
refer pointer to internal QP data without checking for currently
configured QPs.
This is required to hide the rte_cryptodev and rte_cryptodev_data
structure from user.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-20 15:33:16 +02:00
Akhil Goyal
691e1f4d56 cryptodev: separate out internal structures
A new header file rte_cryptodev_core.h is added and all
internal data structures which need not be exposed directly to
application are moved to this file. These structures are mostly
used by drivers, but they need to be in the public header file
as they are accessed by datapath inline functions for
performance reasons.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-20 15:33:16 +02:00
Andrew Rybchenko
68e8ca7b59 ethdev: avoid usage of ULL for 64-bit unsigned constants
Use UINT64_C() macro instead.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-22 19:11:35 +02:00
Andrew Rybchenko
4852c647d1 ethdev: replace single bit masks with macros
The macros RTE_BIT32 and RTE_BIT64 are used to replace single bit masks.

Do not switch VLAN offload flags since type is not fixed size.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-22 18:36:34 +02:00
Ferruh Yigit
295968d174 ethdev: add namespace
Add 'RTE_ETH' namespace to all enums & macros in a backward compatible
way. The macros for backward compatibility can be removed in next LTS.
Also updated some struct names to have 'rte_eth' prefix.

All internal components switched to using new names.

Syntax fixed on lines that this patch touches.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-22 18:15:38 +02:00
Ivan Ilchenko
b26bee10ee ethdev: forbid MTU set before device configure
rte_eth_dev_configure() always sets MTU to either dev_conf.rxmode.mtu
or RTE_ETHER_MTU if application doesn't provide the value.
So, there is no point to allow rte_eth_dev_set_mtu() before since
set value will be overwritten on configure anyway.

Fixes: 1bb4a528c4 ("ethdev: fix max Rx packet length")

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-22 15:26:54 +02:00
Andrew Rybchenko
9ce1717d3e ethdev: remove unused L2 tunnel mask defines
Fixes: cf47acc0f9 ("ethdev: remove L2 tunnel offload control API")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-22 12:03:52 +02:00
Xueming Li
93e441c9a0 ethdev: get device capability name as string
This patch adds API to return name of device capability.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-10-22 00:08:57 +02:00
Xueming Li
dd22740cc2 ethdev: introduce shared Rx queue
In current DPDK framework, each Rx queue is pre-loaded with mbufs to
save incoming packets. For some PMDs, when number of representors scale
out in a switch domain, the memory consumption became significant.
Polling all ports also leads to high cache miss, high latency and low
throughput.

This patch introduces shared Rx queue. Ports in same Rx domain and
switch domain could share Rx queue set by specifying non-zero sharing
group in Rx queue configuration.

Shared Rx queue is identified by share_rxq field of Rx queue
configuration. Port A RxQ X can share RxQ with Port B RxQ Y by using
same shared Rx queue ID.

No special API is defined to receive packets from shared Rx queue.
Polling any member port of a shared Rx queue receives packets of that
queue for all member ports, port_id is identified by mbuf->port. PMD is
responsible to resolve shared Rx queue from device and queue data.

Shared Rx queue must be polled in same thread or core, polling a queue
ID of any member port is essentially same.

Multiple share groups are supported. PMD should support mixed
configuration by allowing multiple share groups and non-shared Rx queue
on one port.

Example grouping and polling model to reflect service priority:
 Group1, 2 shared Rx queues per port: PF, rep0, rep1
 Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127
 Core0: poll PF queue0
 Core1: poll PF queue1
 Core2: poll rep2 queue0

PMD advertise shared Rx queue capability via RTE_ETH_DEV_CAPA_RXQ_SHARE.

PMD is responsible for shared Rx queue consistency checks to avoid
member port's configuration contradict each other.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-10-22 00:08:50 +02:00
Huisong Li
17faaed854 ethdev: fix PCI device release in secondary process
In secondary process, rte_eth_dev_close() doesn't clear eth_dev->data.
If calling rte_dev_remove() after rte_eth_dev_close(), in
rte_eth_dev_pci_generic_remove() function, the released eth device still
can be found by its name in shared memory. As a result, the eth device
will be released repeatedly. The state of the eth device is modified to
RTE_ETH_DEV_UNUSED after rte_eth_dev_close(). So this state can be used
to avoid this problem.

Fixes: dcd5c8112b ("ethdev: add PCI driver helpers")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 23:15:34 +02:00
Xuan Ding
7c61fa08b7 vhost: enable IOMMU for async vhost
The use of IOMMU has many advantages, such as isolation and address
translation. This patch extends the capability of DMA engine to use
IOMMU if the DMA engine is bound to vfio.

When set memory table, the guest memory will be mapped
into the default container of DPDK.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-10-21 14:24:21 +02:00
Xuan Ding
56259f7fc0 vfio: allow partially unmapping adjacent memory
Currently, if we map a memory area A, then map a separate memory area B
that by coincidence happens to be adjacent to A, current implementation
will merge these two segments into one, and if partial unmapping is not
supported, these segments will then be only allowed to be unmapped in
one go. In other words, given segments A and B that are adjacent, it
is currently not possible to map A, then map B, then unmap A.

Fix this by adding a notion of "chunk size", which will allow
subdividing segments into equally sized segments whenever we are dealing
with an IOMMU that does not support partial unmapping. With this change,
we will still be able to merge adjacent segments, but only if they are
of the same size. If we keep with our above example, adjacent segments A
and B will be stored as separate segments if they are of different
sizes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-10-21 14:24:21 +02:00
Li Feng
5a4fbe79e6 vhost: add sanity check on inflight last index
The index in rte_vhost_set_last_inflight_io_split is from
the frontend driver, check if it's in the virtqueue range.

Fixes: bb0c2de960 ("vhost: add APIs to operate inflight ring")
Cc: stable@dpdk.org

Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-10-21 14:24:21 +02:00
Jie Wang
3a929df1f2 ethdev: support L2TPv2 and PPP procotol
Added flow pattern items and header formats of L2TPv2 and PPP.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 14:15:59 +02:00
Andrew Rybchenko
55645ee65b ethdev: remove full stop after short comments
Full stop at the end of short comment just make line longer. It
should be either everywhere or nowhere to be consistent.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
cc0a644450 ethdev: make device and data structures readable
Add empty lines to separate fields commented using different styles.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
32ec9c6be7 ethdev: remove reserved fields from internal structures
Fixes: f9bdee267a ("ethdev: hide internal structures")

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
bf73419d96 ethdev: fix EEPROM spelling
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
5906be5af6 ethdev: fix ID spelling in comments and log messages
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
5b49ba658b ethdev: fix VLAN spelling including VLAN ID case
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
064e90c419 ethdev: fix DCB and VMDq spelling
Fix both in one changeset since they share line in a number of cases.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
0d9f56a857 ethdev: fix Ethernet spelling
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
09fd42275b ethdev: fix Rx/Tx spelling
Fix it everywhere in ethdev including log messages.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Andrew Rybchenko
3c2ca0a982 ethdev: avoid documentation in next lines
Documentation in the next separate line is confusing. If documentation
requires own line it should be before, not after.

Move documentation to the previous line if documentation on the same
line makes it too long.

Fix a number of incorrect markups on the way.

When a lines is touched by the patch anyway, do other cosmetics
changes to avoid changes in next patches.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 13:43:56 +02:00
Zhihong Peng
6ad06203a5 cmdline: free on exit
Malloc cl in the cmdline_stdin_new function, so release in the
cmdline_stdin_exit function is logical, so that cl will not be
released alone.

Fixes: af75078fec ("first public release")

Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Reviewed-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Zhihong Peng <zhihongx.peng@intel.com>
2021-10-22 23:32:00 +02:00
Dmitry Kozlyuk
f8f8dc2890 cmdline: make struct rdline opaque
Hide struct rdline definition and some RDLINE_* constants in order
to be able to change internal buffer sizes transparently to the user.
Add new functions:

* rdline_new(): allocate and initialize struct rdline.
  This function replaces rdline_init() and takes an extra parameter:
  opaque user data for the callbacks.
* rdline_free(): deallocate struct rdline.
* rdline_get_history_buffer_size(): for use in tests.
* rdline_get_opaque(): to obtain user data in callback functions.

Remove rdline_init() function from library headers and export list,
because using it requires the knowledge of sizeof(struct rdline).

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2021-10-22 23:23:45 +02:00
Dmitry Kozlyuk
f43809d28c cmdline: make struct cmdline opaque
Remove the definition of `struct cmdline` from public header.
Deprecation notice:
https://mails.dpdk.org/archives/dev/2020-September/183310.html

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2021-10-22 22:44:18 +02:00
Gowrishankar Muthukrishnan
2f5c4025ab mempool: add telemetry endpoint
Add telemetry endpoint for mempool info.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-22 22:40:59 +02:00
Bruce Richardson
b1094939a5 build/windows: remove separate list of libs
Rather than maintaining a separate list of libraries which are to be
built on windows, use the standard library list and explicitly add to
each library that is not to be built a check for windows and disable
the library at that per-lib level. As well as shortening the main
lib/meson.build file, this also leads to the build summary at the end of
the meson config run correctly listing the libraries which are not to be
built.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-22 22:40:59 +02:00
Bruce Richardson
fed600889d dmadev: enable build on Windows
The dmadev library was not added to the list of libraries built on
Windows, meaning it was skipped in those builds and also that none of
the drivers were being considered for build. Adding dmadev to the list
fixes this, and also enables the skeleton dmadev driver to be built -
all-be-it with a small fix necessary.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
2021-10-22 22:40:59 +02:00
David Marchand
223e0f7244 dmadev: remove symbol versioning for inline helpers
Inline helpers have no global symbols in shared libraries.
There is no reason to ask for versioning (plus this library would not
build on Windows).

Fixes: 91e581e5c9 ("dmadev: add data plane API")
Fixes: ea8cf0f853 ("dmadev: add burst capacity API")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-22 22:40:59 +02:00
Stephen Hemminger
10f726efe2 pdump: support pcapng and filtering
This enhances the DPDK pdump library to support new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of exposed API or ABI)
is intentionally increased to cause any attempt to try
mismatched primary/secondary process to fail.

Add new API to do allow filtering of captured packets with
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-22 22:07:48 +02:00
Stephen Hemminger
745b7587f9 bpf: add function to dump eBPF instructions
When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-22 22:07:48 +02:00
Stephen Hemminger
2eccf6afbe bpf: add function to convert classic BPF to DPDK BPF
The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add an function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-22 17:19:13 +02:00
Stephen Hemminger
80da61198b bpf: allow self-xor operation
Some BPF programs may use XOR of a register with itself
as a way to zero register in one instruction.
The BPF filter converter generates this in the prolog
to the generated code.

The BPF validator would not allow this because the value of
register was undefined. But after this operation it always zero.

Fixes: 8021917293 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-22 17:19:13 +02:00
Stephen Hemminger
8d23ce8f5e pcapng: add new library for writing pcapng files
This is utility library for writing pcapng format files
used by Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-22 17:19:07 +02:00
Stephen Hemminger
09644b58a1 pdump: disable on Windows
The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now has dependency
on bpf. But bpf library is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-10-22 15:46:19 +02:00
Chengwen Feng
a188277d53 dmadev: fix debug build
This patch fix compile error when enable RTE_DMADEV_DEBUG.

Fixes: ea8cf0f853 ("dmadev: add burst capacity API")

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Conor Walsh <conor.walsh@intel.com>
2021-10-21 22:10:22 +02:00
David Marchand
c61c8282ef dmadev: hide devices array
No need to expose rte_dma_devices out of the dmadev library.
Existing helpers should be enough, and inlines make use of
rte_dma_fp_objs.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2021-10-21 22:01:37 +02:00
Harry van Haaren
976329581d eventdev: add usage hints to port configure API
This commit introduces 3 flags to the port configuration flags.
These flags allow the application to indicate what type of work
is expected to be performed by an eventdev port.

The three new flags are
- RTE_EVENT_PORT_CFG_HINT_PRODUCER (mostly RTE_EVENT_OP_NEW events)
- RTE_EVENT_PORT_CFG_HINT_CONSUMER (mostly RTE_EVENT_OP_RELEASE events)
- RTE_EVENT_PORT_CFG_HINT_WORKER   (mostly RTE_EVENT_OP_FORWARD events)

These flags are only hints, and the PMDs must operate under the
assumption that any port can enqueue an event with any type of op.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:16:00 +02:00
Naga Harish K S V
81da8a5ff4 eventdev/eth_rx: fix WRR buffer overrun
When a poll queue is removed from a rx_adapter instance, the WRR poll
array is recomputed. The wrr array length is reduced in this case. The
next wrr position to poll is stored in wrr_pos variable of rx_adapter
instance. This wrr_pos can become invalid in some cases after wrr is
recomputed. Using this variable to get the next queue and device pair
may leed to wrr buffer overruns.

Resetting the wrr_pos to zero after recomputation of wrr array fixes
the buffer overrun issue.

Fixes: 9c38b704d2 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
fcf782051c eventdev: mark trace variables as internal
Mark rte_trace global variables as internal i.e. remove them
from experimental section of version map.
Some of them are used in inline APIs, mark those as global.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
f26f2ca657 eventdev: make trace API internal
Slowpath trace APIs are only used in rte_eventdev.c so make them
as internal.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
68e9668a09 eventdev: promote event vector API to stable
Promote event vector configuration APIs to stable.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
f3f3a91788 eventdev/timer: move adapters memory to hugepage
Move memory used by timer adapters to hugepage.
Allocate memory on the first adapter create or lookup to address
both primary and secondary process usecases.
This will prevent TLB misses if any and aligns to memory structure
of other subsystems.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
1dcd67ba1e eventdev/timer: rearrange struct fields
Rearrange fields in rte_event_timer data structure to remove holes.
Also, remove use of volatile from rte_event_timer.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
a256a743cf eventdev: remove rte prefix for internal structs
Remove rte_ prefix from rte_eth_event_enqueue_buffer,
rte_event_eth_rx_adapter and rte_event_crypto_adapter
as they are only used in rte_event_eth_rx_adapter.c and
rte_event_crypto_adapter.c

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
53548ad300 eventdev: hide timer adapter PMD file
Hide rte_event_timer_adapter_pmd.h file as it is an internal file.
Remove rte_ prefix from rte_event_timer_adapter_ops structure.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
295c053f90 eventdev: hide event device related structures
Move rte_eventdev, rte_eventdev_data structures to eventdev_pmd.h.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
052e25d912 eventdev: use new API for inline functions
Use new driver interface for the fastpath enqueue/dequeue inline
functions.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
d35e61322d eventdev: move inline APIs into separate structure
Move fastpath inline function pointers from rte_eventdev into a
separate structure accessed via a flat array.
The intention is to make rte_eventdev and related structures private
to avoid future API/ABI breakages.`

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
9c67fcbfd6 eventdev: allocate max space for internal arrays
Allocate max space for internal port, port config, queue config and
link map arrays.
Introduce new macro RTE_EVENT_MAX_PORTS_PER_DEV and set it to max
possible value.
This simplifies the port and queue reconfigure scenarios and will
also allow inline functions to refer pointer to internal port data
without extra checking of current number of configured queues.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
26f14535ed eventdev: separate internal structures
Create rte_eventdev_core.h and move all the internal data structures
to this file. These structures are mostly used by drivers, but they
need to be in the public header file as they are accessed by datapath
inline functions for performance reasons.
The accessibility of these data structures is not changed.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
23d06e3766 eventdev: make driver interface as internal
Mark all the driver specific functions as internal, remove
`rte` prefix from `struct rte_eventdev_ops`.
Remove experimental tag from internal functions.
Remove `eventdev_pmd.h` from non-internal header files.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
814d017093 eventdev/eth_rx: support telemetry
Added telemetry callbacks to get Rx adapter stats, reset stats and
to get Rx queue config information.

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Naga Harish K S V
b06bca69b7 eventdev/eth_rx: add per-queue event buffer
Added per queue buffer. To configure per queue event buffer size,
application sets rte_event_eth_rx_adapter_params::use_queue_event_buf
flag as true while using rte_event_eth_rx_adapter_create_with_params().

The per queue event buffer size is populated in
rte_event_eth_rx_adapter_queue_conf::event_buf_size and passed
to rte_event_eth_rx_adapter_queue_add().

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-10-21 10:14:50 +02:00
Naga Harish K S V
bc0df25c83 eventdev/eth_rx: add event buffer size configurability
Currently event buffer is static array with a default size defined
internally.

To configure event buffer size from application,
rte_event_eth_rx_adapter_create_with_params() API is added which
takes struct rte_event_eth_rx_adapter_params to configure event
buffer size in addition other params. The event buffer size is
rounded up for better buffer utilization and performance. In case
of NULL params argument, default event buffer size is used.

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
da781e6488 eventdev/eth_rx: support Rx queue config get
Added rte_event_eth_rx_adapter_queue_conf_get() API to get rx queue
information - event queue identifier, flags for handling received packets,
scheduler type, event priority, polling frequency of the receive queue
and flow identifier in rte_event_eth_rx_adapter_queue_conf structure

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
83ab470d12 eventdev/eth_rx: use timestamp as dynamic mbuf field
Add support to register timestamp dynamic field in mbuf.

Update the timestamp in mbuf for each packet before enqueuing
to event device if the timestamp is not already set.

Adding the timestamp in Rx adapter avoids additional latency
due to the event device.

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00