Commit Graph

7172 Commits

Author SHA1 Message Date
Anatoly Burakov
5dff9a72b0 power: support callbacks for multiple Rx queues
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
  shared between all queues polled on a particular lcore, and is only
  activated when all queues in the list were polled and were determined
  to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
209fd58545 power: make ethdev power management thread unsafe
Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
of a pressing need for thread safety we'll go the easy way and just
mandate that the API's are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
66834f2974 eal: add power monitor for multiple events
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
6afc4baf4f eal: use callbacks for power monitoring comparison
Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
2021-07-09 21:13:13 +02:00
Juraj Linkeš
845048c522 eal/arm: update CPU flags
There are two execution states on armv8 architecture, aarch64 and
aarch32. Add PLATFORM_STR for the latter and update RTE_ARCH_* flags
according to e9b9739264.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
2021-07-09 20:00:19 +02:00
Ferruh Yigit
b67f598e23 kni: update link only on change
'rte_kni_update_link()' updates virtual KNI interface link using kernel
sysfs interface.
If the requested link status is same as interface link status, do not
update the link status but return with success.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-07-09 17:22:42 +02:00
Richael Zhuang
ef1cc88f18 power: support cppc_cpufreq driver
Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are
supported, which are both not available on arm64 platforms. Add
support for cppc_cpufreq driver which works on most arm64 platforms.

Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-07-09 16:04:46 +02:00
Anatoly Burakov
06cffd468f power: refactor ACPI and intel_pstate support
Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.

This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
2021-07-08 22:32:13 +02:00
Anatoly Burakov
02a6d68311 power: fix namespace for internal struct
Currently, ACPI code uses rte_power_info as the struct name, which
gives the appearance that this is an externally visible API. Fix to
use internal namespace.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-07-08 22:32:13 +02:00
Huisong Li
02edbfab1e ethdev: add dev configured flag
Currently, if dev_configure is not called or fails to be called, users
can still call dev_start successfully. So it is necessary to have a flag
which indicates whether the device is configured, to control whether
dev_start can be called and eliminate dependency on user invocation order.

The flag stored in "struct rte_eth_dev_data" is more reasonable than
 "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
primary and secondary processes, and can be independently controlled.
However, the secondary process does not make resource allocations and
does not call dev_configure(). These are done by the primary process
and can be obtained or used by the secondary process. So this patch
adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

libabigail raised a warning on this change.
This change is fine wrt ABI as far as we understand, but we can't
express an exception rule (see libabigail bug #28060) to waive the
changes only in this part of the rte_eth_dev_data struct.
The solution for now is to globally waive any change on the
rte_eth_dev_data structure.

Signed-off-by: David Marchand <david.marchand@redhat.com>
2021-07-08 13:05:55 +02:00
David Marchand
e7885281de ipc: stop mp control thread on cleanup
When calling rte_eal_cleanup, the mp channel cleanup routine only sets
mp_fd to -1 leaving the rte_mp_handle control thread running.
This control thread can spew warnings on reading on an invalid fd.
This is especially noticed with ASAN enabled.

To handle this situation, set mp_fd to -1 to signal the control thread
it should exit, but since this thread might be sleeping on the socket,
cancel the thread too.

Fixes: 85d6815fa6 ("eal: close multi-process socket during cleanup")
Cc: stable@dpdk.org

Reported-by: Owen Hilyard <ohilyard@iol.unh.edu>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-08 13:05:55 +02:00
Jan Viktorin
56912ddef2 ethdev: fix doc of flow action
The struct rte_flow_action was missing from DPDK API documentation.

Fixes: 3850cf0c8c ("ethdev: add tunnel encap/decap actions")
Cc: stable@dpdk.org

Signed-off-by: Jan Viktorin <viktorin@cesnet.cz>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Aman Deep Singh <aman.deep.singh@intel.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
799a5b9aca eal/windows: add clock function
Add clock_gettime() on Windows in rte_os_shim.h.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
3d2fcb0e0a eal/windows: add device event stubs
Add device event stubs in eal_dev.c for Windows

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
22f463e181 eal/windows: add macros required by testpmd
Add required macros by testpmd on Windows in rte_os_shim.h

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
786881d152 lib: build testpmd dependencies on Windows
Enable building libraries that testpmd depends on for Windows

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Olivier Matz
45a08ef55e net: introduce functions to verify L4 checksums
Since commit d5df2ae042 ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() and
rte_ipv6_udptcp_cksum() can return either 0x0000 or 0xffff when used to
verify a packet containing a valid checksum.

Since these functions should be used to calculate the checksum to set in
a packet, introduce 2 new helpers for checksum verification. They return
0 if the checksum is valid in the packet.

Use this new helper in net/tap driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-07-02 19:03:03 +02:00
David Marchand
40edb9c0d3 eal: handle compressed firmware
Introduce an internal firmware loading helper to remove code duplication
in our drivers and handle xz compressed firmware by calling libarchive.

This helper tries to look for .xz suffixes so that drivers are not aware
the firmware has been compressed.

libarchive is set as an optional dependency: without libarchive, a
runtime warning is emitted so that users know there is a compressed
firmware.

Windows implementation is left as an empty stub.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Haiyue Wang <haiyue.wang@intel.com>
2021-07-07 16:41:53 +02:00
Bruce Richardson
d5252f7d4b telemetry: add extra log message on socket bind failure
If the library fails to create the needed socket, add an additional
check to report if the error is due to a missing DPDK runtime dir.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-07-07 15:23:53 +02:00
Bruce Richardson
ce382fdddb eal: create runtime dir even when shared data is not used
When multi-process is not wanted and DPDK is run with the "no-shconf"
flag, the telemetry library still needs a runtime directory to place the
unix socket for telemetry connections. Therefore, rather than not
creating the directory when this flag is set, we can change the code to
attempt the creation anyway, but not error out if it fails. If it
succeeds, then telemetry will be available, but if it fails, the rest of
DPDK will run without telemetry. This ensures that the "in-memory" flag
will allow DPDK to run even if the whole filesystem is read-only, for
example.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-07 15:23:09 +02:00
Maxime Coquelin
b3d4a18b9c vhost: use DPDK allocations for in-flight data
Inflight metadata are allocated using glibc's calloc.
This patch converts them to rte_zmalloc_socket to take
care of the NUMA affinity.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:38 +02:00
Maxime Coquelin
b81c93466d vhost: allocate all data on same node as virtqueue
This patch saves the NUMA node the virtqueue is allocated
on at init time, in order to allocate all other data on the
same node.

While most of the data are allocated before numa_realloc()
is called and so the data will be reallocated properly, some
data like the log cache are most likely allocated after.

For the virtio device metadata, we decide to allocate them
on the same node as the VQ 0.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
97b4e3b1d0 vhost: improve NUMA reallocation
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
6305dfeff4 vhost: fix NUMA reallocation with multi-queue
Since the Vhost-user device initialization has been reworked,
enabling the application to start using the device as soon as
the first queue pair is ready, NUMA reallocation no more
happened on queue pairs other than the first one since
numa_realloc() was returning early if the device was running.

This patch fixes this issue by reallocating the device metadata
only if the device is running. For the virtqueues, a vring state
change notification is sent to notify the application of its
disablement. Since the callback is supposed to be blocking, it
is safe to reallocate it afterwards.

Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
eb40c50c17 vhost: fix missing cache logging NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the log cache on the new NUMA node was
not done. This patch fixes this by reallocating it if it has
been allocated already, which means a live-migration is
on-going.

Fixes: 1818a63147 ("vhost: move dirty logging cache out of virtqueue")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
57589cdfd7 vhost: fix missing guest pages table NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the guest pages table was missing, which
likely causes at least one cross-NUMA accesses for every burst
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:26:58 +02:00
Maxime Coquelin
8119ca9114 vhost: fix missing memory table NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the Vhost memory table was missing, which
likely causes at least one cross-NUMA accesses for every burst
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: 552e8fd3d2 ("vhost: simplify memory regions handling")
Cc: stable@dpdk.org

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:26:58 +02:00
Xueming Li
35d4f17b3d devargs: add common key definition
Add common devargs key definition for "bus", "class" and "driver".

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-07-05 16:33:18 +02:00
Thomas Monjalon
dbba7c9efb eal: save error in string copy
The string copy api rte_strscpy() did not set rte_errno during failures,
instead it just returned negative error number.

Set rte_errrno if the destination buffer is too small.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-05 15:11:30 +02:00
Ruifeng Wang
18f0b28eec eal/arm: remove unused type
Data types Elf32_auxv_t and Elf64_auxv_t are used by OS Linux
auxiliary vector read, and not used by arch specific cpu flag
API implementations. Hence remove them from Arm file.

Reported-by: James Grant <j.grant@qub.ac.uk>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2021-07-05 09:50:51 +02:00
Owen Hilyard
03b8372a9a rib: fix max depth IPv6 lookup
ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix for the stack buffer overflow was to make sure depth
was always < 128, since when depth = 128 it caused the index
into the ip address to be 16, which read off the end of the array.

While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.

Fixes: f7e861e21c ("rib: support IPv6")
Cc: stable@dpdk.org

Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-06-24 15:34:45 +02:00
Owen Hilyard
016441e3c7 flow_classify: fix leaking rules on delete
Rules in a classify table were not freed if the table
had a delete function.

Fixes: be41ac2a33 ("flow_classify: introduce flow classify library")
Cc: stable@dpdk.org

Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
2021-06-24 15:34:45 +02:00
Yunjian Wang
0db3d5551a kni: fix mbuf allocation for kernel side use
In kni_allocate_mbufs(), we alloc mbuf for alloc_q as this code.
allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
		& (MAX_MBUF_BURST_NUM - 1);
The value of allocq_free maybe zero, for example :
The ring size is 1024. After init, write = read = 0. Then we fill
kni->alloc_q to full. At this time, write = 1023, read = 0.

Then the kernel send 32 packets to userspace. At this time, write
= 1023, read = 32. And then the userspace receive this 32 packets.
Then fill the kni->alloc_q, (32 - 1023 - 1) & 31 = 0, fill nothing.
...
Then the kernel send 32 packets to userspace. At this time, write
= 1023, read = 992. And then the userspace receive this 32 packets.
Then fill the kni->alloc_q, (992 - 1023 - 1) & 31 = 0, fill nothing.

Then the kernel send 32 packets to userspace. The kni->alloc_q only
has 31 mbufs and will drop one packet.

Absolutely, this is a special scene. Normally, it will fill some
mbufs everytime, but may not enough for the kernel to use.

In this patch, we always keep the kni->alloc_q to full for the kernel
to use.

Fixes: 49da4e82cf ("kni: allocate no more mbuf than empty slots in queue")
Cc: stable@dpdk.org

Signed-off-by: Cheng Liu <liucheng11@huawei.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-06-24 09:42:37 +02:00
Balazs Nemeth
242695f612 vhost: allocate and free packets in bulk in Tx split
Same idea as commit a287ac2891 ("vhost: allocate and free packets
in bulk in Tx packed"), allocate and free packets in bulk.
Also remove the unused function virtio_dev_pktmbuf_alloc.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-06-23 09:55:34 +02:00
Thierry Herbelot
9cfbe67691 vhost/crypto: check request pointer before dereference
Use vc_req only after it was checked not to be NULL.

Fixes: 2d962bb736 ("vhost/crypto: fix possible TOCTOU attack")
Cc: stable@dpdk.org

Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-06-23 09:55:23 +02:00
Dmitry Kozlyuk
cfdaa678b3 eal/windows: cleanup interrupt resources
Interrupt manager in Windows EAL allocates on IOCP and starts
a control thread that runs indefinitely. At DPDK cleanup
this thread was not stopped and IOCP handle was not closed.

Gracefully stop interrupt-handling in rte_eal_cleanup().
The thread already closes IOCP handle before exiting.

Fixes: 5c016fc020 ("eal/windows: add interrupt thread skeleton")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Jie Zhou <jizh@microsoft.com>
Tested-by: Jie Zhou <jizh@microsoft.com>
2021-06-23 09:05:36 +02:00
Dmitry Kozlyuk
3888f31950 eal/windows: fix interrupt thread handle leakage
Each time a work was scheduled in the interrupt thread,
usually an alarm, a handle was opened but not closed.

Opening a handle is a system call, which harms alarm precision.
Instead of opening and closing a handle each time, open it
when interrupt thread starts and close it when the thread finishes.

Fixes: 5c016fc020 ("eal/windows: add interrupt thread skeleton")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
2021-06-23 09:04:28 +02:00
Dmitry Kozlyuk
35dff5d3b7 eal/windows: fix interrupt thread ID
Interrupt thread ID retained its value after interrupt thread finish.
Other interrupt routines could then operate on the wrong thread.
Clear interrupt thread ID before thread termination.

Fixes: 5c016fc020 ("eal/windows: add interrupt thread skeleton")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2021-06-23 09:03:14 +02:00
Christian Ehrhardt
31c5af644b vfio: add stdbool include
This became visible by backporting the following for the 19.11 stable tree:
 c13ca4e8 "vfio: fix DMA mapping granularity for IOVA as VA"

The usage of type bool in the vfio code would require "#include
<stdbool.h>", but rte_vfio.h has no direct paths to stdbool.h.
It happens that in eal_vfio_mp_sync.c it comes after "#include
<rte_log.h>".

And rte_log.h since 20.05 includes stdbool since this change:
 241e67bfe "log: add API to check if a logtype can log in a given level"
and thereby mitigates the issue.

It should be safe to include stdbool.h from rte_vfio.h itself
to be present exactly when needed for the struct it defines using that
type.

Fixes: c13ca4e81c ("vfio: fix DMA mapping granularity for IOVA as VA")
Cc: stable@dpdk.org

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-06-17 10:31:33 +02:00
Konstantin Ananyev
b3b36f0fbf acl: fix build with GCC 6.3
--buildtype=debug with gcc 6.3 produces the following error:

../lib/librte_acl/acl_run_avx512_common.h: In function
‘resolve_match_idx_avx512x16’:
../lib/librte_acl/acl_run_avx512x16.h:33:18: error:
	the last argument must be an 8-bit immediate
                               ^
../lib/librte_acl/acl_run_avx512_common.h:373:9: note:
	in expansion of macro ‘_M_I_’
      return _M_I_(slli_epi32)(mi, match_log);
             ^~~~~

Seems like gcc-6.3 complains about the following construct:

static const uint32_t match_log = 5;
    ...
_mm512_slli_epi32(mi, match_log);

It can't substitute constant variable 'match_log' with its actual value.
The fix replaces constant variable with its immediate value.

Bugzilla ID: 717
Fixes: b64c2295f7 ("acl: add 256-bit AVX512 classify method")
Fixes: 45da22e42e ("acl: add 512-bit AVX512 classify method")
Cc: stable@dpdk.org

Reported-by: Liang Ma <liangma@liangbit.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-06-17 09:37:11 +02:00
David Marchand
2ca92f5441 malloc: fix size annotation for NUMA-aware realloc
__rte_alloc_size is mapped to compiler alloc_size attribute.

Quoting gcc documentation:
"""
alloc_size
    The alloc_size attribute is used to tell the compiler that the
    function return value points to memory, where the size is given by
    one or two of the functions parameters. GCC uses this information
    to improve the correctness of __builtin_object_size.

    The function parameter(s) denoting the allocated size are specified
    by one or two integer arguments supplied to the attribute.
    The allocated size is either the value of the single function
    argument specified or the product of the two function arguments
    specified. Argument numbering starts at one.
"""

In rte_realloc_socket case, only 'size' matters.

Note: this has been spotted by Maxime trying to use rte_realloc_socket
and compiling with gcc 11.

Fixes: 17b347dab7 ("malloc: add alloc_size attribute to functions")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-06-11 11:03:38 +02:00
Ivan Ilchenko
1ffd3bc125 bitmap: fix buffer overrun in bitmap init
Bitmap initialization function is allowed to memset()
caller-provided buffer with number of bytes exceeded
this buffer size. This happens due to wrong comparison
sign between buffer size and number of bytes required
to initialize bitmap.

Fixes: 602c9ca33a ("sched: bitmap is now dynamically allocated")
Cc: stable@dpdk.org

Reported-by: Andy Moreton <amoreton@xilinx.com>
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-06-11 11:03:25 +02:00
Haiyue Wang
21f6adec07 bus/pci: configure PCI bus master
Add the API to set 'Bus Master Enable' bit to be enabled or disabled in
the PCI command register.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-06-04 09:38:08 +02:00
David Marchand
64307fad7d telemetry: remove static limit on callbacks count
This code is not performance sensitive and can be switched to dynamic
allocations.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-06-03 18:36:03 +02:00
Hongbo Zheng
2d2bf7de1a graph: fix null dereference in stats
In function 'stats_mem_init', pointer 'stats' should
be confirmed not null before memset it.

Fixes: af1ae8b6a3 ("graph: implement stats")
Cc: stable@dpdk.org

Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-06-03 18:35:57 +02:00
Hongbo Zheng
3b47572fbe graph: fix memory leak in stats
Fix function 'stats_mem_populate' return without
free dynamic memory referenced by 'stats'.

Fixes: af1ae8b6a3 ("graph: implement stats")
Cc: stable@dpdk.org

Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-06-03 18:35:47 +02:00
Thomas Monjalon
0d025f019c ethdev: fix comments of packet integrity flow item
The Doxygen comments are placed before the related lines,
but the markers were /**< instead of /**

The struct rte_flow_item_integrity did not appear in Doxygen output
because there was no general comment for the struct.

Fixes: b10a421a1f ("ethdev: add packet integrity check flow rules")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-05-19 22:52:03 +02:00
David Marchand
f31ce483bc vhost: restore IOTLB mempool allocation
IOTLB messages will be sent when some queues are not enabled. If we
initialize IOTLB in vhost_user_set_vring_num, it could happen that IOTLB
update comes when IOTLB pool of disabled queues are not initialized.

Fixes: 968bbc7e2e ("vhost: avoid IOTLB mempool allocation while IOMMU disabled")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-05-18 09:54:44 +02:00
Balazs Nemeth
3ad55b8e94 vhost: fix stored last used index
The optimization introduced by
commit d18db8049c ("vhost: read last used index once")
didn't account for the fact that vhost_flush_enqueue_shadow_packed
increments the last_used_idx.
For this reason, store last_used_idx after the potential call to
vhost_flush_enqueue_shadow_packed.

Bugzilla ID: 699
Fixes: d18db8049c ("vhost: read last used index once")

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Wei Ling <weix.ling@intel.com>
2021-05-18 09:43:35 +02:00
Cheng Jiang
35139e648a vhost: fix sign extension in async packed ring
Change the variable type in store_dma_desc_info_packed() to fix
suspicious implicit sign extension.

Coverity issue: 370608, 370610, 370612
Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
2021-05-12 10:28:18 +02:00
Cheng Jiang
11a7cd8c92 vhost: fix sign extension in async split ring
Change the variable type in store_dma_desc_info_split() to fix
suspicious implicit sign extension.

Coverity issue: 370604, 370607, 370609
Fixes: 3d6cb86b0d ("vhost: refactor async split ring functions")

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
2021-05-12 10:28:08 +02:00
David Marchand
8eff201b00 net/ice: fix leak on thread termination
A terminated pthread should be joined or detached so that its associated
resources are released.

The "ice-reset-<vf_id>" threads are used to service some reset task in
the background, but they are never joined by the thread that created
them.
The easiest solution is to detach new threads.

The Windows EAL did not provide a pthread_detach wrapper but there is no
resource to release for Windows threads, so add an empty wrapper.

Fixes: 3b3757bda3 ("net/ice: get VF hardware index in DCF")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-05-11 23:40:22 +02:00
Hongbo Zheng
1fe00fd358 power: fix sanity checks for guest channel read
In function power_guest_channel_read_msg, 'lcore_id' is used before
validity check, which may cause buffer 'global_fds' accessed by index
'lcore_id' overflow.

This patch moves the validity check of 'lcore_id' before the 'lcore_id'
being used for the first time.

Fixes: 9dc843eb27 ("power: extend guest channel API for reading")
Cc: stable@dpdk.org

Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-05-12 17:18:38 +02:00
Chengwen Feng
cc994d3922 ipc: use monotonic clock
Currently, the mp uses gettimeofday() API to get the time, and used as
timeout parameter.

But the time which gets from gettimeofday() API isn't monotonically
increasing. The process may fail if the system time is changed.

This fixes it by using clock_gettime() API with monotonic attribution.

Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2021-05-12 16:49:08 +02:00
Lance Richardson
6beb2d2947 eal: fix memory mapping on 32-bit target
For 32-bit targets, size_t is normally a 32-bit type and
does not have sufficient range to represent 64-bit offsets
that are needed when mapping PCI addresses.
Use uint64_t instead.

Found when attempting to run 32-bit Linux dpdk-testpmd
using VFIO driver:

    EAL: pci_map_resource(): cannot map resource(63, 0xc0010000, \
    0x200000, 0x20000000000): Invalid argument ((nil))

Fixes: c4b89ecb64 ("eal: introduce memory management wrappers")
Cc: stable@dpdk.org

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-05-11 23:01:06 +02:00
David Marchand
1657e1f871 net: fix header include order for FreeBSD
Spotted by sparse in OVS build:

../../lib/netdev-dpdk.c: note: in included file (through
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_ip.h,
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h, ...):
../../include/sparse/arpa/inet.h:22:2: error: "Must include
<netinet/in.h> before <arpa/inet.h> for FreeBSD support"

This is a check enforced by OVS itself.
See [1] for some context.

1: https://github.com/openvswitch/ovs/commit/b2befd5bb2db

Fixes: 89813a522e ("net: provide IP-related API on any OS")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-05-11 15:44:38 +02:00
David Marchand
dc2c712f72 net: add endianness annotations to ethernet headers
Spotted by sparse in OVS build:

/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:789:27:
error: incorrect type in initializer (different base types)
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:789:27:
expected unsigned short [usertype] ether_type
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:789:27:
got restricted ovs_be16 [usertype]
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:829:25:
error: incorrect type in initializer (different base types)
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:829:25:
expected unsigned short [usertype] vlan_tci
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:829:25:
got restricted ovs_be16 [usertype]
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:830:26:
error: incorrect type in initializer (different base types)
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:830:26:
expected unsigned short [usertype] eth_proto
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:830:26:
got restricted ovs_be16 [usertype]

This was not caught before as no code in headers was using those fields.
This changed with commit 6f2168b69a ("ethdev: reuse ethernet header
definition in flow item") and commit a56a262e34 ("ethdev: reuse VLAN
header definition in flow item").

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-05-11 15:22:26 +02:00
David Marchand
eeded2044a log: register with standardized names
Let's try to enforce the convention where most drivers use a pmd. logtype
with their class reflected in it, and libraries use a lib. logtype.

Introduce two new macros:
- RTE_LOG_REGISTER_DEFAULT can be used when a single logtype is
  used in a component. It is associated to the default name provided
  by the build system,
- RTE_LOG_REGISTER_SUFFIX can be used when multiple logtypes are used,
  and then the passed name is appended to the default name,

RTE_LOG_REGISTER is left untouched for existing external users
and for components that do not comply with the convention.

There is a new Meson variable log_prefix to adapt the default name
for baseband (pmd.bb.), bus (no pmd.) and mempool (no pmd.) classes.

Note: achieved with below commands + reverted change on net/bonding +
edits on crypto/virtio, compress/mlx5, regex/mlx5

$ git grep -l RTE_LOG_REGISTER drivers/ |
  while read file; do
    pattern=${file##drivers/};
    class=${pattern%%/*};
    pattern=${pattern#$class/};
    drv=${pattern%%/*};
    case "$class" in
      baseband) pattern=pmd.bb.$drv;;
      bus) pattern=bus.$drv;;
      mempool) pattern=mempool.$drv;;
      *) pattern=pmd.$class.$drv;;
    esac
    sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
    sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
  done

$ git grep -l RTE_LOG_REGISTER lib/ |
  while read file; do
    pattern=${file##lib/};
    pattern=lib.${pattern%%/*};
    sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
    sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
  done

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-05-11 15:17:55 +02:00
Vladimir Medvedkin
be81f77d80 hash: fix tuple adjustment
rte_thash_adjust_tuple() uses random to generate a new subtuple if
fn() callback reports about collision. In some cases random changes
the subtuple in a way that after complementary bits are applied the
original tuple is obtained. This patch replaces random with subtuple
increment.

Fixes: 28ebff11c2 ("hash: add predictable RSS")
Cc: vladimir.medvedkin@intel.com

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Tested-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: Stanislaw Kardach <kda@semihalf.com>
2021-05-10 15:31:42 +02:00
David Marchand
b81bf1efe3 eal: fix leak in shared lib mode detection
This is reported by our internal covscan:

1. dpdk-20.11/lib/librte_eal/common/eal_common_options.c:508: alloc_fn:
Storage is returned from allocation function "dlopen".
6. dpdk-20.11/lib/librte_eal/common/eal_common_options.c:508:
leaked_storage: Failing to save or free storage allocated by
"dlopen("librte_eal.so.21.0", 5)" leaks it.

 #   506|   	 * shared library is not already loaded i.e. it's
 #   statically linked.)
 #   507|   	 */
 #   508|-> 	if (dlopen("librte_eal.so."ABI_VERSION, RTLD_LAZY |
 #   RTLD_NOLOAD) != NULL &&
 #   509|   			*default_solib_dir != '\0' &&
 #   510|   			stat(default_solib_dir, &sb) == 0 &&

This leak is not an issue per se, but on the other hand, this is easy
to fix and I prefer not having to waive this warning later.

Fixes: 06c7871dde ("eal: restrict default plugin path to shared lib mode")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-05-10 15:31:42 +02:00
David Marchand
627c5b41bb build: fix default drivers list without Python
If no enable_drivers option is passed, the default is to build
the drivers list by calling list-dir-globs.py.

But if no Python interpreter is installed, no error is reported
and all drivers end up being disabled.

Example on a minimal FreeBSD VM:

  dpdk@freebsd:~/dpdk $ meson setup build
  ...
  drivers:
	  common/cpt:	not in enabled drivers build config
	  common/dpaax:	not in enabled drivers build config
	  common/iavf:	not in enabled drivers build config
	  common/mvep:	not in enabled drivers build config
	  common/octeontx:	not in enabled drivers build config
	  common/octeontx2:	not in enabled drivers build config
	  bus/dpaa:	not in enabled drivers build config
	  bus/fslmc:	not in enabled drivers build config
  ...

  dpdk@freebsd:~/dpdk $ cd drivers/
  dpdk@freebsd:~/dpdk/drivers $ ~/dpdk/buildtools/list-dir-globs.py */*
  env: python3: No such file or directory

Rely on meson internal interpreter.
Check return code when calling this script.

Fixes: ab9407c3ad ("build: allow using wildcards to disable drivers")
Fixes: 2e33309ebe ("config: enable/disable drivers in Arm builds")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-05-07 15:41:45 +02:00
Chengwen Feng
97ca1e786b eal: fix service core list parsing
This patch adds checking for service core index validity when parsing
service corelist.

Fixes: 7dbd7a6413 ("service: add -S corelist option")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2021-05-05 23:19:23 +02:00
Chengwen Feng
76b49dcbd2 ipc: check malloc sync reply result
This patch adds checking for mp reply result in handle_sync().

Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2021-05-05 23:16:07 +02:00
David Marchand
2223b6cee9 lib: restore developer mode checks
Most of the checks on developer_mode have been accidentally dropped.
Restore them.

Fixes: 7d611e35b0 ("lib: simplify main build file")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-05-05 22:05:21 +02:00
David Marchand
ca7036b4af vhost: fix offload flags in Rx path
The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices
to support every used offloads (ip, tcp checksumming, tso..), and the
various l2/l3/l4 lengths must be set following any processing that
happened in the application itself.

On the other hand, the received packets are not marked wrt current
packet l3/l4 checksumming info.

Copy virtio rx processing to fix those offload flags with some
differences:
- accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
- ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
  the virtio spec),

Some applications might rely on the current behavior, so it is left
untouched by default.
A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
new behavior.

The vhost example has been updated for the new behavior: TSO is applied to
any packet marked LRO.

Fixes: 859b480d5a ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-05-04 10:22:17 +02:00
Cheng Jiang
eb36520444 vhost: add batch datapath for async packed ring
Add batch datapath for async vhost packed ring to improve the
performance of small packet processing.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-05-04 10:21:59 +02:00
Cheng Jiang
873e8dad6f vhost: support packed ring in async datapath
For now async vhost data path only supports split ring. This patch
enables packed ring in async vhost data path to make async vhost
compatible with virtio 1.1 spec.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-05-04 10:21:52 +02:00
Cheng Jiang
3d6cb86b0d vhost: refactor async split ring functions
This patch moves some code of async vhost split ring into
inline functions to improve the readability. Also, it
changes the pointer index style of iterator to make the
code more concise.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2021-05-04 10:21:46 +02:00
Ciara Power
048960272e telemetry: fix race on callbacks list
The list_commands() function accessed the callbacks list,
but did not take the lock. This may have caused inconsistencies if
callbacks were being registered at the same time.
This is now fixed to lock before iterating the list,
and unlock afterwards.

Fixes: f38748736e ("telemetry: add default callback commands")
Cc: stable@dpdk.org

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-05-05 18:21:26 +02:00
Jerin Jacob
dbab511874 telemetry: hide internal define
Remove TELEMETRY_MAX_CALLBACKS symbol from the public
rte_telemetry.h header file.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-05-05 18:21:26 +02:00
Hemant Agrawal
a956adb281 ethdev: add missing buses in device iterator
This patch fixes issue with OVS 2.15 not working on
DPAA/FSLMC based platform due to missing support for
these busses in dev_iterate.
This patch adds dpaa_bus and fslmc to dev iterator
for bus arguments.

Fixes: 214ed1acd1 ("ethdev: add iterator to match devargs input")
Cc: stable@dpdk.org

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-05-04 18:33:09 +02:00
Gregory Etelson
1d0b9c7d94 ethdev: fix integrity flow item
Add integrity item definition to the rte_flow_desc_item array.
The new entry allows to build RTE flow item from a data
stored in rte_flow_item_integrity type.

Fixes: b10a421a1f ("ethdev: add packet integrity check flow rules")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-05-04 17:37:22 +02:00
Balazs Nemeth
a287ac2891 vhost: allocate and free packets in bulk in Tx packed
Move allocation out further and perform all allocation in bulk. The same
goes for freeing packets. In the process, also introduce
virtio_dev_pktmbuf_prep and make virtio_dev_pktmbuf_alloc use that.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-04-28 04:53:30 +02:00
Balazs Nemeth
56fa279124 vhost: remove remaining packets count
The remained variable stores the same information as the difference
between count and pkt_idx. Remove the remained variable to simplify.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-04-28 04:49:09 +02:00
Balazs Nemeth
d18db8049c vhost: read last used index once
Instead of calculating the address of a packed descriptor based on the
vq->desc_packed and vq->last_used_idx every time, store that base
address in desc_base. On arm, this saves 176 bytes in code size of
function in which vhost_flush_enqueue_batch_packed gets inlined.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-04-28 04:21:37 +02:00
Jiayu Hu
c94c9f3152 vhost: fix redundant vring status change notification
When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
there is no need for vhost_user_set_vring_kick() to
notify the application of vring enabled, as
vhost_user_msg_handler() also notifies the application.

This patch is to remove unnecessary vring_state_changed() call.

Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-04-28 03:52:44 +02:00
Jiayu Hu
0090535f7a vhost: remove unnecessary free
This patch removes unnecessary rte_free() for async_pkts_info
and async_descs_split.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
2021-04-28 03:24:04 +02:00
Jiayu Hu
678a91efa2 vhost: fix queue initialization
This patch allocates vhost queue by rte_zmalloc() to avoid
undefined values.

Fixes: a277c71598 ("vhost: refactor code structure")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
2021-04-28 03:24:04 +02:00
Anatoly Burakov
7f15f0fbed power: save original ACPI governor always
Currently, when we set the acpi governor to "userspace", we check if
it is already set to this value, and if it is, we skip setting it.

However, we never save this value anywhere, so that next time we come
back and request the governor to be set to its original value, the
original value is empty.

Fix it by saving the original pstate governor first. While we're at it,
replace `strlcpy` with `rte_strscpy`.

Fixes: 445c6528b5 ("power: common interface for guest and host")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
2021-05-05 12:29:12 +02:00
Hongbo Zheng
cdcee2ec9b bpf: fix JSLT validation
In function 'eval_jcc', judgment 'op == EBPF_JLT' occurs
twice, as a result, the corresponding second statement
cannot be accessed.

This patch fix this problem.

Fixes: 8021917293 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org

Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-05-05 12:22:30 +02:00
Konstantin Ananyev
8e2dd74f0a acl: fix build with GCC 11
gcc 11 with '-O2' complains about some variables being used without
being initialized:

In function ‘start_flow_avx512x8’,
    inlined from ‘search_trie_avx512x8.constprop’ at acl_run_avx512_common.h:317:
lib/librte_acl/acl_run_avx512_common.h:210:13: warning:
    ‘pdata’ is used uninitialized [-Wuninitialized]
In function ‘search_trie_avx512x8.constprop’:
lib/librte_acl/acl_run_avx512_common.h:314:32: note: ‘pdata’ declared here
...

Indeed, these variables are not explicitly initialized,
but this is done intentionally.
We rely on constant mask value that we pass to start_flow*() functions
as a parameter to mask out uninitialized values.
Note that '-O3' doesn't produce this warning.
Anyway, to support clean build with gcc-11 this patch adds
explicit initialization for these variables.
I checked the output binary: with '-O3' both clang and gcc 10/11
generate no extra code for it.
Also performance test didn't reveal any regressions.

Bugzilla ID: 673
Fixes: b64c2295f7 ("acl: add 256-bit AVX512 classify method")
Fixes: 45da22e42e ("acl: add 512-bit AVX512 classify method")
Cc: stable@dpdk.org

Reported-by: Ali Alnubani <alialnu@nvidia.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-05-05 12:10:15 +02:00
Chengwen Feng
f6681ab76b eventdev: fix memory leakage on thread creation failure
This patch fixes the issue that epoll_events memory is not released
after the intr thread created fail.

Fixes: 3810ae4357 ("eventdev: add interrupt driven queues to Rx adapter")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-05-03 11:46:26 +02:00
Chengwen Feng
0bac9fc791 eventdev: remove redundant thread name setting
The thread name already set by rte_ctrl_thread_create() API, so remove
the call of rte_thread_setname() API.

Fixes: 3810ae4357 ("eventdev: add interrupt driven queues to Rx adapter")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-05-03 11:46:26 +02:00
Olivier Matz
bd6113858f mbuf: clarify usage of packet pool initializers
Clarify that the mempool private initializer and object initializer used
for packet pools require that the mempool private size is large enough.

Also add an assert (only enabled when -DRTE_ENABLE_ASSERT is passed) to
check this constraint.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Aaron Conole <aconole@redhat.com>
2021-05-04 22:41:32 +02:00
Chengwen Feng
d4902ed31c mbuf: check shared memory before dumping dynamic space
Because mbuf dyn shared memory was allocated runtime, so it's
necessary to check validity when dump mbuf dyn info.

Also this patch adds an error logging when init shared memory fail.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-05-04 19:49:44 +02:00
Tal Shnaiderman
2231388f21 eal/windows: fix MinGW build
the strncasecmp macro defined in rte_os_shim.h is already
defined in MinGW-w64, as a result the compiler prints out
the warning below on function redefinition whenever compiling
a file including the header in debug mode.

lib/eal/windows/include/rte_os_shim.h:21:
warning: "strncasecmp" redefined

Fixed by defining the macro only to the clang compiler.

Fixes: 45d62067c2 ("eal: make OS shims internal")

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-05-04 19:32:28 +02:00
Juraj Linkeš
20c7744f8d eal/arm64: fix platform register bit
REG_PLATFORM only uses bit 0 to indicate whether the value retrieved
from hardware matches PLATFORM_STR.

Fixes: 97523f822b ("eal/arm: add CPU flags for ARMv8")
Cc: stable@dpdk.org

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-05-04 18:55:09 +02:00
Bruce Richardson
7d5cfaa750 build: fix formatting of Meson lists
Running "./devtools/check-meson.py --fix" on the DPDK repo fixes a
number of issues with whitespace and formatting of files:

* indentation of lists
* missing trailing commas on final list element
* multiple list entries per line when list is not all single-line

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-05-04 15:01:47 +02:00
Joyce Kong
cee151b41b mempool: distinguish cache and pool debug counters
If cache is enabled, objects will be retrieved/put from/to cache,
subsequently from/to the common pool. Now the debug stats calculate
the objects retrieved/put from/to cache and pool together, it is
better to distinguish them.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-05-04 09:44:55 +02:00
Dharmik Thakkar
5648704065 mempool: make stats macro generic
Make __MEMPOOL_STAT_ADD macro more generic and delete
__MEMPOOL_CONTIG_BLOCKS_STAT_ADD macro.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-05-04 09:34:28 +02:00
Stanislaw Kardach
1abb185d6c stack: allow lock-free only on relevant architectures
Since commit 7911ba0473 ("stack: enable lock-free implementation for
aarch64"), lock-free stack is supported on arm64 but this description was
missing from the doxygen for the flag.

Currently it is impossible to detect programmatically whether lock-free
implementation of rte_stack is supported. One could check whether the
header guard for lock-free stubs is defined (_RTE_STACK_LF_STUBS_H_) but
that's an unstable implementation detail. Because of that currently all
lock-free ring creations silently succeed (as long as the stack header
is 16B long) which later leads to push and pop operations being NOPs.
The observable effect is that stack_lf_autotest fails on platforms not
supporting the lock-free. Instead it should just skip the lock-free test
altogether.

This commit adds a new errno value (ENOTSUP) that may be returned by
rte_stack_create() to indicate that a given combination of flags is not
supported on a current platform.
This is detected by checking a compile-time flag in the include logic in
rte_stack_lf.h which may be used by applications to check the lock-free
support at compile time.

Use the added RTE_STACK_LF_SUPPORTED flag to disable the lock-free stack
tests at the compile time.
Perf test doesn't fail because rte_ring_create() succeeds, however
marking this test as skipped gives a better indication of what actually
was tested.

Fixes: 7911ba0473 ("stack: enable lock-free implementation for aarch64")

Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-05-03 18:46:15 +02:00
David Marchand
7ddb625c3d mbuf: mark offload flag as deprecated
PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
warning to applications still using it.
Fix this by marking as deprecated with the newly introduced
RTE_DEPRECATED.

Fixes: e8a419d6de ("mbuf: rename outer IP checksum macro")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-05-03 16:12:49 +02:00
Dmitry Kozlyuk
1ee899977d eal: add timespec_get shim
C11 timespec_get() is not provided on some platforms:

* MinGW-w64 does not currently implement it [1].
* FreeBSD 11 with Clang 10.0.0 does not provide it.

Add internal shims to Windows and FreeBSD EALs.
For Windows, it can be removed after [1] is fixed.

[1]: https://sourceforge.net/p/mingw-w64/mailman/message/37224689/

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Nick Connolly <nick.connolly@mayadata.io>
2021-04-21 23:32:13 +02:00
Min Hu (Connor)
53ef1b3477 ethdev: add sanity checks in control APIs
This patch adds more sanity checks in control path APIs.

Fixes: 214ed1acd1 ("ethdev: add iterator to match devargs input")
Fixes: 3d98f921fb ("ethdev: unify prefix for static functions and variables")
Fixes: 0366137722 ("ethdev: check for invalid device name")
Fixes: d948f596fe ("ethdev: fix port data mismatched in multiple process model")
Fixes: 5b7ba31148 ("ethdev: add port ownership")
Fixes: f8244c6399 ("ethdev: increase port id range")
Cc: stable@dpdk.org

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-21 18:36:56 +02:00
Li Zhang
5f0d54f372 ethdev: add pre-defined meter policy API
Currently, the flow meter policy does not support multiple actions
per color; also the allowed action types per color are very limited.
In addition, the policy cannot be pre-defined.

Due to the growing in flow actions offload abilities there is a potential
for the user to use variety of actions per color differently.
This new meter policy API comes to allow this potential in the most ethdev
common way using rte_flow action definition.
A list of rte_flow actions will be provided by the user per color
in order to create a meter policy.
In addition, the API forces to pre-define the policy before
the meters creation in order to allow sharing of single policy
with multiple meters efficiently.

meter_policy_id is added into struct rte_mtr_params.
So that it can get the policy during the meters creation.

Allow coloring the packet using a new rte_flow_action_color
as could be done by the old policy API.

Add two common policy template as macros in the head file.

The next API function were added:
- rte_mtr_meter_policy_add
- rte_mtr_meter_policy_delete
- rte_mtr_meter_policy_update
- rte_mtr_meter_policy_validate
The next struct was changed:
- rte_mtr_params
- rte_mtr_capabilities
The next API was deleted:
- rte_mtr_policer_actions_update

To support this API the following app were changed:
app/test-flow-perf: clean meter policer
app/testpmd: clean meter policer

To support this API the following drivers were changed:
net/softnic: support meter policy API
1. Cleans meter rte_mtr_policer_action.
2. Supports policy API to get color action as policer action did.
   The color action will be mapped into rte_table_action_policer.

net/mlx5: clean meter creation management
Cleans and breaks part of the current meter management
in order to allow better design with policy API.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-04-21 12:22:17 +02:00
Bing Zhao
9847fd125d ethdev: introduce conntrack flow action and item
This commit introduces the conntrack action and item.

Usually the HW offloading is stateless. For some stateful offloading
like a TCP connection, HW module will help provide the ability of a
full offloading w/o SW participation after the connection was
established.

The basic usage is that in the first flow rule the application should
add the conntrack action and jump to the next flow table. In the
following flow rule(s) of the next table, the application should use
the conntrack item to match on the result.

A TCP connection has two directions traffic. To set a conntrack
action context correctly, the information of packets from both
directions are required.

The conntrack action should be created on one ethdev port and supply
the peer ethdev port as a parameter to the action. After context
created, it could only be used between these two ethdev ports
(dual-port mode) or a single port. The application should modify the
action via the API "rte_action_handle_update" only when before using
it to create a flow rule with conntrack for the opposite direction.
This will help the driver to recognize the direction of the flow to
be created, especially in the single-port mode, in which case the
traffic from both directions will go through the same ethdev port
if the application works as an "forwarding engine" but not an end
point. There is no need to call the update interface if the
subsequent flow rules have nothing to be changed.

Query will be supported via "rte_action_handle_query" interface,
about the current packets information and connection status. The
fields query capabilities depends on the HW.

For the packets received during the conntrack setup, it is suggested
to re-inject the packets in order to make sure the conntrack module
works correctly without missing any packet. Only the valid packets
should pass the conntrack, packets with invalid TCP information,
like out of window, or with invalid header, like malformed, should
not pass.

Naming and definition:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/
        netfilter/nf_conntrack_tcp.h
https://elixir.bootlin.com/linux/latest/source/net/netfilter/
        nf_conntrack_proto_tcp.c

Other reference:
https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-20 01:24:57 +02:00
Ori Kam
b10a421a1f ethdev: add packet integrity check flow rules
Currently, DPDK application can offload the checksum check,
and report it in the mbuf.

However, as more and more applications are offloading some or all
logic and action to the HW, there is a need to check the packet
integrity so the right decision can be taken.

The application logic can be positive meaning if the packet is
valid jump / do  actions, or negative if packet is not valid
jump to SW / do actions (like drop) and add default flow
(match all in low priority) that will direct the miss packet
to the miss path.

Since currently rte_flow works in positive way the assumption is
that the positive way will be the common way in this case also.

When thinking what is the best API to implement such feature,
we need to consider the following (in no specific order):
1. API breakage.
2. Simplicity.
3. Performance.
4. HW capabilities.
5. rte_flow limitation.
6. Flexibility.

First option: Add integrity flags to each of the items.
For example add checksum_ok to IPv4 item.

Pros:
1. No new rte_flow item.
2. Simple in the way that on each item the app can see
what checks are available.

Cons:
1. API breakage.
2. Increase number of flows, since app can't add global rule and must
   have dedicated flow for each of the flow combinations, for example
   matching on ICMP traffic or UDP/TCP  traffic with IPv4 / IPv6 will
   result in 5 flows.

Second option: dedicated item

Pros:
1. No API breakage, and there will be no for some time due to having
   extra space. (by using bits)
2. Just one flow to support the ICMP or UDP/TCP traffic with IPv4 /
   IPv6.
3. Simplicity application can just look at one place to see all possible
   checks.
4. Allow future support for more tests.

Cons:
1. New item, that holds number of fields from different items.

For starter the following bits are suggested:
1. packet_ok - means that all HW checks depending on packet layer have
   passed. This may mean that in some HW such flow should be split to
   number of flows or fail.
2. l2_ok - all check for layer 2 have passed.
3. l3_ok - all check for layer 3 have passed. If packet doesn't have
   L3 layer this check should fail.
4. l4_ok - all check for layer 4 have passed. If packet doesn't
   have L4 layer this check should fail.
5. l2_crc_ok - the layer 2 CRC is O.K.
6. ipv4_csum_ok - IPv4 checksum is O.K. It is possible that the
   IPv4 checksum will be O.K. but the l3_ok will be 0. It is not
   possible that checksum will be 0 and the l3_ok will be 1.
7. l4_csum_ok - layer 4 checksum is O.K.
8. l3_len_OK - check that the reported layer 3 length is smaller than the
   frame length.

Example of usage:
1. Check packets from all possible layers for integrity.
   flow create integrity spec packet_ok = 1 mask packet_ok = 1 .....

2. Check only packet with layer 4 (UDP / TCP)
   flow create integrity spec l3_ok = 1, l4_ok = 1 mask l3_ok = 1
   l4_ok = 1

Signed-off-by: Ori Kam <orika@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-19 19:05:17 +02:00
Bing Zhao
4b61b8774b ethdev: introduce indirect flow action
Right now, rte_flow_shared_action_* APIs are used for some shared
actions, like RSS, count. The shared action should be created before
using it inside a flow. These shared actions sometimes are not
really shared but just some indirect actions decoupled from a flow.

The new functions rte_flow_action_handle_* are added to replace
the current shared functions rte_flow_shared_action_*.

There are two types of flow actions:
1. the direct (normal) actions that could be created and stored
   within a flow rule. Such action is tied to its flow rule and
   cannot be reused.
2. the indirect action, in the past, named shared_action. It is
   created from a direct actioni, like count or rss, and then used
   in the flow rules with an object handle. The PMD will take care
   of the retrieve from indirect action to the direct action
   when it is referenced.

The indirect action is accessed (update / query) w/o any flow rule,
just via the action object handle. For example, when querying or
resetting a counter, it could be done out of any flow using this
counter, but only the handle of the counter action object is
required.
The indirect action object could be shared by different flows or
used by a single flow, depending on the direct action type and
the real-life requirements.
The handle of an indirect action object is opaque and defined in
each driver and possibly different per direct action type.

The old name "shared" is improper in a sense and should be replaced.

Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*", the testpmd application code and command
line interfaces also need to be updated to do the adaption.
The testpmd application user guide is also updated. All the "shared
action" related parts are replaced with "indirect action" to have a
correct explanation.

The parameter of "update" interface is also changed. A general
pointer will replace the rte_flow_action struct pointer due to the
facts:
1. Some action may not support fields updating. In the example of a
   counter, the only "update" supported should be the reset. So
   passing a rte_flow_action struct pointer is meaningless and
   there is even no such corresponding action struct. What's more,
   if more than one operations should be supported, for some other
   action, such pointer parameter may not meet the need.
2. Some action may need conditional or partial update, the current
   parameter will not provide the ability to indicate which part(s)
   to update.
   For different types of indirect action objects, the pointer could
   either be the same of rte_flow_action* struct - in order not to
   break the current driver implementation, or some wrapper
   structures with bits as masks to indicate which part to be
   updated, depending on real needs of the corresponding direct
   action. For different direct actions, the structures of indirect
   action objects updating will be different.

All the underlayer PMD callbacks will be moved to these new APIs.

The RTE_FLOW_ACTION_TYPE_SHARED is kept for now in order not to
break the ABI. All the implementations are changed by using
RTE_FLOW_ACTION_TYPE_INDIRECT.

Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*" and the "update" interface's 3rd input
parameter is changed to generic pointer, the mlx5 PMD that uses these
APIs needs to do the adaption to the new APIs as well.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-19 18:25:42 +02:00
Lijun Ou
9ad9ff476c ethdev: add queue state in queried queue information
Currently, upper-layer application could get queue state only
through pointers such as dev->data->tx_queue_state[queue_id],
this is not the recommended way to access it. So this patch
add get queue state when call rte_eth_rx_queue_info_get and
rte_eth_tx_queue_info_get API.

Note: After add queue_state field, the 'struct rte_eth_rxq_info' size
remains 128B, and the 'struct rte_eth_txq_info' size remains 64B, so
it could be ABI compatible.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-04-19 18:25:35 +02:00
Thomas Monjalon
027c931be8 telemetry: fix build on FreeBSD < 12.2
The function pthread_setname_np() was originally not available on
FreeBSD. It has been added in FreeBSD 12.2:
https://svnweb.freebsd.org/base?view=revision&revision=362264

The EAL implementation of rte_thread_setname() is duplicated
in the telemetry library, which does not depend on EAL,
so the compilation is safe in all systems.

Fixes: 5da7736f8c ("telemetry: set socket listener thread name")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-21 20:07:59 +02:00
Savinay Dharmappa
3a91d2d138 sched: fix traffic class oversubscription parameter
This patch fixes the traffic class oversubscription watermark
value by initialising it with computed value of maximum watermark.

Fixes: ac6fcb841b ("sched: update subport rate dynamically")
Cc: stable@dpdk.org

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2021-04-21 16:57:18 +02:00
Pu Xu
1edf7a796d ip_frag: fix fragmenting IPv4 packet with header option
When fragmenting IPv4 packet, the data offset should be calculated through
the IHL field in IP header rather than using sizeof(struct rte_ipv4_hdr).

Fixes: 4c38e5532a ("ip_frag: refactor IPv4 fragmentation into a proper library")
Cc: stable@dpdk.org

Signed-off-by: Pu Xu <583493798@qq.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-04-21 16:50:46 +02:00
Chengwen Feng
c53a5f3efb telemetry: check thread creations
Add result check and message print out for thread creation after
failure.

Fixes: b80fe1805e ("telemetry: introduce backward compatibility")
Fixes: 6dd571fd07 ("telemetry: introduce new functionality")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-04-21 16:23:50 +02:00
Chengwen Feng
5da7736f8c telemetry: set socket listener thread name
This patch supports set init threads name which is helpful for
debugging.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 15:57:47 +02:00
Bruce Richardson
0bf5832222 lib: allow disabling optional libraries
Add support for the disable_libs option, to allow disabling the build of
particular libraries. As part of this, maintain a list of what libraries
can safely be disabled, without breaking the build - for now this list is
solely those libraries which are not built on FreeBSD, kni, power and
vhost. This list can be expanded by future patches.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 14:17:29 +02:00
Bruce Richardson
99a2dd955f lib: remove librte_ prefix from directory names
There is no reason for the DPDK libraries to all have 'librte_' prefix on
the directory names. This prefix makes the directory names longer and also
makes it awkward to add features referring to individual libraries in the
build - should the lib names be specified with or without the prefix.
Therefore, we can just remove the library prefix and use the library's
unique name as the directory name, i.e. 'eal' rather than 'librte_eal'

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 14:04:09 +02:00
Bruce Richardson
6fc406593a lib: clean up build files
Switch from using tabs to 4 spaces for meson.build indentation. Perform
other formatting cleanups such as ensure that long lists of files are one
per line, and terminating with a final comma before the closing brace to
make addition/removals easier. In some cases, reorder lists of items
where they were not in alphabetical order.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 12:37:55 +02:00
Bruce Richardson
9cc02b1794 lib: tidy up build list
With the lib/meson.build file changed from C-style indentation to
python-style indentation, we need to correct the indentation of the lists
of libraries, since these libs were not modified in the previous patches.
For ease of management of the list and working with patches for adding
to the list, put each library on it's own line.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 12:37:55 +02:00
Bruce Richardson
7d611e35b0 lib: simplify main build file
Two simplifications can be made to the build file which reduce indentation
levels and make it easier to read:

1. When meson build support was first added, the compat library existed in
DPDK as a single header file. Since that header has been merged into EAL,
we no longer need to support header-only libraries, so can shorten the
code.

2. From meson 0.49 onwards we have the "continue" keyword available to
break out of one loop iteration and begin the next. This allows us to
remove blocks in the build configuration file which were conditional on the
"build" variable being true. Instead we can use "continue" to abort
processing at the point where the "build" value becomes false.

Since this patch changes the indentation level of large parts of the
meson.build file, we use the opportunity to adjust the whitespace used to
the meson-standard 4-spec indentation level.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 12:37:55 +02:00
Ali Alnubani
fe4b8c7bcd pipeline: fix build with GCC 4.8.5
Compilation on CentOS 7 with gcc version 4.8.5 fails with
the following errors:
error: 'src_struct_id' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
error: 'dst_struct_id' may be used uninitialized in this
function [-Werror=maybe-uninitialized]

This patch fixes the build errors by initializing both variables.

Bugzilla ID: 683
Fixes: 783768136f ("pipeline: auto-detect endianness of action arguments")

Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
2021-04-21 12:36:17 +02:00
Elad Nachman
6b1f8e4f9b kni: support async user request
Adding async userspace requests which don't wait for the userspace
response and always return success. This is preparation to address a
regression in KNI.

Signed-off-by: Elad Nachman <eladv6@gmail.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-04-21 01:05:15 +02:00
Dmitry Kozlyuk
ff4cf5265c mem: fix cleanup after incomplete initialization
In case of EAL initialization failure rte_eal_memory_detach() may be
called before mapping memory configuration, which in this case points
to the static structure. Attempt to unmap it yields error:

    EAL: Could not unmap shared memory config: Invalid argument

Skip unmapping memory configuration if it's not yet shared.

Fixes: dfbc61a2f9 ("mem: detach memsegs on cleanup")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-04-20 23:33:03 +02:00
Vladimir Medvedkin
28ebff11c2 hash: add predictable RSS
This patch adds predictable RSS API.
It is based on the idea of searching partial Toeplitz hash collisions.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2021-04-20 23:13:23 +02:00
Cristian Dumitrescu
783768136f pipeline: auto-detect endianness of action arguments
Each table entry is made up of match fields and action data, with the
latter made up of the action ID and the action arguments. The approach
of having the user specify explicitly the endianness of the action
arguments is difficult to be picked up by P4 compilers, as the P4
compiler is generally unaware about this aspect.

This commit introduces the auto-detection of the endianness of the
action arguments by examining the endianness of the their destination:
network byte order (NBO) when they get copied to headers and host byte
order (HBO) when they get copied to packet meta-data or mailboxes.

The endianness specification of each action argument as part of the
rule specification, e.g. H(...) and N(...) is removed from the rule
file and auto-detected based on their destination. The DMA instruction
scope is made internal, so mov instructions need to be used. The
pattern of transferring complete headers from table entry action args
to headers is detected, and the associated set of mov instructions
plus header validate is internally detected and replaced with the
internal-only DMA instruction to preserve performance.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-20 21:55:43 +02:00
Cristian Dumitrescu
0f5df4ea5d pipeline: modularize SWX instruction optimizer
Decouple between the different instruction optimizer. Allow each
optimization to run as a separate iteration on the entire instruction
stream.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-20 21:55:34 +02:00
Cristian Dumitrescu
48ad58964c pipeline: fix endianness conversions
The SWX pipeline instructions work with operands of different types:
header fields (h.header.field), packet meta-data (m.field), extern
object mailbox field (e.obj.field), extern function (f.field), action
data read from table entries (t.field), or immediate values; hence the
HMEFTI acronym. The H operands are stored in network byte order (NBO),
while the MEFT operands are stored in host byte order (HBO), hence the
need to operate endianness conversions.

Some of the endianness conversion macros were not working correctly
for some cases such as operands of different sizes, and they are fixed
now. Affected instructions: mov, and, or, xor, jmpeq, jmpneq.

Fixes: 7210349d5b ("pipeline: add SWX move instruction")
Fixes: 650195cf96 ("pipeline: introduce SWX and instruction")
Fixes: 8f796198dc ("pipeline: introduce SWX or instruction")
Fixes: b4e607f9fd ("pipeline: introduce SWX XOR instruction")
Fixes: b3947e25be ("pipeline: introduce SWX jump and return instructions")
Cc: stable@dpdk.org

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-20 02:34:16 +02:00
Cristian Dumitrescu
e8e22eb0dd pipeline: adjust error code for internal function
Adjusting the error code for the internal function instruction_config
to match the rest of the code which is returning a negative value on
error. Cosmetic change.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-20 02:33:12 +02:00
Cristian Dumitrescu
6b840b7c53 pipeline: validate header on SWX emit
Enhance the behavior of the emit instruction to ignore invalid
headers, as mandated by the P4 language specification.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-20 02:31:21 +02:00
Cristian Dumitrescu
742b0a57f5 pipeline: add table statistics to SWX
Add support for table statistics for the SWX pipeline. For each table,
we maintain a counter for lookup hit packets, one for lookup miss
packets and one packet counter for each table action.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
2021-04-20 02:27:56 +02:00
Cristian Dumitrescu
fe16d678e7 pipeline: add drop instruction to SWX
Enabled the TX instruction to accept an immediate value for the output
port argument. The drop instruction is simply an alias to the TX
instruction for the last output port of the pipeline.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-20 02:24:36 +02:00
Cristian Dumitrescu
ea5ab65f57 pipeline: relax table match field requirements
The match fields for a given table have to be part of the same header
or the metadata structure. This commit removes the requirement that
the list of match fields must observe the order of fields within their
structure. For example, the h.ipv4.dst_addr field can now be listed
before the h.ipv4.src_addr field in a table match field list, even
though within the IPv4 header the dst_addr field is present after the
src_addr field.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-20 02:24:36 +02:00
Churchill Khangar
1af0e07b27 table: relax requirements for entry action data
Currently, the table entry action data is required to be NULL when the
action data size is zero. We now require that action data is ignored
when the action data size is zero. This is to allow for a table entry
instance to be allocated once with max action data size for the table
and reused repeatedly for actions of different sizes, including zero.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
2021-04-19 20:21:18 +02:00
Cristian Dumitrescu
97005a6665 table: fix out of bounds write
Fix out of bounds write. The allocated string size was incorrect.

Coverity issue: 369670
Fixes: 66440b7b22 ("table: add wildcard match table type")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-19 19:49:09 +02:00
Cristian Dumitrescu
ae650ff9ba port: fix allocation check in ring SWX
Fix logically dead code in ring port.

Coverity issue: 369664
Fixes: 77a413017c ("port: add ring SWX port")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-04-19 19:33:46 +02:00
Xueming Li
e4b5b2ba9c devargs: fix list entry update
When inserting devargs that is already in list,
existing one was reset and replaced completely by new one,
the entry info was lost during copy.

This patch backups entry info before copy.

Fixes: 64051bb1f1 ("devargs: unify scratch buffer storage")

Reported-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2021-04-19 18:13:02 +02:00
Yunjian Wang
22677b0eef vfio: fix duplicated user mem map
Currently, new user mem maps are checked if they are adjacent to
an existing mem map and if so, the mem map entries are merged.

It didn't check for duplicate mem maps, so if the API is called
with the same mem map multiple times, they will occupy multiple
mem map entries. This will reduce the amount of entries available
for unique mem maps.

So check for duplicate mem maps and merge them into one mem map
entry if any found.

Fixes: 0cbce3a167 ("vfio: skip DMA map failure if already mapped")
Cc: stable@dpdk.org

Suggested-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-04-19 11:57:46 +02:00
Tyler Retzlaff
1cd3ce0953 power: replace unsigned -1 with unsigned maximum
Use UINT64_MAX instead of -1ULL.

Some compilers generate a warning when applying a '-' to
an unsigned literal so avoid this by initializing with
unsigned preprocessor definition.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-04-19 11:36:12 +02:00
Tyler Retzlaff
5535bbadcb eal: replace unsigned -1 with unsigned maximums
Use UINT64_MAX and UINT32_MAX instead of -1 or ~0 literal variations
of different explicit widths when creating masks and sentinel values.

Some compilers generate a warning when applying a '-' to an unsigned
literal so avoid this by initializing with unsigned preprocessor
definitions where appropriate.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-04-19 11:29:45 +02:00
Tyler Retzlaff
e8eb80e8bf eal: check vsnprintf failure in devargs parsing
Check for failure, while here just increment len once after checking for
failure instead of duplicating len + 1 math in two different argument
lists.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2021-04-19 11:18:07 +02:00
Shijith Thotton
d69123d266 eventdev: fix case to initiate crypto adapter service
Initiate software crypto adapter service, only if hardware capabilities
are not reported. In OP_FORWARD mode, software service is not required
to enqueue events if OP_FORWARD capability is supported by the PMD.

Fixes: 7901eac340 ("eventdev: add crypto adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-04-17 19:22:41 +02:00
Akhil Goyal
f96a8ebb27 eventdev: introduce crypto adapter enqueue API
In case an event from a previous stage is required to be forwarded
to a crypto adapter and PMD supports internal event port in crypto
adapter, exposed via capability
RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD, we do not have
a way to check in the API rte_event_enqueue_burst(), whether it is
for crypto adapter or for eth tx adapter.

Hence we need a new API similar to rte_event_eth_tx_adapter_enqueue(),
which can send to a crypto adapter.

Note that RTE_EVENT_TYPE_* cannot be used to make that decision,
as it is meant for event source and not event destination.
And event port designated for crypto adapter is designed to be used
for OP_NEW mode.

Hence, in order to support an event PMD which has an internal event port
in crypto adapter (RTE_EVENT_CRYPTO_ADAPTER_OP_FORWARD mode), exposed
via capability RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD,
application should use rte_event_crypto_adapter_enqueue() API to enqueue
events.

When internal port is not available(RTE_EVENT_CRYPTO_ADAPTER_OP_NEW mode),
application can use API rte_event_enqueue_burst() as it was doing earlier,
i.e. retrieve event port used by crypto adapter and bind its event queues
to that port and enqueue events using the API rte_event_enqueue_burst().

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-04-17 18:49:52 +02:00
Raslan Darawsheh
7d96f5717a ethdev: update flow item GTP QFI definition
'qfi' field is 8 bits which represent single bit for
PPP (paging Policy Presence) single bit for RQI
(Reflective QoS Indicator) and 6 bits for QFI
(QoS Flow Identifier)
This is based on RFC 38415-g30
https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip

Updated the doxygen comment and the mask for 'qfi'
to properly identify the full 8 bits of the field.

note: changing the default mask would cause different
patterns generated by testpmd.

Fixes: 346553db5b ("ethdev: add GTP extension header to flow API")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-04-14 14:17:41 +02:00
Haifei Luo
50c383793b ethdev: dump single flow rule
Previous implementations support dump all the flows. Add new arg
rte_flow in rte_flow_dev_dump to dump one flow.

Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-04-14 13:19:55 +02:00
Li Zhang
74c8ec894d ethdev: add packet mode in meter profile structure
Currently meter algorithms only supports rate is bytes per second (BPS).
Add packet_mode flag in meter profile parameters data structure.
So that it can meter traffic by packet per second.

When packet_mode is 0, the profile rates and bucket sizes are
specified in bytes per second and bytes
when packet_mode is not 0, the profile rates and bucket sizes are
specified in packets and packets per second.

The below structure will be extended:
rte_mtr_meter_profile
rte_mtr_capabilities

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-04-13 18:40:58 +02:00
Gregory Etelson
f8e7132572 ethdev: fix VXLAN mask initialization
In GCC compiler, __builtin_constant_p(exp) is a function.
The function returns the integer 1 if the argument is known to be
a compile-time constant.
Therefore, __builtin_constant_p(0xffffff << 8) returned 1.
As the result, rte_flow_item_vxlan_mask was initiated to
{{
  {flags = 0x0, rsvd0 = {0x0, 0x0, 0x0},
   vni = {0x0, 0x0, 0x0}, rsvd1 = 0x1},
  hdr = {vx_flags = 0x0, vx_vni = 0x1000000}}}
}}
GCC fails initialization
rte_flow_item_vxlan_mask.hdr.vni = (0xffffff << 8)
with "initializer element is not a constant expression" error.
Use immediate 0xffffff00 value instead.

Fixes: 43af98e687 ("ethdev: reuse VXLAN header definition in flow item")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
2021-04-09 11:27:40 +02:00
Matan Azrad
07b0b75370 cryptodev: formalize key wrap method in API
The Key Wrap approach is used by applications in order to protect keys
located in untrusted storage or transmitted over untrusted
communications networks. The constructions are typically built from
standard primitives such as block ciphers and cryptographic hash
functions.

The Key Wrap method and its parameters are a secret between the keys
provider and the device, means that the device is preconfigured for
this method using very secured way.

The key wrap method may change the key length and layout.

Add a description for the cipher transformation key to allow wrapped key
to be forwarded by the same API.

Add a new feature flag RTE_CRYPTODEV_FF_CIPHER_WRAPPED_KEY to be enabled
by PMDs support wrapped key in cipher trasformation.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-04-16 12:43:33 +02:00
Fan Zhang
c21574edc5 cryptodev: add dequeue count parameter in raw API
This patch changes the experimental raw data path dequeue burst API.
Originally the API enforces the user to provide callback function
to get maximum dequeue count. This change gives the user one more
option to pass directly the expected dequeue count.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-04-16 12:43:33 +02:00
Matan Azrad
d014dddb2d cryptodev: support multiple cipher data-units
In cryptography, a block cipher is a deterministic algorithm operating
on fixed-length groups of bits, called blocks.

A block cipher consists of two paired algorithms, one for encryption
and the other for decryption. Both algorithms accept two inputs:
an input block of size n bits and a key of size k bits; and both yield
an n-bit output block. The decryption algorithm is defined to be the
inverse function of the encryption.

For AES standard the block size is 16 bytes.
For AES in XTS mode, the data to be encrypted\decrypted does not have to
be multiple of 16B size, the unit of data is called data-unit.
The data-unit size can be any size in range [16B, 2^24B], so, in this
case, a data stream is divided into N amount of equal data-units and
must be encrypted\decrypted in the same data-unit resolution.

For ABI compatibility reason, the size is limited to 64K (16-bit field).
The new field dataunit_len is inserted in a struct padding hole,
which is only 2 bytes long in 32-bit build.
It could be moved and extended later during an ABI-breakage window.

The current cryptodev API doesn't allow the user to select a specific
data-unit length supported by the devices.
In addition, there is no definition how the IV is detected per data-unit
when single operation includes more than one data-unit.

That causes applications to use single operation per data-unit even though
all the data is continuous in memory what reduces datapath performance.

Add a new feature flag to support multiple data-unit sizes, called
RTE_CRYPTODEV_FF_CIPHER_MULTIPLE_DATA_UNITS.
Add a new field in cipher capability, called dataunit_set,
where the devices can report the range of the supported data-unit sizes.
Add a new cipher transformation field, called dataunit_len, where the user
can select the data-unit length for all the operations.

All the new fields do not change the size of their structures,
by filling some struct padding holes.
They are added as exceptions in the ABI check file libabigail.abignore.

Using a bitmap to report the supported data-unit sizes capability allows
the devices to report a range simply as same as the user to read it
simply. also, thus sizes are usually common and probably will be shared
among different devices.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-04-16 12:43:33 +02:00
Nicolas Chautru
48fc315f02 bbdev: add explicit enum for code block mode
Using explicit enum instead of ambiguous integer value

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
2021-04-16 12:43:33 +02:00
Anatoly Burakov
2414ce9b7b power: fix closing frequency file
Currently, we open the system base frequency file, but never close it,
which results in a memory leak.

Coverity issue: 369693
Fixes: 8a5febaac4 ("power: fix P-state base frequency handling")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
2021-04-15 23:53:39 +02:00
Anatoly Burakov
64f22b91c6 power: remove redundant close of frequency file
Previous fix has addressed the incorrect handling of `base_frequency`
file, but has added a use-after-free error due to the fact that all
further code paths will lead to an `fclose()` call at the end, so the
additional `fclose()` call right after processing the file was
unnecessary.

Coverity issue: 369901
Fixes: 8a5febaac4 ("power: fix P-state base frequency handling")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Liang Ma <liangma@liangbit.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-04-15 23:53:33 +02:00
Tyler Retzlaff
a6f88f7eb9 eal: add C++ include guard for reciprocal header
Add missing extern "C" linkage for rte_reciprocal.h consistent with
other eal headers.

Fixes: ffe3ec811e ("sched: introduce reciprocal divide")
Cc: stable@dpdk.org

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-04-15 16:44:18 +02:00
Dmitry Kozlyuk
89813a522e net: provide IP-related API on any OS
Users of <rte_ip.h> relied on it to provide IP-related defines,
like IPPROTO_* constants, but still had to include POSIX headers
for inet_pton() and other standard IP-related facilities.

Extend <rte_ip.h> so that it is a single header to gain access
to IP-related facilities on any OS. Use it to replace POSIX includes
in components enabled on Windows. Move missing constants from Windows
networking shim to OS shim header and include it where needed.

Remove Windows networking shim that is no longer needed.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2021-04-15 01:56:43 +02:00
Dmitry Kozlyuk
6c068dbd9f net: work around s_addr macro on Windows
Windows Sockets headers contain `#define s_addr S_un.S_addr`, which
conflicts with definition of `s_addr` field of `struct rte_ether_hdr`.
Prieviously `s_addr` was undefined in <rte_ether.h>, which had been
breaking access to `s_addr` field of `struct in_addr`, so some DPDK
and Windows headers could not be included in one file.

Renaming of `struct rte_ether_hdr` is planned:
https://mails.dpdk.org/archives/dev/2021-March/201444.html

Temporarily disable `s_addr` macro around `struct rte_ether_hdr`
definition to avoid conflict. Place source MAC address in both `s_addr`
and `S_un.S_addr` fields, so that access works either directly or
through the macro as defined in Windows headers.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-04-15 01:56:40 +02:00
Dmitry Kozlyuk
45d62067c2 eal: make OS shims internal
DPDK code often relies on functions and macros that are not standard C,
but are found on all platforms, even if by slightly different names.
Windows <rte_os.h> provided macros or inline definitions for such symbols.
However, when placed in public header, these symbols were unnecessarily
exposed, breaking consumer POSIX compatibility code.

Move most of the shims to <rte_os_shim.h>, a header to be used instead
of <rte_os.h> by internal code. Include it in libraries and PMDs that
previously imported shims from <rte_os.h>. Directly replace shims that
were only used inside EAL:
* index -> strchr, rindex -> strrchr
* sleep -> rte_delay_us_sleep
* strerror_r -> strerror_s

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2021-04-15 01:56:20 +02:00
Dmitry Kozlyuk
9ec521006d eal/windows: hide asprintf shim
Make asprintf(3) implementation for Windows private to EAL, so that it's
hidden from external consumers. It is not exposed to internal consumers
either, because they don't need asprintf() and also because callers from
other modules would have no reliable way to free allocated memory.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Khoa To <khot@microsoft.com>
Acked-by: Nick Connolly <nick.connolly@mayadata.io>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2021-04-15 01:56:06 +02:00
Xueming Li
3ab385063c kvargs: add get by key
Adds a new function to get value of a specific key from kvargs list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
2021-04-14 22:25:22 +02:00
Xueming Li
e132ee8690 devargs: fix memory leak on parsing failure
This patch fixes memory leak in parsing error handling.

Fixes: 338327d731 ("devargs: add function to parse device layers")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
2021-04-14 22:25:15 +02:00
Xueming Li
64051bb1f1 devargs: unify scratch buffer storage
In current design, legacy parser rte_devargs_parse() saved scratch
buffer to devargs.args while new parser rte_devargs_layers_parse() saved
to devargs.data. Code using devargs had to know the difference and
cleaned up memory accordingly - error prone.

This patch unifies scratch buffer to data field, introduces
rte_devargs_reset() function to wrap the memory clean up logic.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Gaetan Rivet <grive@u256.net>
2021-04-14 22:25:08 +02:00
Stephen Hemminger
9667d97c25 pflock: add phase-fair reader writer locks
This is a new type of reader-writer lock that provides better fairness
guarantees which better suited for typical DPDK applications.
A pflock has two ticket pools, one for readers and one
for writers.

Phase-fair reader writer locks ensure that neither reader nor writer will
be starved.
Neither reader or writer are preferred, they execute in alternating
phases.
All operations of the same type (reader or writer) that acquire the lock
are handled in FIFO order.
Write operations are exclusive, and multiple read operations can be run
together (until a write arrives).

A similar implementation is in Concurrency Kit package in FreeBSD.
For more information see:
   "Reader-Writer Synchronization for Shared-Memory Multiprocessor
    Real-Time Systems",
    http://www.cs.unc.edu/~anderson/papers/ecrts09b.pdf

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-04-14 21:59:47 +02:00