The 'dmadev' is a generic type of DMA device.
This patch introduce the 'dmadev' device allocation functions.
The infrastructure is prepared to welcome drivers in drivers/dma/
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
As stated in the API, dynamic field and flags should be created with no
additional flag (simply in the API for future changes).
Fix the dynamic flag register helper which was not enforcing it and add
unit tests.
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
If we do not enforce valid flags are passed by an application, this
application might face issues in the future when we add more flags.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
If we do not enforce valid flags are passed by an application, this
application might face issues in the future when we add more flags.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
When running using in-memory mode, multiple processes can use the same
runtime dir, leading to conflicts with the telemetry sockets in that
directory. We can resolve this by appending a suffix to each socket
beyond the first, with the suffix being an increasing counter value.
Each process uses the first unused socket counter value.
Fixes: 6dd571fd07 ("telemetry: introduce new functionality")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
Telemetry interface should be exposed for primary processes only, since
secondary processes will conflict on socket creation, and since all
data in secondary process is generally available to primary. For
example, all device stats for ethdevs, cryptodevs, etc. will all be
common across processes.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
Current implementation of rte_ipv4_fragment_packet() doesn’t take
into account offset and flag values of the given packet, but blindly
assumes they are always zero (original packet is not fragmented).
According to RFC791, fragment and flag values for new fragment
should take into account values provided in the original IPv4 packet.
Fixes: 4c38e5532a ("ip_frag: refactor IPv4 fragmentation into a proper library")
Cc: stable@dpdk.org
Signed-off-by: Huichao Cai <chcchc88@163.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Removed offload flag PKT_RX_EIP_CKSUM_BAD. PKT_RX_OUTER_IP_CKSUM_BAD
should be used as a replacement.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Use correct define as a name array size.
The change breaks ABI and therefore cannot be backported to
stable branches.
Fixes: 38c9817ee1 ("mempool: adjust name size in related data types")
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Earlier, JSON message length was limited to 1024 which would not
allow data more than this size. Removed this limitation by creating
output buffer based on requested data length.
Fixes: 52af6ccb2b ("telemetry: add utility functions for creating JSON")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Ciara Power <ciara.power@intel.com>
The in-memory option is not supported on FreeBSD so print a warning and
ignore the flag when it is specified for BSD apps. The lack of support
is due to the different way in which memory is managed on FreeBSD using
the contigmem driver rather than via a hugetlbfs filesystem.
Fixes: 14de8734c4 ("eal: add --in-memory option")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This function is public since commit 8f0e4d6a78 ("net: export IPv6
header extensions skip function") (2018), and is used by vmxnet3 driver.
Promote it as stable.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Sort the definitions for extended features (leaf 0) to enhance
readability.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Caught while checking CPUID related stuff in OVS.
According to [1], for Structured Extended Feature Flags Enumeration Leaf
(EAX = 0x07H, ECX = 0):
- BMI1 is associated to EBX, bit 3 (was incorrectly 2),
- SMEP is associated to EBX, bit 7 (was incorrectly 6),
- BMI2 is associated to EBX, bit 8 (was incorrectly 7),
- ERMS is associated to EBX, bit 9 (was incorrectly 8),
1: https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
get_hugepage_dir() was implemented in such a way that a --huge-dir
option had to exactly match the mountpoint, but there's no reason for
this restriction: DPDK might not be the only user of hugepages, and
shouldn't assume it owns an entire mountpoint. For example, if I have
/dev/hugepages/myapp, and /dev/hugepages/dpdk, I should be able to
specify:
--huge-dir=/dev/hugepages/dpdk/
and have DPDK only use that sub-directory.
Fix the implementation to allow a sub-directory within a suitable
hugetlbfs mountpoint to be specified, preferring the closest match.
Signed-off-by: John Levon <john.levon@nutanix.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Add inner packet IPv4 hdr and L4 checksum enable options
in conf. These will be used in case of protocol offload.
Per SA, application could specify whether the
checksum(compute/verify) can be offloaded to security device.
Signed-off-by: Archana Muniganti <marchana@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Proposed change to event crypto metadata is not done as per deprecation
note. Instead, comments are updated in spec to improve readability.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Queue setup may genuinely fail when adding incremental queues
for a given priority level. In that case application would
attempt to configure a queue at a different priority level.
Not an actual error.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Adding option to drop CRC24B to align with existing
feature for 5G
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Adding a missing operation when CRC16
is being used for TB CRC check.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Add option to indicate whether UDP encapsulation ports
verification need to be done as part of inbound
IPsec processing.
Signed-off-by: Tejasree Kondoj <ktejasree@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The header was not intended to be a public one.
DPDK users should use `rte_mem_virt2iova()` to translate addresses.
Other virt2phys users should use the header from the driver instead.
Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
On Windows, -l/--lcores EAL option was unable to process CPU sets
containing CPUs other than 0 and 1, because CPU_COUNT() macro
only checked these CPUs in the set. Fix CPU_COUNT() by enumerating
all possible CPU indices.
Fixes: e8428a9d89 ("eal/windows: add some basic functions and macros")
Cc: stable@dpdk.org
Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
A more fine-grain flow API action RTE_FLOW_ACTION_TYPE_SAMPLE should
be used instead of it.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, most ethdev callback API use queue ID as parameter, but Rx
and Tx queue release callback use queue object which is used by Rx and
Tx burst data plane callback.
To align with other eth device queue configuration callbacks:
- queue release callbacks are changed to use queue ID
- all drivers are adapted
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Some drivers don't need Rx and Tx queue release callback, make them
optional. Clean up empty queue release callbacks for some drivers.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Adjust parameters order to eth_xstats_get_by_id_t prototype.
Make ids the second parameter similar to eth_xstats_get_by_id_t.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Update xstats by IDs callbacks documentation in accordance with
ethdev usage of these callbacks. Document valid combinations of
input arguments to make driver implementation simpler.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Relax requirements on get xstats names by IDs. After the patch
corresponding the driver operation is called with non-NULL ids
and xstats_names parameters only.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Document valid combinations of input arguments in accordance with
current implementation in ethdev.
Fixes: 79c913a42f ("ethdev: retrieve xstats by ID")
Cc: stable@dpdk.org
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Definition of `rte_ether_addr` structure used a workaround allowing DPDK
and Windows SDK headers to be used in the same file, because Windows SDK
defines `s_addr` as a macro. Rename `s_addr` to `src_addr` and `d_addr`
to `dst_addr` to avoid the conflict and remove the workaround.
Deprecation notice:
https://mails.dpdk.org/archives/dev/2021-July/215270.html
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Build the security library on Windows.
Remove unneeded export of inline functions from version file.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: William Tu <u9012063@gmail.com>
Build the cryptography device library on Windows OS
by removing unneeded include and exports of inline functions
blocking the compilation.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: William Tu <u9012063@gmail.com>
Remove the netinet includes and replaces them
with rte_ip.h to support the in_addr/in6_addr structs
on all operating systems.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: William Tu <u9012063@gmail.com>
Fixed with ./devtools/update-abi.sh $(cat ABI_VERSION)
Fixes: e73a7ab224 ("net/softnic: promote manage API")
Fixes: 8f532a34c4 ("fib: promote API to stable")
Fixes: 4aeb92396b ("rib: promote API to stable")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
In some function definitions, adjust return type and function name on
a separate line to be consistent with DPDK coding style.
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Caught by code review, this copy is unnecessary.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Old IOVA cache entries are left when there is a change on virtio driver
in VM. In case that all these old entries have iova addresses lesser
than new iova entries, vhost code will need to iterate all the cache to
find the new ones. In case of just a new iova entry needed for the new
translations, this condition will last forever.
This has been observed in virtio-net to testpmd's vfio-pci driver
transition, reducing the performance from more than 10Mpps to less than
0.07Mpps if the hugepage address was higher than the networking
buffers. Since all new buffers are contained in this new gigantic page,
vhost needs to scan IOTLB_CACHE_SIZE - 1 for each translation at worst.
Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reported-by: Pei Zhang <pezhang@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This function should be made stable now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This one has been in for required time period.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
These functions to register dynamic fields were added in 19.11
and should be promoted to stable.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
These two functions were added in 19.11 as experimental.
Time to promote the to stable status.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Those accessors have been introduced more than two years ago
(rte_mbuf_to_priv in v18.08, rte_mbuf_*_addr* in v19.02).
Time to mark them stable.
rte_mbuf_to_baddr() could be removed, but since we lack a deprecation
notice, keep it as a simple wrapper.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
These methods were introduced in 20.05.
There has been no changes in their public API since then.
They seem mature enough to remove the experimental tag.
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Only a single DPDK process on the system can be using the /dev/contigmem
mappings at a time, but this was never explicitly enforced, e.g. when
using --in-memory flag on two processes. To prevent possible conflict
issues, we lock the dev node when it's in use, preventing other DPDK
processes from starting up and causing problems for us.
Fixes: 764bf26873 ("add FreeBSD support")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
The fib and fib6 API's have been in since 19.11 and
should be marked as stable.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Conor Walsh <conor.walsh@intel.com>
The rib and rib6 API's have been in since 19.11 and
should be marked as stable.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
This function has been in since 19.11.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
rte_net_make_rarp_packet was introduced in version v18.02, there was no
change in this public API since then, and it's still being used by vhost
lib and virtio driver, so promote it as stable ABI.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This one might be quite mature to be attested as stable.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
The telemetry APIs have been present and unchanged for >1 year now,
so remove experimental tag from them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
rte_efd_create() function was using uint8_t for a socket bitmask,
for one of its parameters.
This limits the maximum of NUMA sockets to be 8.
Changing to uint64_t increases it to 64, which should be
more future-proof.
Coverity issue: 366390
Fixes: 56b6ef874f ("efd: new Elastic Flow Distributor library")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
rte_stats_bitrate_free() has been in DPDK since 20.11.
Its signature is very basic as it just frees an opaque
data struct allocated in rte_stats_bitrate_create()
and returns void.
It's unlikely that such a basic signature would need to change
so might as well promote it to stable for the next major ABI.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
rte_stats_bitrate_calc() API states it returns 'Negative value on error'.
However, the implementation will return the error code from
rte_eth_stats_get() which may be non-zero on error.
Change the implementation of rte_stats_bitrate_calc() to match
the API description by always returning a negative value on error.
Fixes: 2ad7ba9a65 ("bitrate: add bitrate statistics library")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
rte_stats_bitrate_reg() API states it returns 'Zero on success'.
However, the implementation directly returns the return of
rte_metrics_reg_names() which may be zero or positive on success,
with a positive value also indicating the index.
The user of rte_stats_bitrate_reg() should not care about the
index as it is stored in the opaque rte_stats_bitrates struct.
Change the implementation of rte_stats_bitrate_reg() to match
the API description by always returning zero on success.
Fixes: 2ad7ba9a65 ("bitrate: add bitrate statistics library")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
There are a number telemetry threads which are created and
there is nothing that does pthread_join() to wait for them.
Mark these threads as detached, so that the pthread library
can cleanup state when the thread exits.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ciara Power <ciara.power@intel.com>
Change "enqueue" to "dequeue" because the __rte_ring_move_cons_head()
function is updating the consumer head for dequeue.
Fixes: 0dfc98c507 ("ring: separate out head index manipulation")
Cc: stable@dpdk.org
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Currently there are some public headers that include 'sys/queue.h', which
is not POSIX, but usually provided by the Linux/BSD system library.
(Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
The file is missing on Windows. During the Windows build, DPDK uses a
bundled copy, so building a DPDK library works fine. But when OVS or other
applications use DPDK as a library, because some DPDK public headers
include 'sys/queue.h', on Windows, it triggers an error due to no such
file.
One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
Windows environment, such as [1]. However, this means DPDK exports the
functionalities of 'sys/queue.h' into the environment, which might cause
symbols, macros, headers clashing with other applications.
The patch fixes it by removing the "#include <sys/queue.h>" from
DPDK public headers, so programs including DPDK headers don't depend
on the system to provide 'sys/queue.h'. When these public headers use
macros such as TAILQ_xxx, we replace it by the ones with RTE_ prefix.
For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
in Windows EAL. Note that these RTE_ macros are compatible with
<sys/queue.h>, both at the level of API (to use with <sys/queue.h>
macros in C files) and ABI (to avoid breaking it).
Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
[1] http://mails.dpdk.org/archives/dev/2021-August/216304.html
Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
Suggested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Public headers including POSIX-specific <sched.h> were unusable
on Windows. These includes were superfluous, remove them.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
* Version and randomness API were not added to .def file by mistake,
which is why they were later excluded from the export list.
* Device API stubs were added to EAL but not exported.
Fixes: edd66d57d5 ("eal/windows: add random function")
Fixes: 3d2fcb0e0a ("eal/windows: add device event stubs")
Fixes: 5b637a8481 ("eal: fix querying DPDK version at runtime")
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
The majority of common EAL sources that are built for all platforms were
listed separately for Windows and for other OS. It seems that developers
adding modules to EAL perceived this as if Windows supported
only a limited subset of modules and only added new ones into another.
Factor the truly common modules into a shared list,
then extend it with modules supported by different platforms.
When the two lists were created, UUID API implementation was removed
from Windows build (apparently by mistake), then excluded from the
export list for no reason other than not being built. Restore it.
Fixes: df3ff6be2b ("eal: simplify meson build of common directory")
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
When OVS inits, it calls rte_version to get the DPDK's version.
The patch fixes the error below by exposing rte_version symbol.
libopenvswitch.a(dpdk.c.obj) : error LNK2019: unresolved external symbol
rte_version referenced in function dpdk_init
Fixes: 5b637a8481 ("eal: fix querying DPDK version at runtime")
Cc: stable@dpdk.org
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
IAVF PMD needs to generate a random MAC address if it is not configured
by host.
'random' is now supported on Windows.
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Shivanshu Shukla <shivanshu.shukla@intel.com>
A '*' is missing at 2 places, add them.
Fixes: e1a00536c8 ("kvargs: add a new library to parse key/value arguments")
Fixes: 3ab385063c ("kvargs: add get by key")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
A quite common scenario with kvargs is to lookup for a <key>=<value> in
a kvlist. For instance, check if name=foo is present in
name=toto,name=foo,name=bar. This is currently done in drivers/bus with
rte_kvargs_process() + the rte_kvargs_strcmp() handler.
This approach is not straightforward, and can be replaced by this new
function.
rte_kvargs_strcmp() is then removed.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
The function rte_kvargs_get() is used by eal and pci bus driver since
its introduction in commit 3ab385063c ("kvargs: add get by key") and
commit d2a66ad794 ("bus: add device arguments name parsing"), in
dpdk 21.05.
Let's promote it as stable.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
This function is used by EAL to parse key/value strings separated with
specified delimiters.
It was introduced in 2018 by commit 5d6af85ab0 ("kvargs: introduce a
more flexible parsing function"), and can be promoted as stable.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
This updates the gtp_psc flow item to use the net header
definition of the gtp_psc to be based on RFC 38415-g30
Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
A lot of flags are parts of a group but are documented alone.
The Doxygen syntax @{ and @} for grouping is used
to make flags appear together and have a common description.
Some Rx/Tx offload flags and RSS definitions are not grouped
because they need to be all properly documented first.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
This patch defines new RSS offload types for IPv4 and
L4(TCP/UDP/SCTP) checksum, which are required when users want
to distribute packets based on the IPv4 or L4 checksum field.
For example "flow create 0 ingress pattern eth / ipv4 / end
actions rss types ipv4-chksum end queues end / end", this flow
causes all matching packets to be distributed to queues on
basis of IPv4 checksum.
Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Aman Deep Singh <aman.deep.singh@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
As per ABI policy, move the formerly experimental API's to the stable
section.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
As per ABI policy, move the formerly experimental API's to the stable
section.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
As per ABI policy, move the formerly experimental API's to the stable
section.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
As per ABI policy, move the formerly experimental API's to the stable
section.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
As per ABI policy, move the formerly experimental API's to the stable
section.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
As per ABI policy, move the formerly experimental API's to the stable
section.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
As per ABI policy, move the formerly experimental API's to the stable
section.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Add option to indicate whether outer header verification
need to be done as part of inbound IPsec processing.
With inline IPsec processing, SA lookup would be happening
in the Rx path of rte_ethdev. When rte_flow is configured to
support more than one SA, SPI would be used to lookup SA.
In such cases, additional verification would be required to
ensure duplicate SPIs are not getting processed in the inline path.
For lookaside cases, the same option can be used by application
to offload tunnel verification to the PMD.
These verifications would help in averting possible DoS attacks.
Signed-off-by: Tejasree Kondoj <ktejasree@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Add SA lifetime configuration to register soft and hard expiry limits.
Expiry can be in units of number of packets or bytes. Crypto op
status is also updated to include new field, aux_flags, which can be
used to indicate cases such as soft expiry in case of lookaside
protocol operations.
In case of soft expiry, the packets are successfully IPsec processed but
the soft expiry would indicate that SA needs to be reconfigured. For
inline protocol capable ethdev, this would result in an eth event while
for lookaside protocol capable cryptodev, this can be communicated via
`rte_crypto_op.aux_flags` field.
In case of hard expiry, the packets will not be IPsec processed and
would result in error.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Enabled user to provide IV to be used per security
operation. This would be used with lookaside protocol
offload for comparing against known vectors.
By default, PMD would internally generate random IV.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently rte_security_set_pkt_metadata() and rte_security_get_userdata()
methods to set pkt metadata on Inline outbound and get userdata
after Inline inbound processing is always driver specific callbacks.
For drivers that do not have much to do in the callbacks but just
to update metadata in rte_security dynamic field and get userdata
from rte_security dynamic field, having to just to PMD specific
callback is costly per packet operation. This patch provides
a mechanism to do the same in inline function and avoid function
pointer jump if a driver supports the same.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Not all net PMD's/HW can parse packet and identify L2 header and
L3 header locations on Tx. This is inline with other Tx offloads
requirements such as L3 checksum, L4 checksum offload, etc,
where mbuf.l2_len, mbuf.l3_len etc, needs to be set for HW to be
able to generate checksum. Since Inline IPsec is also such a Tx
offload, some PMD's at least need mbuf.l2_len to be valid to
find L3 header and perform Outbound IPSec processing.
Hence, this patch updates documentation to enforce setting
mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
for Inline IPsec Crypto / Protocol offload processing to
work on Tx.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Add a macro to swap two variables
and updat common autotest for the same.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
The previous commit 18effad9cf ("stack: reload head when pop fails")
only changed C11 implementation, not generic implementation.
List head must be loaded right before continue (when failed to find the
new head). Without this, one thread might keep trying and failing to pop
items without ever loading the new correct head.
Fixes: 3340202f59 ("stack: add lock-free implementation")
Cc: stable@dpdk.org
Signed-off-by: Julien Meunier <julien.meunier@nokia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This patch adds new function that compute the greatest common
divisor of 64 bits, also changes the original 32 bits function
to call this new 64-bit version.
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
The arguments of actions that are learned are now specified as part of
the learn instruction as opposed to being statically specified as part
of the learner table configuration.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Commit the pipeline changes when the compilation process is
successful: change the table lookup instructions to execute the action
function for each action, replace the regular pipeline instructions
with the custom instructions.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Build the generated C file into a shared object library.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Generate a C function for each custom instruction, which essentially
consolidate multiple regular instructions into a single function call.
The pipeline program is split into groups of instructions, and a
custom instruction is generated for each group that has more than one
instruction. Special care is taken the instructions that can do thread
yield (RX, extern) and for those that can change the instruction
pointer (TX, near/far jump).
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Generate a C function for each action. For most instructions, the
associated inline function is called directly. Special care is taken
for TX, jump and return instructions.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Export the array of translated instructions to a C file. There is one
such array per action and one for the pipeline.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Lay the foundation to generate C code for the pipeline: C functions
for actions and custom instructions are generated, built as shared
object library and loaded into the pipeline.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
For better performance, the option to create custom instructions when
the program is translated and add them on-the-fly to the pipeline is
now provided. Multiple regular instructions can now be consolidated
into a single C function optimized by the C compiler directly.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
For better performance, the option to run a single function per action
is now provided, which requires a single function call per action that
can be better optimized by the C compiler, as opposed to one function
call per instruction. Special table lookup instructions are added to
to support this feature.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Save the instruction meta-data for later use instead of freeing it up
once the instruction translation is completed.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Start to consolidate the data structures and inline functions required
by the pipeline instructions into an internal header file.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
A learner table is typically used for learning or connection tracking,
where it allows for the implementation of the "add on miss" scenario:
whenever the lookup key is not found in the table (lookup miss), the
data plane can decide to add this key to the table with a given action
with no control plane intervention. Likewise, the table keys expire
based on a configurable timeout and are automatically deleted from the
table with no control plane intervention.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Added look-ahead instruction to read a header from the input packet
without advancing the extraction pointer. This is typically used in
correlation with the special extract instruction to extract variable
size headers from the input packet: the first few header fields are
read without advancing the extraction pointer, just enough to detect
the actual length of the header (e.g. IPv4 IHL field); then the full
header is extracted.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Added a mechanism to extract variable size headers through a special
flavor of the extract instruction. The length of the last struct field
which has variable size is passed as argument to the instruction.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Added support for variable size headers. The last field of a struct
type can now have a variable size between 0 and N bytes. Useful to
accommodate IPv4 packets with options, etc.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The emit instruction that is responsible for pushing headers into the
output packet is now reading the header length from internal run-time
structures as opposed to constant value from the instruction opcode.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Remove experimental flag from rte_metrics_deinit().
This API was introduced in 19.11 release.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
When building DPDK on Windows in debug mode the following
warning appear:
warning: token pasting of ',' and __VA_ARGS__ is a GNU extension
[-Wgnu-zero-variadic-macro-arguments] #define open(path, flags, ...)
_open(path, flags, ##__VA_ARGS__)
Modify the 'open' macro to avoid it.
Fixes: 45d62067c2 ("eal: make OS shims internal")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Add support for dicts of dicts to telemetry library.
Increase the max string size to 128.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Some logs about cores and nodes were using hypotetic plural (s) form.
A fixed plural form with value at the end is preferred.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The private headers are compiled internally with a C compiler.
Thus extern "C" declaration is useless in such files.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
As reported by symbol bot, APIs listed in this patch have been
experimental for more than two years. This patch promotes these
18 APIs to stable.
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add log print of socket path in vhost_user_add_connection.
It's useful when adding a mass of socket connections,
because the information of every connection is clearer.
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The rte_vhost_driver_unregister() and vhost_user_read_cb()
can be called at the same time by 2 threads.
when memory of vsocket is freed in rte_vhost_driver_unregister(),
the invalid memory of vsocket is accessed in vhost_user_read_cb().
It's a bug of both mode for vhost as server or client.
E.g., vhostuser port is created as server.
Thread1 calls rte_vhost_driver_unregister().
Before the listen fd is deleted from poll waiting fds,
"vhost-events" thread then calls vhost_user_server_new_connection(),
then a new conn fd is added in fdset when trying to reconnect.
"vhost-events" thread then calls vhost_user_read_cb() and
accesses invalid memory of socket while thread1 frees the memory of
vsocket.
E.g., vhostuser port is created as client.
Thread1 calls rte_vhost_driver_unregister().
Before vsocket of reconn is deleted from reconn list,
"vhost_reconn" thread then calls vhost_user_add_connection()
then a new conn fd is added in fdset when trying to reconnect.
"vhost-events" thread then calls vhost_user_read_cb() and
accesses invalid memory of socket while thread1 frees the memory of
vsocket.
The fix is to move the "fdset_try_del" in front of free memory of conn,
then avoid the race condition.
The core trace is:
Program terminated with signal 11, Segmentation fault.
Fixes: 52d874dc67 ("vhost: fix crash on closing in client mode")
Cc: stable@dpdk.org
Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Copy threshold has been introduced in async vhost data
path to select the appropriate copy engine to do copies
for higher efficiency.
However, it may cause packets ordering issues and also
introduces performance unpredictability.
Therefore, this patch removes copy threshold support in
async vhost data path.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Preparation of the headers for the hardware offload
misses the outer IPv4 checksum offload.
It results in bad checksum computed by hardware NIC.
This patch fixes the issue by setting the outer IPv4
checksum field to 0.
Fixes: 4fb7e803eb ("ethdev: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Mohsin Kazmi <mohsin.kazmi14@gmail.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The DPDK Symbol Bot reports:
Please note the symbols listed below have expired. In line with the
DPDK ABI policy, they should be scheduled for removal, in the next
DPDK release.
Symbol
rte_eth_rx_burst_mode_get
rte_eth_tx_burst_mode_get
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Fix a typo that mb_pool was misspelt as mp_pool.
Fixes: 4ff702b5df ("ethdev: introduce Rx buffer split")
Cc: stable@dpdk.org
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Remove experimental tag from rte_eth_dev_set_ptypes().
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Remove the experimental tag for rte_eth_dev_rx_intr_ctl_q_get_fd API
that was introduced in 18.11 and have been around for 11 releases.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: David Marchand <david.marchand@redhat.com>
This patch fixes a memleak which was reported in Bugzilla within the
eal_save_args function. This was caused by the function mistakenly
adding -- to the eal args instead of breaking beforehand.
Bugzilla ID: 722
Fixes: 293c53d8b2 ("eal: add telemetry callbacks")
Reported-by: Zhihong Peng <zhihongx.peng@intel.com>
Signed-off-by: Conor Walsh <conor.walsh@intel.com>
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
New API for these were added in 20.11 and the old API was retained
but marked deprecated. Since 21.11 is the next LTS, it is time
to remove the deprecated ones.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Windows EAL depends on some system libraries. They were linked using
add_project_link_arguments('-l<LIB>'), which prevented meson from adding
them to Libs.private of pkg-config file. As a result, applications using
pkg-config to find DPDK hit link errors, for example:
librte_eal.a(eal_windows_eal_debug.c.obj) : error LNK2019: unresolved
external symbol __imp_SymInitialize referenced in function
rte_dump_stack
Reference required libraries in EAL using ext_deps meson variable.
bus/pci and net/pcap depend on lib/eal and will pull them automatically.
Drop advapi32 dependency, as MinGW locates VirtualAlloc2() dynamically.
Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Fixes: c91717eb75 ("eal/windows: support exit and panic")
Cc: stable@dpdk.org
Reported-by: William Tu <u9012063@gmail.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: William Tu <u9012063@gmail.com>
Added macros to simplify print of MAC address.
The six bytes of a MAC address are extracted in
a macro here, to improve code readablity.
Signed-off-by: Aman Deep Singh <aman.deep.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Added macro to print six bytes of MAC address.
The MAC addresses will be printed in upper case
hexadecimal format.
In case there is a specific check for lower case
MAC address, the user may need to make a change in
such test case after this patch.
Signed-off-by: Aman Deep Singh <aman.deep.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch add support to handle PDCP short MAC-I domain
along with standard control and data domains as it has to
be treaty as special case with PDCP protocol offload support.
ShortMAC-I is the 16 least significant bits of calculated MAC-I. Usually
when a RRC message is exchanged between UE and eNodeB it is integrity &
ciphered protected.
MAC-I = f(key, varShortMAC-I, count, bearer, direction).
Here varShortMAC-I is prepared by using (current cellId, pci of source cell
and C-RNTI of old cell). Other parameters like count, bearer and
direction set to all 1.
crypto-perf app is updated to take short MAC as input mode.
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The rte_cryptodev_pmd.* files are for drivers only and should be
private to DPDK, and not installed for app use.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
The API rte_cryptodev_pmd_is_valid_dev, can be used
by the application as well as PMD to check whether
the device is valid or not. Hence, _pmd is removed
from the API.
The applications and drivers which use this API are
also updated.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Suppress gcc warning "warning: writing 16 bytes into a region of
size 0" for users of the POWER rte_memcpy() function. Existing
rte_memcpy() code takes different code paths based on the actual
size of the move so the warning is already addressed. See also
commit b5b3ea803e ("eal/x86: ignore gcc 10 stringop-overflow warnings")
Cc: stable@dpdk.org
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
When parsing a devargs, try to parse using the global device syntax
first. Fallback on legacy syntax on error.
Example of new global device syntax:
-a bus=pci,addr=82:00.0/class=eth/driver=mlx5,dv_flow_en=1
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
For device probe and iterator, devargs name was key information,
parsed by rte_devargs_parse. In legacy parser, devargs name was
extracted after bus name:
bus:name,kv_arguments,,,
Example:
pci:83:00.0,arguments,...
vdev:pcap0,...
To be compatible with legacy parser, this patch introduces new
bus driver API devargs_parse to parse devargs and update devargs name.
If devargs_parse not implemented by bus driver, the new syntax parser
rte_devargs_layers_parse default will resolve devargs name from bus's
"name" argument.
Different bus driver might choose different keys from arguments with
unified format. The PCI bus implementation fills the devargs name with
the "addr" argument, example:
-a bus=pci,addr=83:00.0/class=eth/driver=mlx5,...
name: 0000:03:00.0
-a bus=vdev,name=pcap0/class=eth/driver=pcap,...
name:pcap0
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
Start a new release cycle with empty release notes.
The ABI version becomes 22.0.
The map files are updated to the new ABI major number (22).
The ABI exceptions are dropped and CI ABI checks are disabled because
compatibility is not preserved.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
This patch fixes the memcpy function call which was incorrect and led
to memory corruption for tables with more that just a few actions.
Fixes: 742b0a57f5 ("pipeline: add table statistics to SWX")
Cc: stable@dpdk.org
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The PMD destroy function was calling the release function, which frees
cryptodev->data, and then tries to free cryptodev->data->dev_private,
which causes the heap use after free issue.
A temporary pointer is set before the free of cryptodev->data,
which can then be used afterwards to free dev_private.
The free cannot be moved to before the release function is called,
as dev_private is used in the PMD close function while being released.
Fixes: 9e6edea418 ("cryptodev: add APIs to assist PMD initialisation")
Cc: stable@dpdk.org
Reported-by: Zhihong Peng <zhihongx.peng@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
eal_mem_virt2phys_init() opens a handle for use by rte_mem_virt2phy().
Close this handle on EAL cleanup.
Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
The event port config set by application in
rte_event_eth_tx_adapter_create API is modified in
default configuration callback function. This patch removes
this hardcode to use application provided event port
config value.
Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
When the vhost-user frontend like Virtio-user tries to
reconnect to the restarted Vhost backend, the Vhost backend
segfaults when multiqueue is enabled.
This is caused by VHOST_USER_GET_VRING_BASE being called for
a virtqueue that has not been created before, causing a NULL
pointer dereferencing.
This patch adds the VHOST_USER_GET_VRING_BASE requests to
the list of requests that trigger queue pair allocations.
Fixes: 160cbc815b ("vhost: remove a hack on queue allocation")
Cc: stable@dpdk.org
Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Currently, rte_sched_free_memory() is called multiple times by the
exception handling code in rte_sched_subport_config() and
rte_sched_pipe_config().
This patch optimizes them into a unified outlet to free memory.
Fixes: ac6fcb841b ("sched: update subport rate dynamically")
Fixes: 34a90f8665 ("sched: modify pipe functions for config flexibility")
Fixes: ce7c4fd7c2 ("sched: add pipe config to subport level")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This patch fixes return value judgment when allocate memory to store the
subport profile, and releases memory of 'rte_sched_port' if code fails to
apply for this memory.
Fixes: 0ea4c6afca ("sched: add subport profile table")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The freqs array size is RTE_MAX_LCORE_FREQS. Before filling the
array with num_freqs elements, restrict the total num to
RTE_MAX_LCORE_FREQS. This fix aims to fix the coverity scan issue
like:
Overrunning array "pi->freqs" of 256 bytes by passing it to a
function which accesses it at byte offset 464.
Coverity issue: 371913
Fixes: ef1cc88f18 ("power: support cppc_cpufreq driver")
Cc: stable@dpdk.org
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
The first argument to rte_bsf32_safe was incorrectly declared as
a 64 bit value. The code only works on 32 bit values and the underlying
function rte_bsf32 only accepts 32 bit values. This was a mistake
introduced when the safe version was added and probably cause
by copy/paste from the 64 bit version.
The bug passed silently under the radar until some other code was
built with -Wall and -Wextra in C++ and C++ complains about the
missing cast.
Yes, this is a API signature change, but the original code was wrong.
It is an inline so not an ABI change.
Fixes: 4e261f5519 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
When the guest memory is hotplugged, the vhost application which
enables DMA acceleration must stop DMA transfers before the vhost
re-maps the guest memory.
This patch is to notify the vhost application of stopping DMA
transfers.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Applications need to stop DMA transfers and finish all the inflight
packets when in VM memory hot-plug case and async vhost is used. This
patch is to provide an unsafe API to clear inflight packets which
are submitted to DMA engine in vhost async data path. Update the
program guide and release notes for virtqueue inflight packets clear
API in vhost lib.
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The async vhost callback ops should return negative value when there
are something wrong in the callback, so the return type should be
changed into int32_t. The issue in vhost example is also fixed.
Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")
Fixes: 819a716858 ("vhost: fix async callback return type")
Fixes: 6b3c81db8b ("vhost: simplify async copy completion")
Fixes: abec60e711 ("examples/vhost: support vhost async data path")
Fixes: 6e9a9d2a02 ("examples/vhost: fix ioat dependency")
Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
EAL functions rte_eal_alarm_set() and rte_eal_alarm_cancel()
did not for invalid parameters in Windows implementation,
which is caught by the unit test alarm_autotest.
Enforce parameter check to fail fast for invalid parameters.
Fixes: f4cbdbc7fb ("eal/windows: implement alarm API")
Cc: stable@dpdk.org
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Currently in scale mode, multi-queue initialization will attempt to
initialize and de-initialize the per-lcore power library structures
multiple times. Fix it to only do this whenever we either enabling
first queue or disabling last queue.
Fixes: 5dff9a72b0 ("power: support callbacks for multiple Rx queues")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
This patch adds thread unsafe version for async register and
unregister functions.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch reworks the async configuration structure to improve code
readability. In addition, add preserved padding fields on the structure
for future usage.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vhost notifies the application of device readiness via
vhost_user_notify_queue_state(), but calling this function
is not protected by the lock. This patch is to make this
function call lock protected.
Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Unlike split ring, packed ring does not mandate the ring size
to be a power of 2. So we have to use a modulo operation when
wrapping ring index.
Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch allows to check the amount of in-flight packets
for the vhost queue using async acceleration.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We assume that in the sync path, if there is no buffer wrap in the
avail descriptors fetched in a batch, there is no buffer wrap in the
used descriptors which need to be written back in this batch, but
this assumption is wrong in the async path since there are inflight
descriptors which are processed by the DMA device.
This patch refactors the batch copy code and adds used ring buffer
wrap check as a batch copy condition to fix this issue.
Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We introduced some new indexes in packed ring of async vhost. They
will eventually overflow and lead to errors if the ring size is not
a power of 2. This patch is to check and keep these indexes within a
reasonable range.
Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When parsing the virtio net header and packet header for dequeue offload,
we need to perform sanity check on the packet header to ensure:
- No out-of-boundary memory access.
- The packet header and virtio_net header are valid and aligned.
Fixes: d0cf91303d ("vhost: add Tx offload capabilities")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Due to a typo, the selector_free() function incorrectly takes an early
return when the selectors array is non-NULL, as opposed to the other
way around.
Coverity issue: 371912
Fixes: cdaa937d3e ("pipeline: support selector table")
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Currently, the error paths can lead to attempts at dereferencing NULL
pointers. Add the check to avoid attempts at dereferencing NULL
pointers.
Coverity issue: 371895
Coverity issue: 371889
Fixes: 06cffd468f ("power: refactor ACPI and intel_pstate support")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
When the distributor sample app is built as a 32-bit app,
the data buffer passed to find_match_vec can be unaligned,
causing a segmentation fault due to writing a 128-bit value
using _mm_store_si128(). 128-bit align the data being
passed in so this does not happen.
Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org
Signed-off-by: David Hunt <david.hunt@intel.com>
In its current state, the API can overflow the user-passed buffer if a new
representor range appears between function calls.
In order to solve this problem, augment the representor info structure with
the numbers of allocated and initialized ranges. This way the users of this
structure can be sure they will not overrun the buffer.
Fixes: 85e1588ca7 ("ethdev: add API to get representor info")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
This is a normal case that the primary process already
owned one device while the secondary process try to
attach it, so suppress the error log here to exclude
this case.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Add support for the Longest Prefix Match (LPM) lookup to the SWX
pipeline.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
A selector table is made up of groups of weighted members, with a
given member potentially part of several groups. The select operation
returns a member ID by first selecting a group based on an input group
ID and then selecting a member within that group based on hashing one
or several input header/meta-data fields. It is very useful for
implementing an ECMP/WCMP-enabled FIB or a load balancer. It is part
of the action selector described by the P4 Portable Switch
Architecture (PSA) specification.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The rte_swx_pipeline_table_entry_read() function is used to read from
a character string a table entry that is to be added to the table,
deleted from the table or set as the default entry of the table.
Addition needs both the match and the part of the entry, deletion
ignores the action part, while the default set ignores the match part,
hence the need to make both the match and the action part optional.
The logic for skipping the match or the action part was broken, hence
the current fix.
Fixes: b32c0a2c5e ("pipeline: add SWX table update high level API")
Cc: stable@dpdk.org
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Due to a typo, only 3 out of 4 keys in the bucket of the exact match
table were considered, which can result in valid keys being
incorrectly dropped from the table.
Fixes: d0a0096661 ("table: add exact match SWX table")
Cc: stable@dpdk.org
Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
If the target machine has SVE feature (e.g. "-march=armv8.2-a+sve'),
and the compiler is gcc-8.3, it will produce this error:
In file included from lib/eal/common/eal_common_options.c:38:
lib/eal/arm/include/rte_vect.h:13:10: fatal error:
arm_sve.h: No such file or directory
#include <arm_sve.h>
^~~~~~~~~~~
The root cause is that gcc-8.3 supports SVE (the macro
__ARM_FEATURE_SVE was 1), but it doesn't support SVE ACLE [1].
The solution:
a) Detect compiler whether support SVE ACLE, if support then define
RTE_HAS_SVE_ACLE macro.
b) Use the RTE_HAS_SVE_ACLE macro to include SVE header file.
[1] ACLE: Arm C Language Extensions, the SVE ACLE header file is
<arm_sve.h>, user should include it when writing ACLE SVE code.
Fixes: 67b68824a8 ("lpm/arm: support SVE")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Instead of polling for tail to be updated, use WFE instruction.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
In acquiring a spinlock, cores repeatedly poll the lock variable.
This is replaced by rte_wait_until_equal API.
Running micro benchmarking and testpmd and l3fwd traffic tests
on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well
and no notable performance gain nor degradation was measured.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.
The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:
- Replace per-queue structures with per-lcore ones, so that any device
polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
added to the list of queues to poll, so that the callback is aware of
other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
shared between all queues polled on a particular lcore, and is only
activated when all queues in the list were polled and were determined
to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
is incapable of monitoring more than one address.
Also, while we're at it, update and improve the docs.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.
We could have used something like an RCU for this use case, but absent
of a pressing need for thread safety we'll go the easy way and just
mandate that the API's are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.
This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.
Existing implementations are adjusted to follow the new semantics.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
There are two execution states on armv8 architecture, aarch64 and
aarch32. Add PLATFORM_STR for the latter and update RTE_ARCH_* flags
according to e9b9739264.
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
'rte_kni_update_link()' updates virtual KNI interface link using kernel
sysfs interface.
If the requested link status is same as interface link status, do not
update the link status but return with success.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are
supported, which are both not available on arm64 platforms. Add
support for cppc_cpufreq driver which works on most arm64 platforms.
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.
This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Currently, ACPI code uses rte_power_info as the struct name, which
gives the appearance that this is an externally visible API. Fix to
use internal namespace.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Currently, if dev_configure is not called or fails to be called, users
can still call dev_start successfully. So it is necessary to have a flag
which indicates whether the device is configured, to control whether
dev_start can be called and eliminate dependency on user invocation order.
The flag stored in "struct rte_eth_dev_data" is more reasonable than
"enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
primary and secondary processes, and can be independently controlled.
However, the secondary process does not make resource allocations and
does not call dev_configure(). These are done by the primary process
and can be obtained or used by the secondary process. So this patch
adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
libabigail raised a warning on this change.
This change is fine wrt ABI as far as we understand, but we can't
express an exception rule (see libabigail bug #28060) to waive the
changes only in this part of the rte_eth_dev_data struct.
The solution for now is to globally waive any change on the
rte_eth_dev_data structure.
Signed-off-by: David Marchand <david.marchand@redhat.com>
When calling rte_eal_cleanup, the mp channel cleanup routine only sets
mp_fd to -1 leaving the rte_mp_handle control thread running.
This control thread can spew warnings on reading on an invalid fd.
This is especially noticed with ASAN enabled.
To handle this situation, set mp_fd to -1 to signal the control thread
it should exit, but since this thread might be sleeping on the socket,
cancel the thread too.
Fixes: 85d6815fa6 ("eal: close multi-process socket during cleanup")
Cc: stable@dpdk.org
Reported-by: Owen Hilyard <ohilyard@iol.unh.edu>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The struct rte_flow_action was missing from DPDK API documentation.
Fixes: 3850cf0c8c ("ethdev: add tunnel encap/decap actions")
Cc: stable@dpdk.org
Signed-off-by: Jan Viktorin <viktorin@cesnet.cz>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Aman Deep Singh <aman.deep.singh@intel.com>
Add clock_gettime() on Windows in rte_os_shim.h.
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Add device event stubs in eal_dev.c for Windows
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Add required macros by testpmd on Windows in rte_os_shim.h
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Enable building libraries that testpmd depends on for Windows
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Since commit d5df2ae042 ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() and
rte_ipv6_udptcp_cksum() can return either 0x0000 or 0xffff when used to
verify a packet containing a valid checksum.
Since these functions should be used to calculate the checksum to set in
a packet, introduce 2 new helpers for checksum verification. They return
0 if the checksum is valid in the packet.
Use this new helper in net/tap driver.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Introduce an internal firmware loading helper to remove code duplication
in our drivers and handle xz compressed firmware by calling libarchive.
This helper tries to look for .xz suffixes so that drivers are not aware
the firmware has been compressed.
libarchive is set as an optional dependency: without libarchive, a
runtime warning is emitted so that users know there is a compressed
firmware.
Windows implementation is left as an empty stub.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Haiyue Wang <haiyue.wang@intel.com>
If the library fails to create the needed socket, add an additional
check to report if the error is due to a missing DPDK runtime dir.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ciara Power <ciara.power@intel.com>
When multi-process is not wanted and DPDK is run with the "no-shconf"
flag, the telemetry library still needs a runtime directory to place the
unix socket for telemetry connections. Therefore, rather than not
creating the directory when this flag is set, we can change the code to
attempt the creation anyway, but not error out if it fails. If it
succeeds, then telemetry will be available, but if it fails, the rest of
DPDK will run without telemetry. This ensures that the "in-memory" flag
will allow DPDK to run even if the whole filesystem is read-only, for
example.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Inflight metadata are allocated using glibc's calloc.
This patch converts them to rte_zmalloc_socket to take
care of the NUMA affinity.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch saves the NUMA node the virtqueue is allocated
on at init time, in order to allocate all other data on the
same node.
While most of the data are allocated before numa_realloc()
is called and so the data will be reallocated properly, some
data like the log cache are most likely allocated after.
For the virtio device metadata, we decide to allocate them
on the same node as the VQ 0.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Since the Vhost-user device initialization has been reworked,
enabling the application to start using the device as soon as
the first queue pair is ready, NUMA reallocation no more
happened on queue pairs other than the first one since
numa_realloc() was returning early if the device was running.
This patch fixes this issue by reallocating the device metadata
only if the device is running. For the virtqueues, a vring state
change notification is sent to notify the application of its
disablement. Since the callback is supposed to be blocking, it
is safe to reallocate it afterwards.
Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.
However, reallocating the log cache on the new NUMA node was
not done. This patch fixes this by reallocating it if it has
been allocated already, which means a live-migration is
on-going.
Fixes: 1818a63147 ("vhost: move dirty logging cache out of virtqueue")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.
However, reallocating the guest pages table was missing, which
likely causes at least one cross-NUMA accesses for every burst
of packets.
This patch reallocates this table on the same NUMA node as the
other metadata.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.
However, reallocating the Vhost memory table was missing, which
likely causes at least one cross-NUMA accesses for every burst
of packets.
This patch reallocates this table on the same NUMA node as the
other metadata.
Fixes: 552e8fd3d2 ("vhost: simplify memory regions handling")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Add common devargs key definition for "bus", "class" and "driver".
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
The string copy api rte_strscpy() did not set rte_errno during failures,
instead it just returned negative error number.
Set rte_errrno if the destination buffer is too small.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Data types Elf32_auxv_t and Elf64_auxv_t are used by OS Linux
auxiliary vector read, and not used by arch specific cpu flag
API implementations. Hence remove them from Arm file.
Reported-by: James Grant <j.grant@qub.ac.uk>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix for the stack buffer overflow was to make sure depth
was always < 128, since when depth = 128 it caused the index
into the ip address to be 16, which read off the end of the array.
While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.
Fixes: f7e861e21c ("rib: support IPv6")
Cc: stable@dpdk.org
Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Rules in a classify table were not freed if the table
had a delete function.
Fixes: be41ac2a33 ("flow_classify: introduce flow classify library")
Cc: stable@dpdk.org
Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
In kni_allocate_mbufs(), we alloc mbuf for alloc_q as this code.
allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
& (MAX_MBUF_BURST_NUM - 1);
The value of allocq_free maybe zero, for example :
The ring size is 1024. After init, write = read = 0. Then we fill
kni->alloc_q to full. At this time, write = 1023, read = 0.
Then the kernel send 32 packets to userspace. At this time, write
= 1023, read = 32. And then the userspace receive this 32 packets.
Then fill the kni->alloc_q, (32 - 1023 - 1) & 31 = 0, fill nothing.
...
Then the kernel send 32 packets to userspace. At this time, write
= 1023, read = 992. And then the userspace receive this 32 packets.
Then fill the kni->alloc_q, (992 - 1023 - 1) & 31 = 0, fill nothing.
Then the kernel send 32 packets to userspace. The kni->alloc_q only
has 31 mbufs and will drop one packet.
Absolutely, this is a special scene. Normally, it will fill some
mbufs everytime, but may not enough for the kernel to use.
In this patch, we always keep the kni->alloc_q to full for the kernel
to use.
Fixes: 49da4e82cf ("kni: allocate no more mbuf than empty slots in queue")
Cc: stable@dpdk.org
Signed-off-by: Cheng Liu <liucheng11@huawei.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Same idea as commit a287ac2891 ("vhost: allocate and free packets
in bulk in Tx packed"), allocate and free packets in bulk.
Also remove the unused function virtio_dev_pktmbuf_alloc.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use vc_req only after it was checked not to be NULL.
Fixes: 2d962bb736 ("vhost/crypto: fix possible TOCTOU attack")
Cc: stable@dpdk.org
Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Interrupt manager in Windows EAL allocates on IOCP and starts
a control thread that runs indefinitely. At DPDK cleanup
this thread was not stopped and IOCP handle was not closed.
Gracefully stop interrupt-handling in rte_eal_cleanup().
The thread already closes IOCP handle before exiting.
Fixes: 5c016fc020 ("eal/windows: add interrupt thread skeleton")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Jie Zhou <jizh@microsoft.com>
Tested-by: Jie Zhou <jizh@microsoft.com>
Each time a work was scheduled in the interrupt thread,
usually an alarm, a handle was opened but not closed.
Opening a handle is a system call, which harms alarm precision.
Instead of opening and closing a handle each time, open it
when interrupt thread starts and close it when the thread finishes.
Fixes: 5c016fc020 ("eal/windows: add interrupt thread skeleton")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Interrupt thread ID retained its value after interrupt thread finish.
Other interrupt routines could then operate on the wrong thread.
Clear interrupt thread ID before thread termination.
Fixes: 5c016fc020 ("eal/windows: add interrupt thread skeleton")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
This became visible by backporting the following for the 19.11 stable tree:
c13ca4e8 "vfio: fix DMA mapping granularity for IOVA as VA"
The usage of type bool in the vfio code would require "#include
<stdbool.h>", but rte_vfio.h has no direct paths to stdbool.h.
It happens that in eal_vfio_mp_sync.c it comes after "#include
<rte_log.h>".
And rte_log.h since 20.05 includes stdbool since this change:
241e67bfe "log: add API to check if a logtype can log in a given level"
and thereby mitigates the issue.
It should be safe to include stdbool.h from rte_vfio.h itself
to be present exactly when needed for the struct it defines using that
type.
Fixes: c13ca4e81c ("vfio: fix DMA mapping granularity for IOVA as VA")
Cc: stable@dpdk.org
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
--buildtype=debug with gcc 6.3 produces the following error:
../lib/librte_acl/acl_run_avx512_common.h: In function
‘resolve_match_idx_avx512x16’:
../lib/librte_acl/acl_run_avx512x16.h:33:18: error:
the last argument must be an 8-bit immediate
^
../lib/librte_acl/acl_run_avx512_common.h:373:9: note:
in expansion of macro ‘_M_I_’
return _M_I_(slli_epi32)(mi, match_log);
^~~~~
Seems like gcc-6.3 complains about the following construct:
static const uint32_t match_log = 5;
...
_mm512_slli_epi32(mi, match_log);
It can't substitute constant variable 'match_log' with its actual value.
The fix replaces constant variable with its immediate value.
Bugzilla ID: 717
Fixes: b64c2295f7 ("acl: add 256-bit AVX512 classify method")
Fixes: 45da22e42e ("acl: add 512-bit AVX512 classify method")
Cc: stable@dpdk.org
Reported-by: Liang Ma <liangma@liangbit.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
__rte_alloc_size is mapped to compiler alloc_size attribute.
Quoting gcc documentation:
"""
alloc_size
The alloc_size attribute is used to tell the compiler that the
function return value points to memory, where the size is given by
one or two of the functions parameters. GCC uses this information
to improve the correctness of __builtin_object_size.
The function parameter(s) denoting the allocated size are specified
by one or two integer arguments supplied to the attribute.
The allocated size is either the value of the single function
argument specified or the product of the two function arguments
specified. Argument numbering starts at one.
"""
In rte_realloc_socket case, only 'size' matters.
Note: this has been spotted by Maxime trying to use rte_realloc_socket
and compiling with gcc 11.
Fixes: 17b347dab7 ("malloc: add alloc_size attribute to functions")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Bitmap initialization function is allowed to memset()
caller-provided buffer with number of bytes exceeded
this buffer size. This happens due to wrong comparison
sign between buffer size and number of bytes required
to initialize bitmap.
Fixes: 602c9ca33a ("sched: bitmap is now dynamically allocated")
Cc: stable@dpdk.org
Reported-by: Andy Moreton <amoreton@xilinx.com>
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add the API to set 'Bus Master Enable' bit to be enabled or disabled in
the PCI command register.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
This code is not performance sensitive and can be switched to dynamic
allocations.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ciara Power <ciara.power@intel.com>
In function 'stats_mem_init', pointer 'stats' should
be confirmed not null before memset it.
Fixes: af1ae8b6a3 ("graph: implement stats")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The Doxygen comments are placed before the related lines,
but the markers were /**< instead of /**
The struct rte_flow_item_integrity did not appear in Doxygen output
because there was no general comment for the struct.
Fixes: b10a421a1f ("ethdev: add packet integrity check flow rules")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
IOTLB messages will be sent when some queues are not enabled. If we
initialize IOTLB in vhost_user_set_vring_num, it could happen that IOTLB
update comes when IOTLB pool of disabled queues are not initialized.
Fixes: 968bbc7e2e ("vhost: avoid IOTLB mempool allocation while IOMMU disabled")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The optimization introduced by
commit d18db8049c ("vhost: read last used index once")
didn't account for the fact that vhost_flush_enqueue_shadow_packed
increments the last_used_idx.
For this reason, store last_used_idx after the potential call to
vhost_flush_enqueue_shadow_packed.
Bugzilla ID: 699
Fixes: d18db8049c ("vhost: read last used index once")
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Change the variable type in store_dma_desc_info_packed() to fix
suspicious implicit sign extension.
Coverity issue: 370608, 370610, 370612
Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
A terminated pthread should be joined or detached so that its associated
resources are released.
The "ice-reset-<vf_id>" threads are used to service some reset task in
the background, but they are never joined by the thread that created
them.
The easiest solution is to detach new threads.
The Windows EAL did not provide a pthread_detach wrapper but there is no
resource to release for Windows threads, so add an empty wrapper.
Fixes: 3b3757bda3 ("net/ice: get VF hardware index in DCF")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
In function power_guest_channel_read_msg, 'lcore_id' is used before
validity check, which may cause buffer 'global_fds' accessed by index
'lcore_id' overflow.
This patch moves the validity check of 'lcore_id' before the 'lcore_id'
being used for the first time.
Fixes: 9dc843eb27 ("power: extend guest channel API for reading")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Currently, the mp uses gettimeofday() API to get the time, and used as
timeout parameter.
But the time which gets from gettimeofday() API isn't monotonically
increasing. The process may fail if the system time is changed.
This fixes it by using clock_gettime() API with monotonic attribution.
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
For 32-bit targets, size_t is normally a 32-bit type and
does not have sufficient range to represent 64-bit offsets
that are needed when mapping PCI addresses.
Use uint64_t instead.
Found when attempting to run 32-bit Linux dpdk-testpmd
using VFIO driver:
EAL: pci_map_resource(): cannot map resource(63, 0xc0010000, \
0x200000, 0x20000000000): Invalid argument ((nil))
Fixes: c4b89ecb64 ("eal: introduce memory management wrappers")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Spotted by sparse in OVS build:
../../lib/netdev-dpdk.c: note: in included file (through
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_ip.h,
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h, ...):
../../include/sparse/arpa/inet.h:22:2: error: "Must include
<netinet/in.h> before <arpa/inet.h> for FreeBSD support"
This is a check enforced by OVS itself.
See [1] for some context.
1: https://github.com/openvswitch/ovs/commit/b2befd5bb2db
Fixes: 89813a522e ("net: provide IP-related API on any OS")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Spotted by sparse in OVS build:
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:789:27:
error: incorrect type in initializer (different base types)
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:789:27:
expected unsigned short [usertype] ether_type
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:789:27:
got restricted ovs_be16 [usertype]
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:829:25:
error: incorrect type in initializer (different base types)
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:829:25:
expected unsigned short [usertype] vlan_tci
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:829:25:
got restricted ovs_be16 [usertype]
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:830:26:
error: incorrect type in initializer (different base types)
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:830:26:
expected unsigned short [usertype] eth_proto
/home/runner/work/ovs/ovs/dpdk-dir/build/include/rte_flow.h:830:26:
got restricted ovs_be16 [usertype]
This was not caught before as no code in headers was using those fields.
This changed with commit 6f2168b69a ("ethdev: reuse ethernet header
definition in flow item") and commit a56a262e34 ("ethdev: reuse VLAN
header definition in flow item").
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Let's try to enforce the convention where most drivers use a pmd. logtype
with their class reflected in it, and libraries use a lib. logtype.
Introduce two new macros:
- RTE_LOG_REGISTER_DEFAULT can be used when a single logtype is
used in a component. It is associated to the default name provided
by the build system,
- RTE_LOG_REGISTER_SUFFIX can be used when multiple logtypes are used,
and then the passed name is appended to the default name,
RTE_LOG_REGISTER is left untouched for existing external users
and for components that do not comply with the convention.
There is a new Meson variable log_prefix to adapt the default name
for baseband (pmd.bb.), bus (no pmd.) and mempool (no pmd.) classes.
Note: achieved with below commands + reverted change on net/bonding +
edits on crypto/virtio, compress/mlx5, regex/mlx5
$ git grep -l RTE_LOG_REGISTER drivers/ |
while read file; do
pattern=${file##drivers/};
class=${pattern%%/*};
pattern=${pattern#$class/};
drv=${pattern%%/*};
case "$class" in
baseband) pattern=pmd.bb.$drv;;
bus) pattern=bus.$drv;;
mempool) pattern=mempool.$drv;;
*) pattern=pmd.$class.$drv;;
esac
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
done
$ git grep -l RTE_LOG_REGISTER lib/ |
while read file; do
pattern=${file##lib/};
pattern=lib.${pattern%%/*};
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
done
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
rte_thash_adjust_tuple() uses random to generate a new subtuple if
fn() callback reports about collision. In some cases random changes
the subtuple in a way that after complementary bits are applied the
original tuple is obtained. This patch replaces random with subtuple
increment.
Fixes: 28ebff11c2 ("hash: add predictable RSS")
Cc: vladimir.medvedkin@intel.com
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Tested-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: Stanislaw Kardach <kda@semihalf.com>
This is reported by our internal covscan:
1. dpdk-20.11/lib/librte_eal/common/eal_common_options.c:508: alloc_fn:
Storage is returned from allocation function "dlopen".
6. dpdk-20.11/lib/librte_eal/common/eal_common_options.c:508:
leaked_storage: Failing to save or free storage allocated by
"dlopen("librte_eal.so.21.0", 5)" leaks it.
# 506| * shared library is not already loaded i.e. it's
# statically linked.)
# 507| */
# 508|-> if (dlopen("librte_eal.so."ABI_VERSION, RTLD_LAZY |
# RTLD_NOLOAD) != NULL &&
# 509| *default_solib_dir != '\0' &&
# 510| stat(default_solib_dir, &sb) == 0 &&
This leak is not an issue per se, but on the other hand, this is easy
to fix and I prefer not having to waive this warning later.
Fixes: 06c7871dde ("eal: restrict default plugin path to shared lib mode")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
If no enable_drivers option is passed, the default is to build
the drivers list by calling list-dir-globs.py.
But if no Python interpreter is installed, no error is reported
and all drivers end up being disabled.
Example on a minimal FreeBSD VM:
dpdk@freebsd:~/dpdk $ meson setup build
...
drivers:
common/cpt: not in enabled drivers build config
common/dpaax: not in enabled drivers build config
common/iavf: not in enabled drivers build config
common/mvep: not in enabled drivers build config
common/octeontx: not in enabled drivers build config
common/octeontx2: not in enabled drivers build config
bus/dpaa: not in enabled drivers build config
bus/fslmc: not in enabled drivers build config
...
dpdk@freebsd:~/dpdk $ cd drivers/
dpdk@freebsd:~/dpdk/drivers $ ~/dpdk/buildtools/list-dir-globs.py */*
env: python3: No such file or directory
Rely on meson internal interpreter.
Check return code when calling this script.
Fixes: ab9407c3ad ("build: allow using wildcards to disable drivers")
Fixes: 2e33309ebe ("config: enable/disable drivers in Arm builds")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch adds checking for service core index validity when parsing
service corelist.
Fixes: 7dbd7a6413 ("service: add -S corelist option")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This patch adds checking for mp reply result in handle_sync().
Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Most of the checks on developer_mode have been accidentally dropped.
Restore them.
Fixes: 7d611e35b0 ("lib: simplify main build file")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.
This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices
to support every used offloads (ip, tcp checksumming, tso..), and the
various l2/l3/l4 lengths must be set following any processing that
happened in the application itself.
On the other hand, the received packets are not marked wrt current
packet l3/l4 checksumming info.
Copy virtio rx processing to fix those offload flags with some
differences:
- accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
- ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
the virtio spec),
Some applications might rely on the current behavior, so it is left
untouched by default.
A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
new behavior.
The vhost example has been updated for the new behavior: TSO is applied to
any packet marked LRO.
Fixes: 859b480d5a ("vhost: add guest offload setting")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add batch datapath for async vhost packed ring to improve the
performance of small packet processing.
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
For now async vhost data path only supports split ring. This patch
enables packed ring in async vhost data path to make async vhost
compatible with virtio 1.1 spec.
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch moves some code of async vhost split ring into
inline functions to improve the readability. Also, it
changes the pointer index style of iterator to make the
code more concise.
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
The list_commands() function accessed the callbacks list,
but did not take the lock. This may have caused inconsistencies if
callbacks were being registered at the same time.
This is now fixed to lock before iterating the list,
and unlock afterwards.
Fixes: f38748736e ("telemetry: add default callback commands")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Remove TELEMETRY_MAX_CALLBACKS symbol from the public
rte_telemetry.h header file.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
This patch fixes issue with OVS 2.15 not working on
DPAA/FSLMC based platform due to missing support for
these busses in dev_iterate.
This patch adds dpaa_bus and fslmc to dev iterator
for bus arguments.
Fixes: 214ed1acd1 ("ethdev: add iterator to match devargs input")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Add integrity item definition to the rte_flow_desc_item array.
The new entry allows to build RTE flow item from a data
stored in rte_flow_item_integrity type.
Fixes: b10a421a1f ("ethdev: add packet integrity check flow rules")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Move allocation out further and perform all allocation in bulk. The same
goes for freeing packets. In the process, also introduce
virtio_dev_pktmbuf_prep and make virtio_dev_pktmbuf_alloc use that.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The remained variable stores the same information as the difference
between count and pkt_idx. Remove the remained variable to simplify.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Instead of calculating the address of a packed descriptor based on the
vq->desc_packed and vq->last_used_idx every time, store that base
address in desc_base. On arm, this saves 176 bytes in code size of
function in which vhost_flush_enqueue_batch_packed gets inlined.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
there is no need for vhost_user_set_vring_kick() to
notify the application of vring enabled, as
vhost_user_msg_handler() also notifies the application.
This patch is to remove unnecessary vring_state_changed() call.
Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch removes unnecessary rte_free() for async_pkts_info
and async_descs_split.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Currently, when we set the acpi governor to "userspace", we check if
it is already set to this value, and if it is, we skip setting it.
However, we never save this value anywhere, so that next time we come
back and request the governor to be set to its original value, the
original value is empty.
Fix it by saving the original pstate governor first. While we're at it,
replace `strlcpy` with `rte_strscpy`.
Fixes: 445c6528b5 ("power: common interface for guest and host")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
In function 'eval_jcc', judgment 'op == EBPF_JLT' occurs
twice, as a result, the corresponding second statement
cannot be accessed.
This patch fix this problem.
Fixes: 8021917293 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
gcc 11 with '-O2' complains about some variables being used without
being initialized:
In function ‘start_flow_avx512x8’,
inlined from ‘search_trie_avx512x8.constprop’ at acl_run_avx512_common.h:317:
lib/librte_acl/acl_run_avx512_common.h:210:13: warning:
‘pdata’ is used uninitialized [-Wuninitialized]
In function ‘search_trie_avx512x8.constprop’:
lib/librte_acl/acl_run_avx512_common.h:314:32: note: ‘pdata’ declared here
...
Indeed, these variables are not explicitly initialized,
but this is done intentionally.
We rely on constant mask value that we pass to start_flow*() functions
as a parameter to mask out uninitialized values.
Note that '-O3' doesn't produce this warning.
Anyway, to support clean build with gcc-11 this patch adds
explicit initialization for these variables.
I checked the output binary: with '-O3' both clang and gcc 10/11
generate no extra code for it.
Also performance test didn't reveal any regressions.
Bugzilla ID: 673
Fixes: b64c2295f7 ("acl: add 256-bit AVX512 classify method")
Fixes: 45da22e42e ("acl: add 512-bit AVX512 classify method")
Cc: stable@dpdk.org
Reported-by: Ali Alnubani <alialnu@nvidia.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch fixes the issue that epoll_events memory is not released
after the intr thread created fail.
Fixes: 3810ae4357 ("eventdev: add interrupt driven queues to Rx adapter")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
The thread name already set by rte_ctrl_thread_create() API, so remove
the call of rte_thread_setname() API.
Fixes: 3810ae4357 ("eventdev: add interrupt driven queues to Rx adapter")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Clarify that the mempool private initializer and object initializer used
for packet pools require that the mempool private size is large enough.
Also add an assert (only enabled when -DRTE_ENABLE_ASSERT is passed) to
check this constraint.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Because mbuf dyn shared memory was allocated runtime, so it's
necessary to check validity when dump mbuf dyn info.
Also this patch adds an error logging when init shared memory fail.
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
the strncasecmp macro defined in rte_os_shim.h is already
defined in MinGW-w64, as a result the compiler prints out
the warning below on function redefinition whenever compiling
a file including the header in debug mode.
lib/eal/windows/include/rte_os_shim.h:21:
warning: "strncasecmp" redefined
Fixed by defining the macro only to the clang compiler.
Fixes: 45d62067c2 ("eal: make OS shims internal")
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
REG_PLATFORM only uses bit 0 to indicate whether the value retrieved
from hardware matches PLATFORM_STR.
Fixes: 97523f822b ("eal/arm: add CPU flags for ARMv8")
Cc: stable@dpdk.org
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Running "./devtools/check-meson.py --fix" on the DPDK repo fixes a
number of issues with whitespace and formatting of files:
* indentation of lists
* missing trailing commas on final list element
* multiple list entries per line when list is not all single-line
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
If cache is enabled, objects will be retrieved/put from/to cache,
subsequently from/to the common pool. Now the debug stats calculate
the objects retrieved/put from/to cache and pool together, it is
better to distinguish them.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Since commit 7911ba0473 ("stack: enable lock-free implementation for
aarch64"), lock-free stack is supported on arm64 but this description was
missing from the doxygen for the flag.
Currently it is impossible to detect programmatically whether lock-free
implementation of rte_stack is supported. One could check whether the
header guard for lock-free stubs is defined (_RTE_STACK_LF_STUBS_H_) but
that's an unstable implementation detail. Because of that currently all
lock-free ring creations silently succeed (as long as the stack header
is 16B long) which later leads to push and pop operations being NOPs.
The observable effect is that stack_lf_autotest fails on platforms not
supporting the lock-free. Instead it should just skip the lock-free test
altogether.
This commit adds a new errno value (ENOTSUP) that may be returned by
rte_stack_create() to indicate that a given combination of flags is not
supported on a current platform.
This is detected by checking a compile-time flag in the include logic in
rte_stack_lf.h which may be used by applications to check the lock-free
support at compile time.
Use the added RTE_STACK_LF_SUPPORTED flag to disable the lock-free stack
tests at the compile time.
Perf test doesn't fail because rte_ring_create() succeeds, however
marking this test as skipped gives a better indication of what actually
was tested.
Fixes: 7911ba0473 ("stack: enable lock-free implementation for aarch64")
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
warning to applications still using it.
Fix this by marking as deprecated with the newly introduced
RTE_DEPRECATED.
Fixes: e8a419d6de ("mbuf: rename outer IP checksum macro")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
C11 timespec_get() is not provided on some platforms:
* MinGW-w64 does not currently implement it [1].
* FreeBSD 11 with Clang 10.0.0 does not provide it.
Add internal shims to Windows and FreeBSD EALs.
For Windows, it can be removed after [1] is fixed.
[1]: https://sourceforge.net/p/mingw-w64/mailman/message/37224689/
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Nick Connolly <nick.connolly@mayadata.io>
This patch adds more sanity checks in control path APIs.
Fixes: 214ed1acd1 ("ethdev: add iterator to match devargs input")
Fixes: 3d98f921fb ("ethdev: unify prefix for static functions and variables")
Fixes: 0366137722 ("ethdev: check for invalid device name")
Fixes: d948f596fe ("ethdev: fix port data mismatched in multiple process model")
Fixes: 5b7ba31148 ("ethdev: add port ownership")
Fixes: f8244c6399 ("ethdev: increase port id range")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Currently, the flow meter policy does not support multiple actions
per color; also the allowed action types per color are very limited.
In addition, the policy cannot be pre-defined.
Due to the growing in flow actions offload abilities there is a potential
for the user to use variety of actions per color differently.
This new meter policy API comes to allow this potential in the most ethdev
common way using rte_flow action definition.
A list of rte_flow actions will be provided by the user per color
in order to create a meter policy.
In addition, the API forces to pre-define the policy before
the meters creation in order to allow sharing of single policy
with multiple meters efficiently.
meter_policy_id is added into struct rte_mtr_params.
So that it can get the policy during the meters creation.
Allow coloring the packet using a new rte_flow_action_color
as could be done by the old policy API.
Add two common policy template as macros in the head file.
The next API function were added:
- rte_mtr_meter_policy_add
- rte_mtr_meter_policy_delete
- rte_mtr_meter_policy_update
- rte_mtr_meter_policy_validate
The next struct was changed:
- rte_mtr_params
- rte_mtr_capabilities
The next API was deleted:
- rte_mtr_policer_actions_update
To support this API the following app were changed:
app/test-flow-perf: clean meter policer
app/testpmd: clean meter policer
To support this API the following drivers were changed:
net/softnic: support meter policy API
1. Cleans meter rte_mtr_policer_action.
2. Supports policy API to get color action as policer action did.
The color action will be mapped into rte_table_action_policer.
net/mlx5: clean meter creation management
Cleans and breaks part of the current meter management
in order to allow better design with policy API.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
This commit introduces the conntrack action and item.
Usually the HW offloading is stateless. For some stateful offloading
like a TCP connection, HW module will help provide the ability of a
full offloading w/o SW participation after the connection was
established.
The basic usage is that in the first flow rule the application should
add the conntrack action and jump to the next flow table. In the
following flow rule(s) of the next table, the application should use
the conntrack item to match on the result.
A TCP connection has two directions traffic. To set a conntrack
action context correctly, the information of packets from both
directions are required.
The conntrack action should be created on one ethdev port and supply
the peer ethdev port as a parameter to the action. After context
created, it could only be used between these two ethdev ports
(dual-port mode) or a single port. The application should modify the
action via the API "rte_action_handle_update" only when before using
it to create a flow rule with conntrack for the opposite direction.
This will help the driver to recognize the direction of the flow to
be created, especially in the single-port mode, in which case the
traffic from both directions will go through the same ethdev port
if the application works as an "forwarding engine" but not an end
point. There is no need to call the update interface if the
subsequent flow rules have nothing to be changed.
Query will be supported via "rte_action_handle_query" interface,
about the current packets information and connection status. The
fields query capabilities depends on the HW.
For the packets received during the conntrack setup, it is suggested
to re-inject the packets in order to make sure the conntrack module
works correctly without missing any packet. Only the valid packets
should pass the conntrack, packets with invalid TCP information,
like out of window, or with invalid header, like malformed, should
not pass.
Naming and definition:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/
netfilter/nf_conntrack_tcp.h
https://elixir.bootlin.com/linux/latest/source/net/netfilter/
nf_conntrack_proto_tcp.c
Other reference:
https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Currently, DPDK application can offload the checksum check,
and report it in the mbuf.
However, as more and more applications are offloading some or all
logic and action to the HW, there is a need to check the packet
integrity so the right decision can be taken.
The application logic can be positive meaning if the packet is
valid jump / do actions, or negative if packet is not valid
jump to SW / do actions (like drop) and add default flow
(match all in low priority) that will direct the miss packet
to the miss path.
Since currently rte_flow works in positive way the assumption is
that the positive way will be the common way in this case also.
When thinking what is the best API to implement such feature,
we need to consider the following (in no specific order):
1. API breakage.
2. Simplicity.
3. Performance.
4. HW capabilities.
5. rte_flow limitation.
6. Flexibility.
First option: Add integrity flags to each of the items.
For example add checksum_ok to IPv4 item.
Pros:
1. No new rte_flow item.
2. Simple in the way that on each item the app can see
what checks are available.
Cons:
1. API breakage.
2. Increase number of flows, since app can't add global rule and must
have dedicated flow for each of the flow combinations, for example
matching on ICMP traffic or UDP/TCP traffic with IPv4 / IPv6 will
result in 5 flows.
Second option: dedicated item
Pros:
1. No API breakage, and there will be no for some time due to having
extra space. (by using bits)
2. Just one flow to support the ICMP or UDP/TCP traffic with IPv4 /
IPv6.
3. Simplicity application can just look at one place to see all possible
checks.
4. Allow future support for more tests.
Cons:
1. New item, that holds number of fields from different items.
For starter the following bits are suggested:
1. packet_ok - means that all HW checks depending on packet layer have
passed. This may mean that in some HW such flow should be split to
number of flows or fail.
2. l2_ok - all check for layer 2 have passed.
3. l3_ok - all check for layer 3 have passed. If packet doesn't have
L3 layer this check should fail.
4. l4_ok - all check for layer 4 have passed. If packet doesn't
have L4 layer this check should fail.
5. l2_crc_ok - the layer 2 CRC is O.K.
6. ipv4_csum_ok - IPv4 checksum is O.K. It is possible that the
IPv4 checksum will be O.K. but the l3_ok will be 0. It is not
possible that checksum will be 0 and the l3_ok will be 1.
7. l4_csum_ok - layer 4 checksum is O.K.
8. l3_len_OK - check that the reported layer 3 length is smaller than the
frame length.
Example of usage:
1. Check packets from all possible layers for integrity.
flow create integrity spec packet_ok = 1 mask packet_ok = 1 .....
2. Check only packet with layer 4 (UDP / TCP)
flow create integrity spec l3_ok = 1, l4_ok = 1 mask l3_ok = 1
l4_ok = 1
Signed-off-by: Ori Kam <orika@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Right now, rte_flow_shared_action_* APIs are used for some shared
actions, like RSS, count. The shared action should be created before
using it inside a flow. These shared actions sometimes are not
really shared but just some indirect actions decoupled from a flow.
The new functions rte_flow_action_handle_* are added to replace
the current shared functions rte_flow_shared_action_*.
There are two types of flow actions:
1. the direct (normal) actions that could be created and stored
within a flow rule. Such action is tied to its flow rule and
cannot be reused.
2. the indirect action, in the past, named shared_action. It is
created from a direct actioni, like count or rss, and then used
in the flow rules with an object handle. The PMD will take care
of the retrieve from indirect action to the direct action
when it is referenced.
The indirect action is accessed (update / query) w/o any flow rule,
just via the action object handle. For example, when querying or
resetting a counter, it could be done out of any flow using this
counter, but only the handle of the counter action object is
required.
The indirect action object could be shared by different flows or
used by a single flow, depending on the direct action type and
the real-life requirements.
The handle of an indirect action object is opaque and defined in
each driver and possibly different per direct action type.
The old name "shared" is improper in a sense and should be replaced.
Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*", the testpmd application code and command
line interfaces also need to be updated to do the adaption.
The testpmd application user guide is also updated. All the "shared
action" related parts are replaced with "indirect action" to have a
correct explanation.
The parameter of "update" interface is also changed. A general
pointer will replace the rte_flow_action struct pointer due to the
facts:
1. Some action may not support fields updating. In the example of a
counter, the only "update" supported should be the reset. So
passing a rte_flow_action struct pointer is meaningless and
there is even no such corresponding action struct. What's more,
if more than one operations should be supported, for some other
action, such pointer parameter may not meet the need.
2. Some action may need conditional or partial update, the current
parameter will not provide the ability to indicate which part(s)
to update.
For different types of indirect action objects, the pointer could
either be the same of rte_flow_action* struct - in order not to
break the current driver implementation, or some wrapper
structures with bits as masks to indicate which part to be
updated, depending on real needs of the corresponding direct
action. For different direct actions, the structures of indirect
action objects updating will be different.
All the underlayer PMD callbacks will be moved to these new APIs.
The RTE_FLOW_ACTION_TYPE_SHARED is kept for now in order not to
break the ABI. All the implementations are changed by using
RTE_FLOW_ACTION_TYPE_INDIRECT.
Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*" and the "update" interface's 3rd input
parameter is changed to generic pointer, the mlx5 PMD that uses these
APIs needs to do the adaption to the new APIs as well.
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Currently, upper-layer application could get queue state only
through pointers such as dev->data->tx_queue_state[queue_id],
this is not the recommended way to access it. So this patch
add get queue state when call rte_eth_rx_queue_info_get and
rte_eth_tx_queue_info_get API.
Note: After add queue_state field, the 'struct rte_eth_rxq_info' size
remains 128B, and the 'struct rte_eth_txq_info' size remains 64B, so
it could be ABI compatible.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The function pthread_setname_np() was originally not available on
FreeBSD. It has been added in FreeBSD 12.2:
https://svnweb.freebsd.org/base?view=revision&revision=362264
The EAL implementation of rte_thread_setname() is duplicated
in the telemetry library, which does not depend on EAL,
so the compilation is safe in all systems.
Fixes: 5da7736f8c ("telemetry: set socket listener thread name")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
This patch fixes the traffic class oversubscription watermark
value by initialising it with computed value of maximum watermark.
Fixes: ac6fcb841b ("sched: update subport rate dynamically")
Cc: stable@dpdk.org
Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
When fragmenting IPv4 packet, the data offset should be calculated through
the IHL field in IP header rather than using sizeof(struct rte_ipv4_hdr).
Fixes: 4c38e5532a ("ip_frag: refactor IPv4 fragmentation into a proper library")
Cc: stable@dpdk.org
Signed-off-by: Pu Xu <583493798@qq.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add result check and message print out for thread creation after
failure.
Fixes: b80fe1805e ("telemetry: introduce backward compatibility")
Fixes: 6dd571fd07 ("telemetry: introduce new functionality")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Ciara Power <ciara.power@intel.com>
This patch supports set init threads name which is helpful for
debugging.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add support for the disable_libs option, to allow disabling the build of
particular libraries. As part of this, maintain a list of what libraries
can safely be disabled, without breaking the build - for now this list is
solely those libraries which are not built on FreeBSD, kni, power and
vhost. This list can be expanded by future patches.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
There is no reason for the DPDK libraries to all have 'librte_' prefix on
the directory names. This prefix makes the directory names longer and also
makes it awkward to add features referring to individual libraries in the
build - should the lib names be specified with or without the prefix.
Therefore, we can just remove the library prefix and use the library's
unique name as the directory name, i.e. 'eal' rather than 'librte_eal'
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Switch from using tabs to 4 spaces for meson.build indentation. Perform
other formatting cleanups such as ensure that long lists of files are one
per line, and terminating with a final comma before the closing brace to
make addition/removals easier. In some cases, reorder lists of items
where they were not in alphabetical order.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
With the lib/meson.build file changed from C-style indentation to
python-style indentation, we need to correct the indentation of the lists
of libraries, since these libs were not modified in the previous patches.
For ease of management of the list and working with patches for adding
to the list, put each library on it's own line.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Two simplifications can be made to the build file which reduce indentation
levels and make it easier to read:
1. When meson build support was first added, the compat library existed in
DPDK as a single header file. Since that header has been merged into EAL,
we no longer need to support header-only libraries, so can shorten the
code.
2. From meson 0.49 onwards we have the "continue" keyword available to
break out of one loop iteration and begin the next. This allows us to
remove blocks in the build configuration file which were conditional on the
"build" variable being true. Instead we can use "continue" to abort
processing at the point where the "build" value becomes false.
Since this patch changes the indentation level of large parts of the
meson.build file, we use the opportunity to adjust the whitespace used to
the meson-standard 4-spec indentation level.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Compilation on CentOS 7 with gcc version 4.8.5 fails with
the following errors:
error: 'src_struct_id' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
error: 'dst_struct_id' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
This patch fixes the build errors by initializing both variables.
Bugzilla ID: 683
Fixes: 783768136f ("pipeline: auto-detect endianness of action arguments")
Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Adding async userspace requests which don't wait for the userspace
response and always return success. This is preparation to address a
regression in KNI.
Signed-off-by: Elad Nachman <eladv6@gmail.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
In case of EAL initialization failure rte_eal_memory_detach() may be
called before mapping memory configuration, which in this case points
to the static structure. Attempt to unmap it yields error:
EAL: Could not unmap shared memory config: Invalid argument
Skip unmapping memory configuration if it's not yet shared.
Fixes: dfbc61a2f9 ("mem: detach memsegs on cleanup")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch adds predictable RSS API.
It is based on the idea of searching partial Toeplitz hash collisions.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Each table entry is made up of match fields and action data, with the
latter made up of the action ID and the action arguments. The approach
of having the user specify explicitly the endianness of the action
arguments is difficult to be picked up by P4 compilers, as the P4
compiler is generally unaware about this aspect.
This commit introduces the auto-detection of the endianness of the
action arguments by examining the endianness of the their destination:
network byte order (NBO) when they get copied to headers and host byte
order (HBO) when they get copied to packet meta-data or mailboxes.
The endianness specification of each action argument as part of the
rule specification, e.g. H(...) and N(...) is removed from the rule
file and auto-detected based on their destination. The DMA instruction
scope is made internal, so mov instructions need to be used. The
pattern of transferring complete headers from table entry action args
to headers is detected, and the associated set of mov instructions
plus header validate is internally detected and replaced with the
internal-only DMA instruction to preserve performance.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>