Commit Graph

6476 Commits

Author SHA1 Message Date
Harman Kalra
3596a037ab eal: fix parentheses in alignment macros
Found an issue while using RTE_ALIGN_MUL_NEAR with an
expression, like as passed in estimate_tsc_freq().
RTE_ALIGN_MUL_FLOOR resulted in unexpected value as
parathesis are required to evaluate an expression.

Fixes: 5120203d75 ("eal: add macros to align value to multiple")
Cc: stable@dpdk.org

Signed-off-by: Harman Kalra <hkalra@marvell.com>
2020-07-11 11:41:33 +02:00
Dmitry Kozlyuk
7daf5bdb0f eal/windows: detect insufficient privileges for hugepages
AdjustTokenPrivileges() succeeds even if no requested privileges have
been granted; this behavior is documented. Check last error code in
addition to return value to detect such case.

Make error messages more specific and add troubleshooting hint.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo
982bb68cab net: fix checksum on big endian CPUs
With current code, the checksum of odd-length buffers is wrong on
big endian CPUs: the last byte is not properly summed to the
accumulator.

Fix this by left-shifting the remaining byte by 8. For instance,
if the last byte is 0x42, we should add 0x4200 to the accumulator
on big endian CPUs.

This change is similar to what is suggested in Errata 3133 of
RFC 1071.

Fixes: 6006818cfb26("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo
d5df2ae042 net: fix unneeded replacement of TCP checksum 0
Per RFC768:
If the computed checksum is zero, it is transmitted as all ones.
An all zero transmitted checksum value means that the transmitter
generated no checksum.

RFC793 for TCP has no such special treatment for the checksum of zero.

Fixes: 6006818cfb ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2020-07-11 00:45:20 +02:00
Joyce Kong
58902736a4 vhost: restrict pointer aliasing for packed ring
Restrict pointer aliasing to allow the compiler to vectorize loop
more aggressively.

With this patch, a 9.6% improvement is observed in throughput for
the packed virtio-net PVP case, and a 2.8% improvement in throughput
for the packed virtio-user PVP case. All performance data are measured
on ThunderX-2 platform under 0.001% acceptable packet loss with 1 core
on both vhost and virtio side.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-07-10 15:43:41 +02:00
Joyce Kong
428e684795 introduce restricted pointer aliasing marker
The 'restrict' keyword is recognized in C99, while type qualifier
'__restrict' compiles ok in C with all language levels. This patch
is to replace the existing 'restrict' with '__rte_restrict' which
is a common wrapper supported by all compilers.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-10 15:35:32 +02:00
Ruifeng Wang
8a9f8564e9 lpm: implement RCU rule reclamation
Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for lpm library
internal use. This struct wraps rte_lpm that has been exposed and
also includes members that don't need to be exposed such as RCU related
config.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2020-07-10 13:41:29 +02:00
Phil Yang
e0a439466b eal/linux: use C11 atomics for interrupt status
The event status is defined as a volatile variable and shared between
threads. Use C11 atomic built-ins with explicit ordering instead of
rte_atomic ops which enforce unnecessary barriers on aarch64.

The event status has been cleaned up by the compare-and-swap operation
when we free the event data, so there is no need to set it to invalid
after that.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
2020-07-09 18:53:40 +02:00
Feifei Wang
2d6ed071a8 ring: use custom element for fixed size API
Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
This reduces code duplication and improves code maintenance.

Tests done on Arm, x86 [1] and PPC [2] do not indicate performance
degradation.
[1] https://mails.dpdk.org/archives/dev/2020-July/173780.html
[2] https://mails.dpdk.org/archives/dev/2020-July/173863.html

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-09 17:22:36 +02:00
Feifei Wang
019bffab51 ring: remove experimental flag from custom element API
Remove the experimental tag for rte_ring_xxx_elem APIs that have been
around for 2 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-09 17:22:29 +02:00
Feifei Wang
2f86fb666e ring: remove experimental flag from reset API
Remove the experimental tag for rte_ring_reset API that have been around
for 4 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-07-09 17:21:54 +02:00
Levend Sayar
604d426de3 service: fix C++ linkage
"extern C" define is added to rte_service_component.h file
to be able to use in C++ context

Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org

Signed-off-by: Levend Sayar <levendsayar@gmail.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-07-09 16:23:57 +02:00
Ferruh Yigit
f6ac2b06ec ethdev: fix log type for some error messages
Some log macros was using 'EAL' logtype, convert them to 'ethdev'.
Also fix missing EOL and fix syntax for some logs.

Fixes: 214ed1acd1 ("ethdev: add iterator to match devargs input")
Fixes: e489007a41 ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-07 23:38:28 +02:00
Patrick Fu
cd6760da10 vhost: introduce async enqueue for split ring
This patch implements async enqueue data path for split ring. 2 new
async data path APIs are defined, by which applications can submit
and poll packets to/from async engines. The async engine is either
a physical DMA device or it could also be a software emulated backend.
The async enqueue data path leverages callback functions registered by
applications to work with the async engine.

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-07 23:38:28 +02:00
Patrick Fu
78639d5456 vhost: introduce async enqueue registration API
Performing large memory copies usually takes up a major part of CPU
cycles and becomes the hot spot in vhost-user enqueue operation. To
offload the large copies from CPU to the DMA devices, asynchronous
APIs are introduced, with which the CPU just submits copy jobs to
the DMA but without waiting for its copy completion. Thus, there is
no CPU intervention during data transfer. We can save precious CPU
cycles and improve the overall throughput for vhost-user based
applications. This patch introduces registration/un-registration
APIs for vhost async data enqueue operation. Together with the
registration APIs implementations, data structures and the prototype
of the async callback functions required for async enqueue data path
are also defined.

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-07 23:38:28 +02:00
Simei Su
9a859b8c4a ethdev: add PPPoE RSS offload types
This patch defines new RSS offload types for PPPoE. Typically,
session id would be the RSS input set for a PPPoE packet, but
as a hint, each driver may have different default behaviors.

Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-07 23:38:27 +02:00
Lukasz Wojciechowski
048db4b6dc service: fix core mapping reset
The rte_service_lcore_reset_all function stops execution of services
on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
However the thread loop for slave lcores (eal_thread_loop) distincts these
roles to set lcore state after processing delegated function.
It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
So changing the role to RTE before stopping work in slave lcores
causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
must be run after rte_service_lcore_reset_all to bring back lcores to
launchable (WAIT) state.
This has been fixed in test app and clarified in API documentation.

Setting the state to WAIT in rte_service_runner_func is premature
as the rte_service_runner_func function is still a part of the lcore
function delegated to slave lcore. The state is overwritten anyway in
slave lcore thread loop. This premature setting state to WAIT might
however cause rte_eal_lcore_wait, that was called by the application,
to return before slave lcore thread set the FINISHED state. That's
why it is removed from librte_eal rte_service_runner_func function.

Bugzilla ID: 464
Fixes: 21698354c8 ("service: introduce service cores concept")
Fixes: f038a81e1c ("service: add unit tests")
Cc: stable@dpdk.org

Reported-by: Sarosh Arif <sarosh.arif@emumba.com>
Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-07-08 18:52:49 +02:00
Phil Yang
030c216411 eventdev: relax SMP barriers with C11 atomics
The impl_opaque field is shared between the timer arm and cancel
operations. Meanwhile, the state flag acts as a guard variable to
make sure the update of impl_opaque is synchronized. The original
code uses rte_smp barriers to achieve that. This patch uses C11
atomics with an explicit one-way memory barrier instead of full
barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.

Since compilers can generate the same instructions for volatile and
non-volatile variable in C11 __atomics built-ins, so remain the volatile
keyword in front of state enum to avoid the ABI break issue.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00
Phil Yang
e84d9c62c6 eventdev: remove redundant reset on timer cancel
There is no thread will access these impl_opaque data after timer
canceled. When new timer armed, it got refilled. So the cleanup
process is unnecessary.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
2020-07-08 18:16:41 +02:00
Phil Yang
1028d63eb2 eventdev: use C11 atomics for lcore timer armed flag
The in_use flag is a per core variable which is not shared between
lcores in the normal case and the access of this variable should be
ordered on the same core. However, if non-EAL thread pick the highest
lcore to insert timers into, there is the possibility of conflicts
on this flag between threads. Then the atomic compare-and-swap
operation is needed.

Use the C11 atomics instead of the generic rte_atomic operations to
avoid the unnecessary barrier on aarch64.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00
Phil Yang
aceb737d6f eventdev: fix race condition on timer list counter
The n_poll_lcores counter and poll_lcore array are shared between lcores
and the update of these variables are out of the protection of spinlock
on each lcore timer list. The read-modify-write operations of the counter
are not atomic, so it has the potential of race condition between lcores.

Use c11 atomics with RELAXED ordering to prevent confliction.

Fixes: cc7b73ea9e ("eventdev: add new software timer adapter")
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00
Fiona Trahe
b69f927d52 cryptodev: add traces in multi-process path
This patch adds traces to some Cryptodev functions that are used
in primary/secondary context.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-07-08 18:16:17 +02:00
Fiona Trahe
21b6a35171 cryptodev: add function to check queue pair status
This patch adds function that can check if queue pair
was already setup. This may be useful when dealing with
multi process approach in cryptodev.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-07-08 18:16:17 +02:00
David Coyle
15bb21aa60 cryptodev: add comments for DOCSIS protocol
Add a note to the rte_crypto_sym_op->auth.data fields to state that
for DOCSIS security protocol, these are used to specify the offset and
length of data over which the CRC is calculated.

Signed-off-by: David Coyle <david.coyle@intel.com>
Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2020-07-08 00:15:35 +02:00
David Coyle
e44b3faf85 security: support DOCSIS protocol
Add support for DOCSIS protocol to rte_security library. This support
currently comprises the combination of Crypto and CRC operations.

Signed-off-by: David Coyle <david.coyle@intel.com>
Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2020-07-08 00:15:35 +02:00
Adam Dybkowski
53bf0ffd51 cryptodev: verify session mempool element size
This patch adds the verification of the element size of the
mempool provided for the session creation. Returns the error
if the element size is too small to hold the session object.

Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-07-08 00:15:35 +02:00
David Marchand
d2fd16c8b0 eal: add multiprocess disable API
The multiprocess feature has been implicitly enabled so far.
Applications might want to explicitly disable like when using the
non-EAL threads registration API.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:41:06 +02:00
David Marchand
b41befd3af eal: add lcore iterators
Add a helper to iterate all lcores.
The iterator callback is read-only wrt the lcores list.

Implement a dump function on top of this for debugging.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:41:06 +02:00
David Marchand
61bb531295 eal: add lcore init callbacks
DPDK components and applications can have their say when a new lcore is
initialized. For this, they can register a callback for initializing and
releasing their private data.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:41:06 +02:00
David Marchand
5c307ba2a5 eal: register non-EAL threads as lcores
DPDK allows calling some part of its API from a non-EAL thread but this
has some limitations.
OVS (and other applications) has its own thread management but still
want to avoid such limitations by hacking RTE_PER_LCORE(_lcore_id) and
faking EAL threads potentially unknown of some DPDK component.

Introduce a new API to register non-EAL thread and associate them to a
free lcore with a new NON_EAL role.
This role denotes lcores that do not run DPDK mainloop and as such
prevents use of rte_eal_wait_lcore() and consorts.

Multiprocess is not supported as the need for cohabitation with this new
feature is unclear at the moment.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:41:05 +02:00
David Marchand
a837d5c598 eal: move lcore role code
For consistency sake, move all lcore role code in the dedicated
compilation unit / header.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:39:26 +02:00
David Marchand
2ab55f78d1 eal: introduce thread uninit helper
This is a preparation step for dynamically unregistering threads.

Since we explicitly allocate a per thread trace buffer in
__rte_thread_init, add an internal helper to free this buffer.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:39:26 +02:00
David Marchand
266b641ccf eal: introduce thread init helper
Introduce a helper responsible for initialising the per thread context.
We can then have a unified context for EAL and non-EAL threads and
remove copy/paste'd OS-specific helpers.

Per EAL thread CPU affinity setting is separated from the thread init.
It is to accommodate with Windows EAL where CPU affinity is not set at
the moment.
Besides, having affinity set by the master lcore in FreeBSD and Linux
will make it possible to detect errors rather than panic in the child
thread. But the cleanup when such an event happens is left for later.

A side-effect of this patch is that control threads can now use
recursive locks (rte_gettid() was not called before).

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:39:26 +02:00
David Marchand
3ef1a0b4a1 eal: fix multiple definition of per lcore thread id
Because of the inline accessor + static declaration in rte_gettid(),
we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
Each compilation unit will pay a cost when accessing this information
for the first time.

$ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
0000000000000054 d per_lcore__thread_id.5037
0000000000000040 d per_lcore__thread_id.5103
0000000000000048 d per_lcore__thread_id.5259
000000000000004c d per_lcore__thread_id.5259
0000000000000044 d per_lcore__thread_id.5933
0000000000000058 d per_lcore__thread_id.6261
0000000000000050 d per_lcore__thread_id.7378
000000000000005c d per_lcore__thread_id.7496
000000000000000c d per_lcore__thread_id.8016
0000000000000010 d per_lcore__thread_id.8431

Make it global as part of the DPDK_21 stable ABI.

Fixes: ef76436c68 ("eal: get unique thread id")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:39:26 +02:00
David Marchand
7afdfac8e6 eal: relocate per thread symbols to common
We have per lcore thread symbols scattered in OS implementations but
common code relies on them.
Move all of them in common.

RTE_PER_LCORE(_socket_id) and RTE_PER_LCORE(_cpuset) have public
accessors and are not exported through the library map, they can be
made static.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-08 14:39:26 +02:00
Honnappa Nagarahalli
264f7f80e1 eal/arm: adjust memory barriers for IO on ARMv8
Change the barrier APIs for IO to reflect that Armv8-a is other-multi-copy
atomicity memory model.

Armv8-a memory model has been strengthened to require
other-multi-copy atomicity. This property requires memory accesses
from an observer to become visible to all other observers
simultaneously [3]. This means

a) A write arriving at an endpoint shared between multiple CPUs is
   visible to all CPUs
b) A write that is visible to all CPUs is also visible to all other
   observers in the shareability domain

This allows for using cheaper DMB instructions in the place of DSB
for devices that are visible to all CPUs (i.e. devices that DPDK
caters to).

Please refer to [1], [2] and [3] for more information.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
[2] https://www.youtube.com/watch?v=i6DayghhA8Q
[3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-07-08 13:44:23 +02:00
Igor Romanov
f3c256b621 service: fix lcore iteration
The service core list is populated, but not used. Incorrect
lcore states are examined for a service.

Use the populated list to iterate over service cores.

Fixes: e484ccddbe ("service: avoid false sharing on core state")
Cc: stable@dpdk.org

Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-07-07 23:48:44 +02:00
Stephen Hemminger
b2a0b9f044 rib: add C++ include guard
All include files should be safe from C++

Fixes: 5a5793a5ff ("rib: add RIB library")
Fixes: f7e861e21c ("rib: support IPv6")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2020-07-07 23:24:38 +02:00
Stephen Hemminger
58607c2e27 rib: check for negative maximum of nodes
Max_nodes in config is signed, but a negative value makes
no sense. Get rid of extra BSD style parens.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2020-07-07 23:22:18 +02:00
Stephen Hemminger
7bd83e60bf rib: constify arguments
The getter functions should take a constant pointer
to make it clear that node is not modified.

The rib create functions do not modify their config structure.
Mark the config as constant so that programs can pass
simple constant data.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2020-07-07 23:22:05 +02:00
Stephen Hemminger
041a3971c8 cfgfile: fix stack buffer underflow
If cfgfile is give a line with comment character at the start
of the line, it will dereference outside of the buffer.

Detected with address sanitizer:

SUMMARY: AddressSanitizer: stack-buffer-underflow
lib/librte_cfgfile/rte_cfgfile.c:194 in rte_cfgfile_load_with_params
Shadow bytes around the buggy address:
  0x200fff79f6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fff79f6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fff79f6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fff79f6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fff79f6e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x200fff79f6f0: 00 00 00 00 f1 f1 f1[f1]00 00 00 00 00 00 00 00
  0x200fff79f700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fff79f710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fff79f720: 04 f2 f2 f2 f3 f3 f3 f3 00 00 00 00 00 00 00 00
  0x200fff79f730: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2
  0x200fff79f740: f2 f2 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==2189==ABORTING

Fixes: a6a47ac9c2 ("cfgfile: rework load function")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-07 23:22:04 +02:00
Bruce Richardson
a8550b7731 rawdev: export dump function in map file
The rte_rawdev_dump function was missing from the map file,
meaning it was unavailable for use when linking dynamically.

Fixes: c88b3f2558 ("rawdev: introduce raw device library")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2020-07-07 19:18:04 +02:00
Bruce Richardson
2689221592 rawdev: fill NUMA socket ID in info
The rawdev info struct has a socket_id field which was not filled in.

We can also omit the checks for the parameter struct being null, since
that is previously checked in the function.

Fixes: c88b3f2558 ("rawdev: introduce raw device library")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2020-07-07 19:18:04 +02:00
Bruce Richardson
201a68c678 rawdev: allow getting info for unknown device
To call the rte_rawdev_info_get() function, the user currently has to know
the underlying type of the device in order to pass an appropriate structure
or buffer as the dev_private pointer in the info structure. By allowing a
NULL value for this field, we can skip getting the device-specific info and
just return the generic info - including the device name and driver, which
can be used to determine the device type - to the user.

This ensures that basic info can be get for all rawdevs, without knowing
the type, and even if the info driver API call has not been implemented for
the device.

Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2020-07-07 19:18:04 +02:00
Haiyue Wang
598be72395 vfio: support VF token
The Linux kernel module vfio-pci introduces the VF token to enable
SR-IOV support since 5.7.

The VF token can be set by a vfio-pci based PF driver and must be known
by the vfio-pci based VF driver in order to gain access to the device.

Since the vfio-pci module uses the VF token as internal data to provide
the collaboration between SR-IOV PF and VFs, so DPDK can use the same
VF token for all PF devices by specifying the related EAL option.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Harman Kalra <hkalra@marvell.com>
2020-07-07 14:06:49 +02:00
Haiyue Wang
edca6d883e eal: fix uuid header dependencies
Add the dependent header files explicitly, so that the user just needs
to include the 'rte_uuid.h' header file directly to avoid compile error:
 (1). rte_uuid.h:97:55: error: unknown type name ‘size_t’
 (2). rte_uuid.h:58:2: error: implicit declaration of function ‘memcpy’

Fixes: 6bc67c497a ("eal: add uuid API")
Cc: stable@dpdk.org

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-07-07 14:06:49 +02:00
Stephen Hemminger
2cc263c3ff vfio: lower the priority of startup messages
The startup of VFIO is too noisy. Logging is expensive on some
systems, and distracting to the user.

It should not be logging at NOTICE level, reduce it to INFO level.
It really should be DEBUG here but that would hide it by default.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-07-07 13:51:28 +02:00
Yunjian Wang
f4823a3982 vfio: remove unused variable
The 'group_status' has never been used and can be removed.

Fixes: 94c0776b1b ("vfio: support hotplug")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-07-07 13:49:56 +02:00
Stephen Hemminger
67ae5936c4 eal: fix lcore accessors for non-EAL threads
If rte_lcore_index() is asked to give the index of the
current lcore (argument -1) and is called from a non-EAL thread
then it would invalid result. The result would come
lcore_config[-1].core_index which is some other data in the
per-thread area.

The resolution is to return -1 which is what rte_lcore_index()
returns if handed an invalid lcore.

Same issue existed with rte_lcore_to_cpu_id().

Bugzilla ID: 446
Fixes: 26cc3bbe4d ("eal: add lcore accessors")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-07-07 13:45:50 +02:00
Honnappa Nagarahalli
e1c9850f55 eal/armv8: force inlining of timer API
Change the inline functions to use __rte_always_inline to be
consistent with rest of the inline functions.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-07 13:21:26 +02:00
Honnappa Nagarahalli
97c910139b eal/armv8: fix timer frequency calibration with PMU
get_tsc_freq uses 'nanosleep' system call to calculate the CPU
frequency. However, 'nanosleep' results in the process getting
un-scheduled. The kernel saves and restores the PMU state. This
ensures that the PMU cycles are not counted towards a sleeping
process. When RTE_ARM_EAL_RDTSC_USE_PMU is defined, this results
in incorrect CPU frequency calculation. This logic is replaced
with generic counter based loop.

Bugzilla ID: 450
Fixes: f91bcbb2d9 ("eal/armv8: use high-resolution cycle counter")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-07 13:20:50 +02:00
Jerin Jacob
a7551b6c60 log: remove unneeded logtype declaration
RTE_LOG_REGISTER macro already declares the log type.
Remove the unneeded log type declaration.

Fixes: 9c99878aa1 ("log: introduce logtype register macro")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Gage Eads <gage.eads@intel.com>
2020-07-07 13:18:23 +02:00
Hemant Agrawal
c843fca96c rawdev: remove remaining experimental tags
The experimental tags were removed, but the comment
is still having API classification as EXPERIMENTAL

Fixes: 931cc531aa ("rawdev: remove experimental tag")
Cc: stable@dpdk.org

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-07-07 12:54:22 +02:00
David Marchand
74f4d6424d lib: remind experimental status in headers
The following libraries are experimental, all of their functions can
be changed or removed:

- librte_bbdev
- librte_bpf
- librte_compressdev
- librte_fib
- librte_flow_classify
- librte_graph
- librte_ipsec
- librte_node
- librte_rcu
- librte_rib
- librte_stack
- librte_telemetry

Their status is properly announced in MAINTAINERS.
Remind this status in their headers in a common fashion (aligned to ABI
docs).

Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-07 12:49:10 +02:00
David Marchand
7762e0139b build: remove special versioning for non stable libraries
Having a special versioning for experimental/internal libraries put a
additional maintenance cost while this status is already announced in
MAINTAINERS and the library headers/documentation.
Following discussions and vote at 05/20 TB meeting [1], use a single
versioning for all libraries in DPDK.

Note: for the ABI check, an exception [2] had been added when tweaking
this special versioning [3].
Prefer explicit libabigail rules (which will be dropped in 20.11).

1: https://mails.dpdk.org/archives/dev/2020-May/168450.html
2: https://git.dpdk.org/dpdk/commit/?id=23d7ad5db41c
3: https://git.dpdk.org/dpdk/commit/?id=ec2b8cd7ed69

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-07 12:48:25 +02:00
Tal Shnaiderman
f6d7f40576 mbuf: build on Windows
Build the lib for Windows.
Export needed EAL functions used by the lib.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-07-07 01:47:24 +02:00
Tal Shnaiderman
9af385fd89 eal: support endianness detection on Windows
Inclusion of the endian.h header is set only for Linux OS.

Windows endianness will be determined by the predefined
__BYTE_ORDER__ macro.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-07-07 01:42:27 +02:00
Fady Bader
080bda1a0a mempool: build on Windows
Some EAL functions are used by mempool lib but not exported on Windows.
The functions are exported.
Added mempool to supported libraries for Windows compilation.

Signed-off-by: Fady Bader <fady@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-07 01:28:12 +02:00
Fady Bader
972848da5f mempool: use generic memory syscall wrappers
Using generic memory management calls instead of Unix memory management
calls for mempool.

Signed-off-by: Fady Bader <fady@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-07-07 01:24:55 +02:00
Fady Bader
2f59f3b085 eal: disable function versioning on Windows
Function versioning implementation is not supported by Windows.
Function versioning is disabled on Windows.

Signed-off-by: Fady Bader <fady@mellanox.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-07 01:23:29 +02:00
Alan Dewar
83415d4fd8 sched: fix port time rounding
The QoS scheduler works off port time that is computed from the number
of CPU cycles that have elapsed since the last time the port was
polled.   It divides the number of elapsed cycles to calculate how
many bytes can be sent, however this division can generate rounding
errors, where some fraction of a byte sent may be lost.

Lose enough of these fractional bytes and the QoS scheduler
underperforms.  The problem is worse with low bandwidths.

To compensate for this rounding error this fix doesn't advance the
port's time_cpu_cycles by the number of cycles that have elapsed,
but by multiplying the computed number of bytes that can be sent
(which has been rounded down) by number of cycles per byte.
This will mean that port's time_cpu_cycles will lag behind the CPU
cycles momentarily.  At the next poll, the lag will be taken into
account.

Fixes: de3cfa2c98 ("sched: initial import")
Cc: stable@dpdk.org

Signed-off-by: Alan Dewar <alan.dewar@att.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2020-07-07 00:58:31 +02:00
Ori Kam
f66967d218 regexdev: implement API functions
This commit implements all the RegEx public API.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Guy Kaneti <guyk@marvell.com>
2020-07-07 00:24:52 +02:00
Ori Kam
b25246beae regexdev: add core functions
This commit introduce the API that is needed by the RegEx devices in
order to work with the RegEX lib.

During the probe of a RegEx device, the device should configure itself,
and allocate the resources it requires.
On completion of the device init, it should call the
rte_regex_dev_register in order to register itself as a RegEx device.

Signed-off-by: Ori Kam <orika@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Guy Kaneti <guyk@marvell.com>
2020-07-07 00:24:51 +02:00
Ori Kam
9c26771bc5 regexdev: add core structures
This commit introduce the rte_regexdev_core.h file.
This file holds internal structures and API that are used by
the regexdev.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Guy Kaneti <guyk@marvell.com>
2020-07-07 00:24:39 +02:00
Jerin Jacob
bab9497ef7 regexdev: introduce API
As RegEx usage become more used by DPDK applications, for example:
* Next Generation Firewalls (NGFW)
* Deep Packet and Flow Inspection (DPI)
* Intrusion Prevention Systems (IPS)
* DDoS Mitigation
* Network Monitoring
* Data Loss Prevention (DLP)
* Smart NICs
* Grammar based content processing
* URL, spam and adware filtering
* Advanced auditing and policing of user/application security policies
* Financial data mining - parsing of streamed financial feeds
* Application recognition.
* Dmemory introspection.
* Natural Language Processing (NLP)
* Sentiment Analysis.
* Big data database acceleration.
* Computational storage.

Number of PMD providers started to work on HW implementation,
along side with SW implementations.

This lib adds the support for those kind of devices.

The RegEx Device API is composed of two parts:
- The application-oriented RegEx API that includes functions to setup
  a RegEx device (configure it, setup its queue pairs and start it),
  update the rule database and so on.

- The driver-oriented RegEx API that exports a function allowing
  a RegEx poll Mode Driver (PMD) to simultaneously register itself as
  a RegEx device driver.

RegEx device components and definitions:

    +-----------------+
    |                 |
    |                 o---------+    rte_regexdev_[en|de]queue_burst()
    |   PCRE based    o------+  |               |
    |  RegEx pattern  |      |  |  +--------+   |
    | matching engine o------+--+--o        |   |    +------+
    |                 |      |  |  | queue  |<==o===>|Core 0|
    |                 o----+ |  |  | pair 0 |        |      |
    |                 |    | |  |  +--------+        +------+
    +-----------------+    | |  |
           ^               | |  |  +--------+
           |               | |  |  |        |        +------+
           |               | +--+--o queue  |<======>|Core 1|
       Rule|Database       |    |  | pair 1 |        |      |
    +------+----------+    |    |  +--------+        +------+
    |     Group 0     |    |    |
    | +-------------+ |    |    |  +--------+        +------+
    | | Rules 0..n  | |    |    |  |        |        |Core 2|
    | +-------------+ |    |    +--o queue  |<======>|      |
    |     Group 1     |    |       | pair 2 |        +------+
    | +-------------+ |    |       +--------+
    | | Rules 0..n  | |    |
    | +-------------+ |    |       +--------+
    |     Group 2     |    |       |        |        +------+
    | +-------------+ |    |       | queue  |<======>|Core n|
    | | Rules 0..n  | |    +-------o pair n |        |      |
    | +-------------+ |            +--------+        +------+
    |     Group n     |
    | +-------------+ |<-------rte_regexdev_rule_db_update()
    | |             | |<-------rte_regexdev_rule_db_compile_activate()
    | | Rules 0..n  | |<-------rte_regexdev_rule_db_import()
    | +-------------+ |------->rte_regexdev_rule_db_export()
    +-----------------+

RegEx: A regular expression is a concise and flexible means for matching
strings of text, such as particular characters, words, or patterns of
characters. A common abbreviation for this is â~@~\RegExâ~@~].

RegEx device: A hardware or software-based implementation of RegEx
device API for PCRE based pattern matching syntax and semantics.

PCRE RegEx syntax and semantics specification:
http://regexkit.sourceforge.net/Documentation/pcre/pcrepattern.html

RegEx queue pair: Each RegEx device should have one or more queue pair to
transmit a burst of pattern matching request and receive a burst of
receive the pattern matching response. The pattern matching
request/response embedded in *rte_regex_ops* structure.

Rule: A pattern matching rule expressed in PCRE RegEx syntax along with
Match ID and Group ID to identify the rule upon the match.

Rule database: The RegEx device accepts regular expressions and converts
them into a compiled rule database that can then be used to scan data.
Compilation allows the device to analyze the given pattern(s) and
pre-determine how to scan for these patterns in an optimized fashion that
would be far too expensive to compute at run-time. A rule database
contains a set of rules that compiled in device specific binary form.

Match ID or Rule ID: A unique identifier provided at the time of rule
creation for the application to identify the rule upon match.

Group ID: Group of rules can be grouped under one group ID to enable
rule isolation and effective pattern matching. A unique group identifier
provided at the time of rule creation for the application to identify
the rule upon match.

Scan: A pattern matching request through *enqueue* API.

It may possible that a given RegEx device may not support all the
features
of PCRE. The application may probe unsupported features through
struct rte_regexdev_info::pcre_unsup_flags

By default, all the functions of the RegEx Device API exported by a PMD
are lock-free functions which assume to not be invoked in parallel on
different logical cores to work on the same target object. For instance,
the dequeue function of a PMD cannot be invoked in parallel on two logical
cores to operates on same RegEx queue pair. Of course, this function
can be invoked in parallel by different logical core on different queue
pair. It is the responsibility of the upper level application to
enforce this rule.

In all functions of the RegEx API, the RegEx device is
designated by an integer >= 0 named the device identifier *dev_id*

At the RegEx driver level, RegEx devices are represented by a generic
data structure of type *rte_regexdev*.
RegEx devices are dynamically registered during the PCI/SoC device
probing phase performed at EAL initialization time.
When a RegEx device is being probed, a *rte_regexdev* structure and
a new device identifier are allocated for that device. Then, the
regexdev_init() function supplied by the RegEx driver matching the
probed device is invoked to properly initialize the device.

The role of the device init function consists of resetting the hardware
or software RegEx driver implementations.

If the device init operation is successful, the correspondence between
the device identifier assigned to the new device and its associated
*rte_regexdev* structure is effectively registered.
Otherwise, both the *rte_regexdev* structure and the device identifier
are freed.

The functions exported by the application RegEx API to setup a device
designated by its device identifier must be invoked in the following
order:
    - rte_regexdev_configure()
    - rte_regexdev_queue_pair_setup()
    - rte_regexdev_start()

Then, the application can invoke, in any order, the functions
exported by the RegEx API to enqueue pattern matching job, dequeue
pattern matching response, get the stats, update the rule database,
get/set device attributes and so on

If the application wants to change the configuration (i.e. call
rte_regexdev_configure() or rte_regexdev_queue_pair_setup()), it must
call rte_regexdev_stop() first to stop the device and then do the
reconfiguration before calling rte_regexdev_start() again. The enqueue and
dequeue functions should not be invoked when the device is stopped.

Finally, an application can close a RegEx device by invoking the
rte_regexdev_close() function.

Each function of the application RegEx API invokes a specific function
of the PMD that controls the target device designated by its device
identifier.

For this purpose, all device-specific functions of a RegEx driver are
supplied through a set of pointers contained in a generic structure of
type *regexdev_ops*.
The address of the *regexdev_ops* structure is stored in the
*rte_regexdev* structure by the device init function of the RegEx driver,
which is invoked during the PCI/SoC device probing phase, as explained
earlier.

In other words, each function of the RegEx API simply retrieves the
*rte_regexdev* structure associated with the device identifier and
performs an indirect invocation of the corresponding driver function
supplied in the *regexdev_ops* structure of the *rte_regexdev*
structure.

For performance reasons, the address of the fast-path functions of the
RegEx driver is not contained in the *regexdev_ops* structure.
Instead, they are directly stored at the beginning of the *rte_regexdev*
structure to avoid an extra indirect memory access during their
invocation.

RTE RegEx device drivers do not use interrupts for enqueue or dequeue
operation. Instead, RegEx drivers export Poll-Mode enqueue and dequeue
functions to applications.

The *enqueue* operation submits a burst of RegEx pattern matching
request to the RegEx device and the *dequeue* operation gets a burst of
pattern matching response for the ones submitted through *enqueue*
operation.

Typical application utilisation of the RegEx device API will follow the
following programming flow.

- rte_regexdev_configure()
- rte_regexdev_queue_pair_setup()
- rte_regexdev_rule_db_update() Needs to invoke if precompiled rule
  database not
  provided in rte_regexdev_config::rule_db for rte_regexdev_configure()
  and/or application needs to update rule database.
- rte_regexdev_rule_db_compile_activate() Needs to invoke if
  rte_regexdev_rule_db_update function was used.
- Create or reuse exiting mempool for *rte_regex_ops* objects.
- rte_regexdev_start()
- rte_regexdev_enqueue_burst()
- rte_regexdev_dequeue_burst()

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Ori Kam <orika@mellanox.com>
2020-07-07 00:24:38 +02:00
David Marchand
0fc601af3a trace: simplify trace point registration
RTE_TRACE_POINT_DEFINE and RTE_TRACE_POINT_REGISTER must come in pairs.
Merge them and let RTE_TRACE_POINT_REGISTER handle the constructor part.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-05 21:34:21 +02:00
Bruce Richardson
06c7871dde eal: restrict default plugin path to shared lib mode
When using statically linked DPDK binaries, the EAL checks the default PMD
path and tries to load any drivers there, despite the fact that all drivers
are normally linked into the binary.  This behaviour can cause issues if
the PMD path and lib dir is configured to a non-standard location which is
not in the ld.so.conf paths, e.g. a build with prefix set to a home
directory location. In a case such as this, EAL will try and
(unnecessarily) load the .so driver files but that load will fail as their
dependent libraries, such as ethdev, for example, will not be found.

Because of this, it is better if statically linked DPDK apps do not load
drivers from the standard paths automatically. The user can always have
this behaviour by explicitly specifying the path using -d flag, if so
desired.

Not loading the libraries automatically can also prevent potential issues
with a user building and running a statically-linked DPDK binary based off
a private copy of DPDK, while there exists on the same machine a
system-wide installation of DPDK in the default locations. Without this
change, the system-installed drivers will be loaded to the binary alongside
the statically-linked drivers, which is not what the user would have
intended.

To detect whether we are in a statically or dynamically linked binary, we
can have EAL try to get a dlopen handle to its own shared library, by
calling dlopen with the RTLD_NOLOAD flag. This will return NULL if there is
no such shared lib loaded i.e. the code is executing from a static library,
or a handle to the lib if it is loaded.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
2020-07-05 21:32:40 +02:00
Bruce Richardson
e84921fb56 eal: cache last directory permissions checked
When loading a directory of drivers, we check the same hierarchy multiple
times. If we just cache the last directory checked, this avoids repeated
checks of the same path, since all drivers in that path have been added to
the list consecutively.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-05 21:32:40 +02:00
Bruce Richardson
7b69fd1e95 eal: forbid loading drivers from insecure paths
Any paths on the system which are world-writable are insecure and should
not be used for loading drivers. Therefore, whenever an absolute or
relative driver path is passed to EAL, check for world-writability and
don't load any drivers from that path if it is insecure. Drivers loaded
from system locations i.e. those passed without any path info and found
automatically by the loader, are excluded from these checks as system paths
are assumed to be secure.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-05 21:32:40 +02:00
Bruce Richardson
49b536fc30 eal: load only shared libs from driver plugin directories
When we pass a "-d" flag to EAL pointing to a directory, we attempt to load
all files in that directory as driver plugins, irrespective of file type.
This procludes using e.g. the build/drivers directory, as a driver source
since it contains static libs and other files as well as the shared
objects.

By filtering out any files whose filename does not end in ".so", we can
improve usability by allowing other non-driver files to be present in the
driver directory.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-05 21:32:40 +02:00
Bruce Richardson
efa75d61d2 eal: remove unnecessary null-termination in plugin path
Since strlcpy always null-terminates, and the buffer is zeroed before copy
anyway, there is no need to explicitly zero the end of the character
array, or to limit the bytes that strlcpy can write.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-05 21:32:40 +02:00
Thomas Monjalon
de321d5918 build: remove special handling for node library
The node library had a need of being linked as a whole
to make some constructors effective.
Now that all libraries are linked with --whole-archive,
there is no need to have this library separate.

Fixes: e2db26f766 ("build: always link whole DPDK static libraries")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Jerin Jacob <jerinj@marvell.com>
2020-07-05 10:52:11 +02:00
Jerin Jacob
9c99878aa1 log: introduce logtype register macro
Introduce the RTE_LOG_REGISTER macro to avoid the code duplication
in the logtype registration process.

It is a wrapper macro for declaring the logtype, registering it and
setting its level in the constructor context.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-07-03 15:52:51 +02:00
Bruce Richardson
e2db26f766 build: always link whole DPDK static libraries
To ensure all constructors are included in static build, we need to pass
the --whole-archive flag when linking, which is used with the
"link_whole" meson option. Since we use link_whole for all libs, we no
longer need to track the lib as part of the static dependency, just the
path to the headers for compiling.

After this patch is applied, all DPDK .a files are inside
--whole-archive/--no-whole-archive flags, but external dependencies and
shared libs being linked against remain outside.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Andrzej Ostruszka <aostruszka@marvell.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-07-01 19:30:52 +02:00
Morten Brørup
4b7284a71f ring: optimize empty test
Testing if the ring is empty is as simple as comparing the producer and
consumer pointers.

In theory, this optimization reduces the number of potential cache misses
from 3 to 2 by not having to read r->mask in rte_ring_count().

The modification of this function were also discussed in the RFC here:
https://mails.dpdk.org/archives/dev/2020-April/165752.html

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-01 11:46:09 +02:00
Morten Brørup
36e3cfbbef ring: cleanup coding style
Fix coding style violations that checkpatch will complain about.

Add missing "int" after "unsigned".
Add missing spaces around "+=" and "+".
Remove superfluous type cast of numerical constant.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-01 11:46:09 +02:00
Feifei Wang
708d904451 ring: fix tail in peek API for ST mode
The value of *tail should be the prod->tail not prod->head. After
modification, it can record 'tail' so head/tail can be updated
accordingly.

Fixes: 664ff4b172 ("ring: introduce peek style API")

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-01 10:41:19 +02:00
Feifei Wang
c8d909aeb7 ring: fix bulk enqueue for HTS/RTS ring modes
Remove the unwanted call to "_rte_ring_do_enqueue_elem" to allow for
correct handling of RTS/HTS modes.

Fixes: e6ba4731c0 ("ring: introduce RTS ring mode")

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-01 10:41:19 +02:00
Matan Azrad
da2b788041 vhost: fix features definition location
The vhost library provide an infrastructure in order to help the DPDK
users to manage vhost devices.

One of the infrastructure parts is the features enablement APIs.

Some features bits may be defined only in the internal file vhost.h in
case the kernel version doesn't include them.

Hence, user running on old kernel may not be able to manage thus
features.

Move all the feature bits definitions to the API file rte_vhost.h.

Fixes: db69be54b6 ("vhost: hide internal code")
Fixes: 8d286dbeb8 ("vhost: fix multiple queue not enabled for old kernels")
Fixes: 3d3c6590b5 ("vhost: enable virtio MTU feature")
Fixes: 704098fc47 ("vhost: fix build with old kernels")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-06-30 14:52:31 +02:00
Matan Azrad
b213af9aa4 vhost: notify virtq file descriptor update
When virtq call or kick file descriptors are changed in the device
configuration when the queue is ready, the application and the vDPA
driver should be notified to be aligned to the new file descriptors.

Notify the state to be disabled before the file descriptor update and
return it back to be enabled after the update.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-06-30 14:52:31 +02:00
Matan Azrad
127f9c6f7b vhost: handle memory hotplug with vDPA devices
Some vDPA drivers' basic configurations should be updated when the
guest memory is hotplugged.

Close vDPA device before hotplug operation and recreate it after the
hotplug operation is done.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-06-30 14:52:30 +02:00
Matan Azrad
d0fcc38f5f vhost: improve device readiness notifications
Some guest drivers may not configure disabled virtio queues.

In this case, the vhost management never notifies the application and
the vDPA device readiness because it waits to the device to be ready.

The current ready state means that all the virtio queues should be
configured regardless the enablement status.

In order to support this case, this patch changes the ready state:
The device is ready when at least 1 queue pair is configured and
enabled.

So, now, the application and vDPA driver are notifies when the first
queue pair is configured and enabled.

Also the queue notifications will be triggered according to the new
ready definition.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-06-30 14:52:30 +02:00
Matan Azrad
9f2016b2ce vhost: skip access lock when vDPA is configured
No need to take access lock in the vhost-user message handler when
vDPA driver controls all the data-path of the vhost device.

It allows the vDPA set_vring_state operation callback to configure
guest notifications.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-06-30 14:52:30 +02:00
Matan Azrad
0329868d6a vhost: support host notifier queue configuration
As an arrangement to per queue operations in the vDPA device it is
needed to change the next experimental API:

The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
instead of per device.

A `qid` parameter was added to the API arguments list.

Setting the parameter to the value RTE_VHOST_QUEUE_ALL configures the
host notifier to all the device queues as done before this patch.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
a49f758d11 vhost: split vDPA header file
This patch split the vDPA header file in two, making
rte_vdpa_device structure opaque to the application.

Applications should only include rte_vdpa.h, while drivers
should include both rte_vdpa.h and rte_vdpa_dev.h.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
e91ac959fa vhost: remove vDPA device count API
This API is no more useful, this patch removes it.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
8d44fc3a81 vhost: introduce wrappers for some vDPA ops
This patch is preliminary work to make the vDPA device
structure opaque to the user application. Some callbacks
of the vDPA devices are used to query capabilities before
attaching to a Vhost port. This patch introduces wrappers
for these ops.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
08a4f9bab3 vhost: use linked list for vDPA devices
There is no more notion of device ID outside of vdpa.c.
We can now move from array to linked-list model for keeping
track of the vDPA devices.

There is no point in using array here, as all vDPA API are
used from the control path, so no performance concerns.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
f6d587754c vhost: remove useless vDPA API
vDPA is no more used outside of the vDPA internals,
so remove rte_vdpa_get_device() API that is now useless.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
0f700f90ad vhost: replace device ID in applications
This patch replaces the use of vDPA device ID with
vDPA device pointer. The goals is to remove the vDPA
device ID to avoid confusion with the Vhost ID.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
2263f13941 vhost: replace vDPA device ID in Vhost
This removes the notion of device ID in Vhost library
as a preliminary step to get rid of the vDPA device ID.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
81a6b7fe06 vhost: replace device ID in vDPA ops
This patch is a preliminary step to get rid of the
vDPA device ID. It makes vDPA callbacks to use the
vDPA device struct as a reference instead of the ID.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
38f8ab0bbc vhost: make vDPA framework bus agnostic
This patch makes the vDPA framework to no more
support only PCI devices, but any devices by relying
on the generic device name as identifier.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Maxime Coquelin
383fb5a9c7 vhost: introduce vDPA device class
This patch introduces vDPA device class. It will enable
application to iterate over the vDPA devices.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-06-30 14:52:30 +02:00
Ruifeng Wang
4eb25acdb7 eal/arm: add vcopyq intrinsic for aarch32
vcopyq_laneq_u32 should be implemented for aarch32 which doesn't have
the intrinsic.
This fixes build of examples/l3fwd for armv7.

Fixes: 3c4b4024c2 ("arch/arm: add vcopyq_laneq_u32 for old gcc")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-06-30 14:52:30 +02:00
Matan Azrad
1cb4415751 vhost: introduce operation to get vDPA queue stats
The vDPA device offloads all the datapath of the vhost
device to the HW device.

In order to expose to the user traffic information this
patch introduces new 3 APIs to get traffic statistics, the
device statistics name and to reset the statistics per
virtio queue.

The statistics are taken directly from the vDPA driver
managing the HW device and can be different for each vendor
driver.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-06-30 14:52:29 +02:00
Maxime Coquelin
d1c074bd76 vhost: enable reply-ack systematically
As announced during v20.05 release cycle, this
patch makes reply-ack protocol feature to be enabled
unconditionally.

This protocol feature makes the communication between the
master and the slave more robust, avoiding for example
possible undefined behaviour with VHOST_USER_SET_MEM_TABLE.

Also, reply-ack support will be required for upcoming
VHOST_USER_SET_STATUS request.

Note that this protocol feature was disabled by default
because Qemu version 2.7.0 to 2.9.0 had a bug causing a
deadlock when reply-ack was negotiated and multiqueue
enabled. These Qemu version are now very old and no more
maintained, so we can reasonably consider we no more
support them.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-06-30 14:52:29 +02:00
Fady Bader
8d05adbd54 ring: enable for Windows
Building ring on Windows.

Signed-off-by: Fady Bader <fady@mellanox.com>
2020-06-30 01:30:22 +02:00
Tasnim Bashar
5652a4ad24 eal/windows: fix thread handle
Casting thread ID to handle is not accurate way to get thread handle.
Need to use OpenThread function to get thread handle from thread ID.

pthread_setaffinity_np and pthread_getaffinity_np functions
for Windows are affected because of it.

Signed-off-by: Tasnim Bashar <tbashar@mellanox.com>
2020-06-30 00:50:26 +02:00
Tal Shnaiderman
b762221ac2 bus/pci: support Windows with bifurcated drivers
Uses SetupAPI.h functions to scan PCI tree.
Uses DEVPKEY_Device_Numa_Node to get the PCI NUMA node.
Uses SPDRP_BUSNUMBER and SPDRP_BUSNUMBER to get the BDF.
scanning currently supports types RTE_KDRV_NONE.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:54 +02:00
Tal Shnaiderman
33031608e8 bus/pci: introduce Windows support with stubs
Addition of stub eal and bus/pci functions to compile
bus/pci for Windows.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:54 +02:00
Tal Shnaiderman
8517072c87 pci: fix address domain format size
the struct rte_pci_addr defines domain as uint32_t variable however
the PCI_PRI_FMT macro used for logging the struct sets the format
of domain to uint16_t.

The mismatch causes the following warning messages
in Windows clang build:

format specifies type 'unsigned short' but the argument
has type 'uint32_t' (aka 'unsigned int') [-Wformat]

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:54 +02:00
Tal Shnaiderman
b137f95366 pci: build on Windows
Added <sys/types.h> in rte_pci header file
to include off_t type since it is missing for Windows.

Define the implementation of the Linux function rte_pci_get_sysfs_path
in pci_common.c for Linux OS only as it is unneeded for other OSs
and to avoid the warning on deprecated call to getenv() on Windows:

"warning: 'getenv' is deprecated: This function or variable may be unsafe.
Consider using _dupenv_s instead."

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:54 +02:00
Tal Shnaiderman
2fd3567e54 pci: use OS generic memory mapping functions
Changing all of PCIs Unix memory mapping to the
new memory allocation API wrapper.

Change all of PCI mapping function usage in
bus/pci to support the new API.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:54 +02:00
Tal Shnaiderman
2ab745bd8f eal: move OS common options usage functions
Move common functions between Unix and Windows to eal_common_options.c.

Those functions are getter functions for rte_application_usage_hook.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:53 +02:00
Tal Shnaiderman
57a2efb304 eal: move OS common config objects
Move common functions between Unix and Windows to eal_common_config.c.

Those functions are getter functions for IOVA,
configuration, Multi-process.

Move rte_config, internal_config, early_mem_config and runtime_dir
to be defined in the common file with getter functions.

Refactor the users of the config variables above to use
the getter functions.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:53 +02:00
Tal Shnaiderman
309bf90bf9 build: generate version map file for MinGW
The MinGW build for Windows has special cases where exported
function contain additional prefix:

__emutls_v.per_lcore__*

To avoid adding those prefixed functions to the version.map file
the map_to_def.py script was modified to create a map file for MinGW
with the needed changed.

The file name was changed to map_to_win.py and lib/meson.build map output
was unified with drivers/meson.build output

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-30 00:02:53 +02:00
Xiaolong Ye
6248368a9a mbuf: add dump of free dynamic flags
Add support to dump free_flags as below format:

Free bit in mbuf->ol_flags (0 = occupied, 1 = free):
  0000: 0 0 0 0 0 0 0 0
  0008: 0 0 0 0 0 0 0 0
  0010: 0 0 0 0 0 0 0 1
  0018: 1 1 1 1 1 1 1 1
  0020: 1 1 1 1 1 1 1 1
  0028: 1 0 0 0 0 0 0 0
  0030: 0 0 0 0 0 0 0 0
  0038: 0 0 0 0 0 0 0 0

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye
d27a4cb3ef mbuf: fix dynamic field dump log
For each mbuf byte, free_space[i] == 0 means the space is occupied,
free_space[i] != 0 means space is free.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye
3a7fb882fd mbuf: fix free space update for dynamic field
The value free_space[i] is used to save the size of biggest aligned
element that can fit in the zone, current implementation has one flaw,
for example, if user registers dynfield1 (size = 4, align = 4, req = 124)
first, the free_space would be as below after registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 08 08 08 08 00 00 00 00

Then if user continues to register dynfield2 (size = 4, align = 4),
free_space would become:

  0070: 00 00 00 00 04 04 04 04
  0078: 04 04 04 04 00 00 00 00

Further request dynfield3 (size = 8, align = 8) would fail to register
due to alignment requirement can't be satisfied, though there is enough
space remained in mbuf.

This patch fixes above issue by saving alignment only in aligned zone,
after the fix, above registrations order can be satisfied, free_space
would be like:

After dynfield1 registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 04 04 04 04 00 00 00 00

After dynfield2 registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 00 00 00 00 00 00 00 00

After dynfield3 registration:

  0070: 00 00 00 00 00 00 00 00
  0078: 00 00 00 00 00 00 00 00

This patch also reduces iterations in process_score() by jumping align
steps in each loop.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye
359386e15b mbuf: fix error code in dynamic field/flag registration
Set rte_errno as ENOMEM when allocation failure.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye
f8eb26dda8 mbuf: fix boundary check at dynamic field registration
We should make sure off + size < sizeof(struct rte_mbuf) to avoid
possible out-of-bounds access of free_space array, there is no issue
currently due to the low bits of free_flags (which is adjacent to
free_space) are always set to 0. But we shouldn't rely on it since it's
fragile and layout of struct mbuf_dyn_shm may be changed in the future.
This patch adds boundary check explicitly to avoid potential risk of
out-of-bounds access.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
David Marchand
d58314aa3c eal: remove redundant newline in alert message
rte_eal_init_alert() already appends a newline.

Fixes: 0a529578f1 ("eal: clean up unused files on initialization")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-06-25 01:18:29 +02:00
Archit Pandey
78621061f7 sched: fix 64-bit rate
64-bit support was missing from the functions pipe_profile_check
and rte_sched_subport_config_pipe_profile_table.

Fixes: 68c1f26d42 ("sched: support 64-bit values")
Cc: stable@dpdk.org

Signed-off-by: Archit Pandey <architpandeynitk@gmail.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2020-06-25 00:52:31 +02:00
Hrvoje Habjanic
fbd8981cdd sched: fix subport freeing
In function rte_sched_subport_free, there is code to free all allocated
stuff related to scheduler subport.
First there are some checks, and in the end, rte_bitmap_free is called.

Now, rte_bitmap_free is a dummy function, and it just checks if
provided pointer to bitmap is valid or not. So, actual memory for
subport is not freed.

This patch fixes this by removing call to rte_bitmap_free, and
instead calling rte_free.

Fixes: d9213b829a ("sched: remove pipe params config from port level")
Cc: stable@dpdk.org

Signed-off-by: Hrvoje Habjanic <hrvoje.habjanic@zg.ht.hr>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2020-06-25 00:43:28 +02:00
Hongzhi Guo
06602c671a net: fix IPv4 checksum
0xffff is invalid for IPv4 checksum (RFC1624)

Fixes: 6006818cfb ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-24 23:42:20 +02:00
Konstantin Ananyev
ce77f03ca3 bpf/x86: support packet data load instructions
Make x86 JIT to generate native code for
(BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD)
instructions.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2020-06-24 23:42:20 +02:00
Konstantin Ananyev
b901d92836 bpf: support packet data load instructions
To fill the gap with linux kernel eBPF implementation,
add support for two non-generic instructions:
(BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD)
which are used to access packet data.
These instructions can only be used when BPF context is a pointer
to 'struct rte_mbuf' (i.e: RTE_BPF_ARG_PTR_MBUF type).

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-06-24 23:42:04 +02:00
Konstantin Ananyev
20c19d9df5 bpf: fix add/sub min/max estimations
eval_add()/eval_sub() not always correctly estimate
minimum and maximum possible values of add/sub operations.

Fixes: 8021917293 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-06-24 23:34:07 +02:00
Tal Shnaiderman
c91717eb75 eal/windows: support exit and panic
Support the debug functions in eal_common_debug.c for Windows.

Implementation of rte_dump_stack to get a backtrace similarly to Unix
and of rte_eal_cleanup in eal.c.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-24 11:02:51 +02:00
Tal Shnaiderman
48be180de6 eal: move OS common debug functions to single file
Move common functions between Unix and Windows to eal_common_debug.c.

Those functions are rte_exit, __rte_panic and rte_dump_registers
which has the same implementation on Unix and Windows.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
2020-06-24 11:02:29 +02:00
Harman Kalra
02b73b1e93 eal/linux: fix epoll fd list rebuild for interrupts
An issue has been observed where epoll file descriptor
list rebuilds every time an interrupt/alarm event is
received.

eal_intr_process_interrupts() should notify pipe fd only
if any source is removed from the source list i.e (rv > 0)

Fixes: 0c7ce182a7 ("eal: add pending interrupt callback unregister")
Cc: stable@dpdk.org

Signed-off-by: Harman Kalra <hkalra@marvell.com>
2020-06-24 10:01:56 +02:00
Fady Bader
f6da77d92d meter: remove inline functions from export list
The code didn't compile when using exported meter functions under Windows.

error LNK2001: unresolved external symbol
rte_meter_srtcm_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_srtcm_color_blind_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_color_blind_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_rfc4115_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_rfc4115_color_blind_check

The cause was that there were some inline functions that were included in
the export list.
To solve this the functions were removed from rte_meter_version.map export
list which are implemented in the header and shouldn't be exported.

Fixes: 655796d2b5 ("meter: support RFC4115 trTCM")
Fixes: 9d41beed24 ("lib: provide initial versioning")
Cc: stable@dpdk.org

Signed-off-by: Fady Bader <fady@mellanox.com>
2020-06-23 19:29:41 +02:00
Fady Bader
540fbc2786 timer: support EAL functions on Windows
Implemented the needed Windows eal timer functions.

Signed-off-by: Fady Bader <fady@mellanox.com>
Reviewed-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2020-06-23 19:03:30 +02:00
Fady Bader
03437f2dc8 timer: move from common to Unix directory
EAL common timer doesn't compile under Windows.

Compilation log:
error LNK2019:
unresolved external symbol nanosleep referenced in function
rte_delay_us_sleep
error LNK2019:
unresolved external symbol get_tsc_freq referenced in function set_tsc_freq
error LNK2019:
unresolved external symbol sleep referenced in function set_tsc_freq

The reason was that some functions called POSIX functions.
The solution was to move POSIX dependent functions from common to Unix.

Signed-off-by: Fady Bader <fady@mellanox.com>
Reviewed-by: Tal Shnaiderman <talshn@mellanox.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2020-06-23 18:33:20 +02:00
Stephen Hemminger
f3687f251e cfgfile: check flags on creation for future proofing
All API's should check that they support the flag values
passed. If an application passes an invalid flag it could
cause problems in later ABI.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-06-16 17:46:39 +02:00
Stephen Hemminger
28396b9043 stack: check flags on creation for future proofing
All API's should check that they support the flag values
passed. If an application passes an invalid flag it could
cause problems in later ABI.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Gage Eads <gage.eads@intel.com>
2020-06-16 17:46:39 +02:00
Stephen Hemminger
0cc917d809 hash: check flags on creation for future proofing
All API's should check that they support the flag values
passed. If an application passes an invalid flag it could
cause problems in later ABI.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-06-16 17:46:39 +02:00
Stephen Hemminger
3e71b3d456 ring: check flag settings for future proofing
All API's should check that they support the flag values passed.
These checks ensure that the extra bits can safely be used
without risk of ABI breakage.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-06-16 17:46:39 +02:00
Joyce Kong
7f3aa08639 eal: introduce bit operations API
Bitwise operation APIs are defined and used in a lot of PMDs,
which caused a huge code duplication. To reduce duplication,
this patch consolidates them into a common API family.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2020-06-16 14:16:56 +02:00
Dmitry Kozlyuk
2a5d547a4a eal/windows: implement basic memory management
Basic memory management supports core libraries and PMDs operating in
IOVA as PA mode. It uses a kernel-mode driver, virt2phys, to obtain
IOVAs of hugepages allocated from user-mode. Multi-process mode is not
implemented and is forcefully disabled at startup. Assign myself as a
maintainer for Windows file and memory management implementation.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:30:54 +02:00
Dmitry Kozlyuk
c08bd191b1 eal/windows: initialize hugepage info
Add hugepages discovery ("large pages" in Windows terminology)
and update documentation for required privilege setup. Only 2MB
hugepages are supported and their number is estimated roughly
due to the lack or unstable status of suitable OS APIs.
Assign myself as maintainer for the implementation file.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:30:32 +02:00
Dmitry Kozlyuk
b8a36b0866 eal/windows: improve CPU and NUMA node detection
1. Map CPU cores to their respective NUMA nodes as reported by system.
2. Support systems with more than 64 cores (multiple processor groups).
3. Fix magic constants, styling issues, and compiler warnings.
4. Add EAL private function to map DPDK socket ID to NUMA node number.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:29:39 +02:00
Dmitry Kozlyuk
b831cfd23a eal/windows: complete queue.h data structures
Limited version imported previously lacks at least SLIST macros.
Import a complete file from FreeBSD, since its license exception is
already approved by Technical Board.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:27:28 +02:00
Dmitry Kozlyuk
5690607879 eal/windows: add tracing stubs
EAL common code depends on tracepoint calls, but generic implementation
cannot be enabled on Windows due to missing standard library facilities.
Add stub functions to support tracepoint compilation, so that common
code does not have to conditionally include tracepoints until proper
support is added.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:27:09 +02:00
Dmitry Kozlyuk
262c4ee791 trace: add size_t field emitter
It is not guaranteed that sizeof(long) == sizeof(size_t). On Windows,
sizeof(long) == 4 and sizeof(size_t) == 8 for 64-bit programs.
Tracepoints using "long" field emitter are therefore invalid there.
Add dedicated field emitter for size_t and use it to store size_t values
in all existing tracepoints.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:27:00 +02:00
Dmitry Kozlyuk
694161b7e0 mem: extract common dynamic memory allocation
Code in Linux EAL that supports dynamic memory allocation (as opposed to
static allocation used by FreeBSD) is not OS-dependent and can be reused
by Windows EAL. Move such code to a file compiled only for the OS that
require it. Keep Anatoly Burakov maintainer of extracted code.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:26:37 +02:00
Dmitry Kozlyuk
83713ef276 mem: extract common memseg list initialization
All supported OS create memory segment lists (MSL) and reserve VA space
for them in a nearly identical way. Move common code into EAL private
functions to reduce duplication.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:25:16 +02:00
Dmitry Kozlyuk
c4b89ecb64 eal: introduce memory management wrappers
Introduce OS-independent wrappers for memory management operations used
across DPDK and specifically in common code of EAL:

* rte_mem_map()
* rte_mem_unmap()
* rte_mem_page_size()
* rte_mem_lock()

Windows uses different APIs for memory mapping and reservation, while
Unices reserve memory by mapping it. Introduce EAL private functions to
support memory reservation in common code:

* eal_mem_reserve()
* eal_mem_free()
* eal_mem_set_dump()

Wrappers follow POSIX semantics limited to DPDK tasks, but their
signatures deliberately differ from POSIX ones to be more safe and
expressive. New symbols are internal. Being thin wrappers, they require
no special maintenance.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:25:05 +02:00
Dmitry Kozlyuk
176bb37ca6 eal: introduce internal wrappers for file operations
Introduce OS-independent wrappers in order to support common EAL code
on Unix and Windows:

* eal_file_open: open or create a file.
* eal_file_lock: lock or unlock an open file.
* eal_file_truncate: enforce a given size for an open file.

Implementation for Linux and FreeBSD is placed in "unix" subdirectory,
which is intended for common code between the two. These thin wrappers
require no special maintenance.

Common code supporting multi-process doesn't use the new wrappers,
because it is inherently Unix-specific and would impose excessive
requirements on the wrappers.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:24:37 +02:00
Dmitry Kozlyuk
67a661ed85 eal: replace page sizes enum with a set of constants
Clang on Windows follows MS ABI where enum values are limited to 2^31-1.
Enum rte_page_sizes has members valued above this limit, which get
wrapped to zero, resulting in compilation error (duplicate values in
enum). Using MS ABI is mandatory for Windows EAL to call Win32 APIs.

Remove rte_page_sizes and replace its values with #define's.
This enumeration is not used in public API, so there's no ABI breakage.
Announce API changes for 20.08 in documentation.

Suggested-by: Jerin Jacob <jerinjacobk@gmail.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-15 19:23:34 +02:00
David Marchand
b6f0621201 eal/windows: fix symbol export
rte_eal_get_configuration() has been made private in 19.11, remove
leftover in Windows export list.

Fixes: f58cef079b ("eal: make the global configuration private")

Signed-off-by: David Marchand <david.marchand@redhat.com>
2020-06-15 11:58:26 +02:00
Pallavi Kadam
d87f964ce6 eal/windows: fix warnings
Fixed bunch of warnings when compiling using clang on Windows
such as the use of an unsafe string function (strerror),
[-Wunused-variable], [-Wunused-function] in eal_common_options.c
[-Wunused-const-variable] in getopt.c and [-Wunused-parameter]
in eal_common_thread.c.
Also fixed warnings generated using Mingw:
[-Werror=old-style-definition], [-Werror=cast-function-type] and
[-Werror=attributes]

Signed-off-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Tested-by: Narcisa Vasile <navasile@linux.microsoft.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-06-15 11:35:58 +02:00
Tasnim Bashar
482bcf8404 eal/windows: support thread ID query
Add rte_sys_gettid function to use rte_gettid() on Windows.
rte_gettid() is required for recursive spin lock and recursive ticket lock.

Signed-off-by: Tasnim Bashar <tbashar@mellanox.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-06-11 16:40:29 +02:00
Tal Shnaiderman
4887a7e234 mbuf: align layout in Windows
Using uint32_t type bit-fields in Windows will pads the
'L2/L3/L4 and tunnel information' union with additional bits.

This padding causes rte_mbuf size misalignment and the total size
increases to 3 cache-lines.

Changed packet_type bit-fields types from uint32_t to uint8_t
to allow unified 2 cache-line structure size.

Added the __extension__ attribute over the modified struct to avoid
the warning:

type of bit-field ... is a GCC extension [-pedantic]

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-11 16:26:33 +02:00
Alexander Kozyrev
d6eb247371 mbuf: fix external buffer pool boundaries
Memzones are created in testpmd in order to test external data
buffers functionality. Each memzone is 2Mb in size and divided among
the pool of external memory buffers.

Memzone may not always be fully utilized because mbufs size can vary
and some space can be left unused at the tail of a memzone. This is
not handled properly and mbuf can get the address of this leftover
space since this address is still valid (part of memzone), but there
is not enough space to fit the whole packet data. As a result packet
data may overflow and cause the memory corruption.

Take mbuf size into account when distributing memory addresses from
a memzone to external mbufs. Skip the remaining tail in case there
is not enough room for a packet and move to a next memzone instead.

Fixes: 6c8e50c2e5 ("mbuf: create pool with external memory buffers")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-11 09:51:46 +02:00
Xiaolong Ye
c67a423c53 mbuf: remove unused next member in dynamic flag/field
TAILQ_ENTRY next is not needed in struct mbuf_dynfield_elt and
mbuf_dynflag_elt, since they are actually chained by rte_tailq_entry's
next field when calling TAILQ_INSERT_TAIL(mbuf_dynfield/dynflag_list, te,
next).

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-11 09:32:43 +02:00
Thomas Monjalon
d1342ea419 mbuf: document guideline for new fields and flags
Since dynamic fields and flags were added in 19.11,
the idea was to use them for new features, not only PMD-specific.

The guideline is made more explicit in doxygen, in the mbuf guide,
and in the contribution design guidelines.

For more information about the original design, see the presentation
https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/DynamicMbuf.pdf

This decision was discussed in the Technical Board:
http://mails.dpdk.org/archives/dev/2020-June/169667.html

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-06-11 09:29:15 +02:00
Ciara Power
61d6c7a98b telemetry: fix init log printing
Initially, printf was used to indicate and error/warning resulting from
telemetry initialisation. This is now fixed to use EAL logs for
notices, and the unnecessary printf for an error is removed.

Fixes: eeb486f3ba ("eal: add telemetry as dependency")
Fixes: dd6275a424 ("telemetry: fix error log output")

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-05-24 18:01:31 +02:00
Adam Dybkowski
e475fd853a cryptodev: fix SHA-1 digest enum comment
This patch fixes improper SHA-1 digest size in the enum comment
and also adds the note about HMAC-SHA-1-96.

Fixes: 1bd407fac8 ("cryptodev: extract symmetric operations")
Cc: stable@dpdk.org

Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-05-24 11:52:35 +02:00
Xuan Ding
22fa1bcbcb vhost: fix zero-copy server mode
This patch fixes the situation where vhost-user cannot start as server
with dequeue_zero_copy enabled.

Using flag instead of vsocket->is_server to determine whether vhost-user
is in client mode. Because vsocket->is_server is not ready at this time.

Fixes: 715070ea10 ("vhost: prevent zero-copy with incompatible client mode")
Cc: stable@dpdk.org

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
2020-05-19 17:12:17 +02:00
Ferruh Yigit
60197bda97 meter: provide experimental alias for matured API
On v20.02 some meter APIs have been matured and symbols moved from
EXPERIMENTAL to DPDK_20.0.1 block.

This can break the applications that were using these mentioned APIs on
v19.11. Although there is no modification on the APIs and the action is
positive and matures the APIs, the affect can be negative to
applications.

This patch provides aliasing by duplicating the existing and versioned
symbols as experimental.

Since symbols moved from DPDK_20.0.1 to DPDK_21 block in the v20.05, the
aliasing done between EXPERIMENTAL and DPDK_21.

With DPDK_21 ABI (DPDK v20.11) all aliasing will be removed and only
stable version of the APIs will remain.

Fixes: 30512af820 ("meter: remove experimental flag from RFC4115 trTCM API")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-05-19 16:25:09 +02:00
Jerin Jacob
8b9dae0cc3 doc: use globbing terminology
Glob is the terminology used in fnmatch man page.
Use glob terminology across DPDK for shell pattern.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-05-19 16:05:17 +02:00
Muhammad Bilal
5a448a55b4 fix same typo in multiple places
Removed the typing error in doc/guides/eventdevs/index.rst,
drivers/net/mlx5/mlx5.c and in lib/librte_vhost/rte_vhost.h

Bugzilla ID: 477
Fixes: 0857b94211 ("doc: add event device and software eventdev")
Fixes: 039253166a ("vhost: add device op when notification to guest is sent")
Fixes: ad74bc6195 ("net/mlx5: support multiport IB device during probing")
Cc: stable@dpdk.org

Signed-off-by: Muhammad Bilal <m.bilal@emumba.com>
2020-05-19 15:55:57 +02:00
Ciara Power
a0c21662b4 telemetry: fix buffer overrun if max bytes read
If 1024 bytes were received over the socket, this caused
buffer_recvf[bytes] to overrun the array. The size of the buffer - 1 is
now passed to the read function.

Coverity issue: 358442
Fixes: b80fe1805e ("telemetry: introduce backward compatibility")

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2020-05-19 15:05:56 +02:00
Ciara Power
07580a734b telemetry: check socket creation failure
The return value from the socket function is now checked, as it can
return a negative value on error.

Coverity issue: 358443
Fixes: b80fe1805e ("telemetry: introduce backward compatibility")

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2020-05-19 15:05:56 +02:00
Ciara Power
6aa1aa0ebc telemetry: close socket on connection failure
The socket fd is now being closed when the connection fails.

Coverity issue: 358444
Fixes: b80fe1805e ("telemetry: introduce backward compatibility")

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2020-05-19 15:05:56 +02:00
Ciara Power
bd3c89cb1a telemetry: fix error checking for strchr function
The strchr function return was not being checked which could lead to
NULL deferencing later in the function.

Coverity issue: 358438, 358445
Fixes: b80fe1805e ("telemetry: introduce backward compatibility")

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2020-05-19 15:05:56 +02:00
Ciara Power
febbebf7f2 telemetry: keep threads separate from data plane
The threads for listening on the telemetry sockets are control threads
and should be separated from those on the data plane. Since telemetry
cannot use the rte_ctrl_thread_create() API, as it does not depend on
EAL, we pass the ctrl thread cpu_set to telemetry init and use it
directly to ensure that telemetry cannot interfere with the data plane
threads.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2020-05-19 15:05:56 +02:00
Gaetan Rivet
e90b9c52f8 kvargs: fix strcmp helper documentation
Minor error, "unless" was used instead of "unlike".

Fixes: a3b85476c5 ("kvargs: add generic string matching callback")
Cc: stable@dpdk.org

Signed-off-by: Gaetan Rivet <grive@u256.net>
2020-05-19 15:05:56 +02:00
Hemant Agrawal
1168be0077 metrics: fix library cleanup
metrics_initialized shall be reset in deinit function.
This is currently causing issue in running metrics_autotest
multiple times.

Fixes: 07c1b6925b ("telemetry: invert dependency on metrics library")

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-05-19 14:05:21 +02:00
Gaetan Rivet
8354e681e3 pci: explain how empty strings are rejected in DBDF
Empty strings are forbidden as input to rte_pci_addr_parse().
It is explicitly enforced in BDF parsing as parsing the bus
field will immediately fail. The related check is commented.

It is implicitly enforced in DBDF parsing, as the domain would be
parsed to 0 without error, but the check `end[0] != ':'` afterward
will return -EINVAL.

Enforcing consistency between parsers by reading the code is not helped
by this property being implicit. Add a comment to explain.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Acked-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2020-05-19 11:18:38 +02:00
Gaetan Rivet
21a61fae51 pci: reject negative values in PCI id
The function strtoul will not return ERANGE if the input is negative, as
one might expect.

   0000:-FFFFFFFFFFFFFFFB:00.0

is not a better way to write 0000:05:00.0.
To simplify checking for '-', forbid using spaces before the field value.

   0000: 00:   2c.0

Should not be accepted.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Gaetan Rivet <grive@u256.net>
Acked-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2020-05-19 11:18:38 +02:00
Darek Stojaczyk
26cfc20fed pci: accept 32-bit domain numbers
The parsing code was bailing on domains greater than UINT16_MAX,
but domain numbers like that are still valid and present on some systems.
One example is Intel VMD (Volume Management Device), which acts somewhat
as a software-managed PCI switch and its upstream linux driver assigns
all downstream devices a PCI domain of 0x10000.

Parsing a BDF like 10000:01:00.0 was failing before. To fix it, increase
the upper limit of domain number to UINT32_MAX. This matches the size of
struct rte_pci_addr->domain (uint32).

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Gaetan Rivet <grive@u256.net>
2020-05-19 10:59:19 +02:00
Sivaprasad Tummala
0fd5608ef9 vhost: handle mbuf allocation failure
vhost buffer allocation is successful for packets that fit
into a linear buffer. If it fails, vhost library is expected
to drop the current packet and skip to the next.

The patch fixes the error scenario by skipping to next packet.
Note: Drop counters are not currently supported.

Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Cc: stable@dpdk.org

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-18 20:35:57 +02:00
Stephen Hemminger
3a2cd6fd06 eal: fix C++17 compilation
Compiling a C++ application that includes directly or indirectly
rte_common.h will cause a warning:

include/rte_common.h:350:37: warning: ISO C++17 does not allow
  ‘register’ storage class specifier [-Wregister]
 rte_combine32ms1b(register uint32_t x)

C++ is pickier than standard C and flags this antique usage.

The register keyword is an old K&R legacy and should be removed
everywhere in DPDK. For now, fix it where it hurts.

Fixes: 08f683174e ("eal: add functions for previous power of 2 alignment")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-05-18 20:46:24 +02:00
Ferruh Yigit
05a38d7c75 compat: provide experimental alias for matured ABI
On v20.02 some APIs matured and symbols moved from EXPERIMENTAL to
DPDK_20.0.1 block.

This had the affect of breaking the applications that were using these
APIs on v19.11. Although there is no modification of the APIs and the
action is positive and matures the APIs, the affect can be negative to
applications.

When a maintainer is promoting an API to become part of the next major
ABI version by removing the experimental tag. The maintainer may
choose to offer an alias to the experimental tag, to prevent these
breakages in future.

The following changes are made to enabling aliasing:

Updated to the ABI policy and ABI versioning documents.

Created VERSION_SYMBOL_EXPERIMENTAL helper macro.

Updated the 'check-symbols.sh' tool, which was complaining that the
symbol is in EXPERIMENTAL tag in .map file but it is not in the
.experimental section (__rte_experimental tag is missing).
Updated tool in a way it won't complain if the symbol in the
EXPERIMENTAL tag duplicated in some other block in .map file (versioned)

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2020-05-18 19:46:25 +02:00
Xuan Ding
e7debf6026 vhost: fix potential fd leak
Vhost will create temporary file when receiving VHOST_USER_GET_INFLIGHT_FD
message. Malicious guest can send endless this message to drain out the
resource of host.

When receiving VHOST_USER_GET_INFLIGHT_FD message repeatedly, closing the
file created during the last handling of this message.

CVE-2020-10726
Fixes: d87f1a1cb7 ("vhost: support inflight info sharing")
Cc: stable@dpdk.org

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-18 15:22:42 +02:00
Xiaolong Ye
549de54c4f vhost: fix potential memory space leak
A malicious container which has direct access to the vhost-user socket
can keep sending VHOST_USER_GET_INFLIGHT_FD messages which may cause
leaking resources until resulting a DOS. Fix it by unmapping the
dev->inflight_info->addr before assigning new mapped addr to it.

CVE-2020-10726
Fixes: d87f1a1cb7 ("vhost: support inflight info sharing")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-18 15:22:42 +02:00
Marvin Liu
97ecc1c85c vhost: fix translated address not checked
Malicious guest can construct desc with invalid address and zero buffer
length. That will request vhost to check both translated address and
translated data length. This patch will add missed address check.

CVE-2020-10725
Fixes: 75ed516978 ("vhost: add packed ring batch dequeue")
Fixes: ef861692c3 ("vhost: add packed ring batch enqueue")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-18 15:22:42 +02:00
Maxime Coquelin
acd4c92fa6 vhost/crypto: validate keys lengths
transform_cipher_param() and transform_chain_param() handle
the payload data for the VHOST_USER_CRYPTO_CREATE_SESS
message. These payloads have to be validated, since it
could come from untrusted sources.

Two buffers and their lengths are defined in this payload,
one the the auth key and one for the cipher key. But above
functions do not validate the key length inputs, which could
lead to read out of bounds, as buffers have static sizes of
64 bytes for the cipher key and 512 bytes for the auth key.

This patch adds necessary checks on the key length field
before being used.

CVE-2020-10724
Fixes: e80a987081 ("vhost/crypto: add session message handler")
Cc: stable@dpdk.org

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
2020-05-18 15:22:34 +02:00
Maxime Coquelin
c78d94189d vhost: fix vring index check
vhost_user_check_and_alloc_queue_pair() is used to extract
a vring index from a payload. This function validates the
index and is called early on in when performing message
handling. Most message handlers depend on it correctly
validating the vring index.

Depending on the message type the vring index is in
different parts of the payload. The function contains a
switch/case for each type and copies the index. This is
stored in a uint16. This index is then validated. Depending
on the message, the source index is an unsigned int. If
integer truncation occurs (uint->uint16) the top 16 bits
of the index are never validated.

When they are used later on  (e.g. in
vhost_user_set_vring_num() or vhost_user_set_vring_addr())
it can lead to out of bound indexing. The out of bound
indexed data gets written to, and hence this can cause
memory corruption.

This patch fixes this vulnerability by declaring vring
index as an unsigned int in
vhost_user_check_and_alloc_queue_pair().

CVE-2020-10723
Fixes: 160cbc815b ("vhost: remove a hack on queue allocation")
Cc: stable@dpdk.org

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
2020-05-18 15:18:58 +02:00
Maxime Coquelin
3ae4beb079 vhost: check log mmap offset and size overflow
vhost_user_set_log_base() is a message handler that is
called to handle the VHOST_USER_SET_LOG_BASE message.
Its payload contains a 64 bit size and offset. Both are
added up and used as a size when calling mmap().

There is no integer overflow check. If an integer overflow
occurs a smaller memory map would be created than
requested. Since the returned mapping is mapped as writable
and used for logging, a memory corruption could occur.

CVE-2020-10722
Fixes: fbc4d248b1 ("vhost: fix offset while mmaping log base address")
Cc: stable@dpdk.org

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
2020-05-18 15:18:58 +02:00
Kevin Traynor
572f2a9089 hash: fix gcc 10 maybe-uninitialized warning
gcc 10.1.1 reports a warning for the ext_bkt_id variable:

../lib/librte_hash/rte_cuckoo_hash.c:
In function ‘__rte_hash_add_key_with_hash’:
../lib/librte_hash/rte_cuckoo_hash.c:1104:29:
warning: ‘ext_bkt_id’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
 1104 |  (h->buckets_ext[ext_bkt_id - 1]).sig_current[0] = short_sig;
      |                  ~~~~~~~~~~~^~~

The return value of rte_ring_sc_dequeue_elem() is already checked,
but also initialize ext_bkt_id to zero (invalid value) and check
that it also overwritten.

Fixes: fbfe568103 ("hash: use 32-bit elements rings to save memory")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-05-18 13:54:36 +02:00
Nithin Dabilpuram
a8b8a86317 node: fix arm64 build with old gcc
Older GCC(~4) complains about uninitialized 'dip'
var though all the lanes of the vec register are set.
Hence this patch explicitly initializes vec register
to fix the issue.

In file included from ip4_lookup.c:34:0:
ip4_lookup_neon.h: n function ‘ip4_lookup_node_process’: \
ip4_lookup_neon.h:25:12: error: ‘dip’ may be used uninitialized in \
	this function [-Werror=maybe-uninitialized]
  int32x4_t dip;
            ^

Fixes: 16df6a2c66 ("node: add IPv4 lookup for arm64")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2020-05-13 15:38:50 +02:00
Dekel Peled
6b30428820 doc: refine ethernet and VLAN flow rule items
Specified pattern may be translated in different manner.
For example the pattern "eth / ipv4" can be translated to match
untagged packets only, since the pattern doesn't specify a VLAN item.
It can also be translated to match both tagged and untagged packets,
for the same reason.
This patch updates the rte_flow documentation to clearly specify the
required pattern to use.
For example:
To match tagged ipv4 packets, the pattern "eth / vlan / ipv4 / end"
should be used.
To match untagged ipv4 packets, the pattern "eth / ipv4 / end"
should be used.
To match all IPV4 packets, both tagged and untagged, need to apply
two rules with the patterns above.
To match both tagged and untagged packets of any type, the pattern
"eth / end" should be used.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ori Kam <orika@mellanox.com>
2020-05-11 22:27:39 +02:00
Asaf Penso
f6eb393849 ethdev: add 200G link speed
There is no way to report back a link speed of 200Gbps.

Adding 200G link speed.

Signed-off-by: Asaf Penso <asafp@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-05-11 22:27:39 +02:00
Arek Kusztal
a0f0de06d4 cryptodev: fix ABI compatibility for ChaCha20-Poly1305
This patch adds versioned function rte_cryptodev_info_get()
to prevent some issues with ABI policy.
Node v21 works in same way as before, returning driver capabilities
directly to the API caller. These capabilities may include new elements
not part of the v20 ABI.
Node v20 function maintains compatibility with v20 ABI releases
by stripping out elements not supported in v20 ABI. Because
rte_cryptodev_info_get is called by other API functions,
rte_cryptodev_sym_capability_get function is versioned the same way.

Fixes: b922dbd38c ("cryptodev: add ChaCha20-Poly1305 AEAD algorithm")

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-05-11 13:17:43 +02:00
Arek Kusztal
b922dbd38c cryptodev: add ChaCha20-Poly1305 AEAD algorithm
This patch adds Chacha20-Poly1305 AEAD algorithm to Cryptodev.

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-05-11 13:17:43 +02:00
Vladimir Medvedkin
e62893f5ec ipsec: check SAD lookup error
Explicitly check return value in add_specific()
CID 357760 (#2 of 2): Negative array index write (NEGATIVE_RETURNS)
8. negative_returns: Using variable ret as an index to array sad->cnt_arr

Coverity issue: 357760
Fixes: b2ee269267 ("ipsec: add SAD add/delete/lookup implementation")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-05-11 13:17:43 +02:00
Akhil Goyal
e11bdd3774 cryptodev: add feature flag for non-byte aligned data
Some wireless algos like SNOW, ZUC may support input
data in bits which are not byte aligned. However, not
all PMDs can support this requirement. Hence added a
new feature flag RTE_CRYPTODEV_FF_NON_BYTE_ALIGNED_DATA
to identify which all PMDs can support non-byte aligned
data.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
2020-05-11 13:17:43 +02:00
Phil Yang
1a805dee01 ipsec: optimize SA outbound sequence update
For SA outbound packets, rte_atomic64_add_return is used to generate
SQN atomically. Use C11 atomics with RELAXED ordering for outbound SQN
update instead of rte_atomic ops which enforce unnecessary barriers on
aarch64.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-05-11 13:17:43 +02:00
Nicolas Chautru
cc29fea1ca bbdev: fix doxygen comments
Several doxygen markup were incorrect in header files.

Fixes: 4935e1e9f7 ("bbdev: introduce wireless base band device lib")
Cc: stable@dpdk.org

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-05-11 13:17:43 +02:00
Ferruh Yigit
867b49d17a ring: fix build for gcc O1 optimization
Can be reproduced with "make EXTRA_CFLAGS='-O1'" command using
gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

Two build errors:
1)
In file included from .../build/include/rte_ring_elem.h:1093,
                 from .../lib/librte_rcu/rte_rcu_qsbr.c:21:
../lib/librte_rcu/rte_rcu_qsbr.c: In function ‘rte_rcu_qsbr_dq_reclaim’:
.../build/include/rte_ring_peek.h:282:22:
    error: ‘avail’ may be used uninitialized in this function
           [-Werror=maybe-uninitialized]
  282 |   *available = avail - n;
      |                ~~~~~~^~~
./build/include/rte_ring_peek.h:259:11: note: ‘avail’ was declared here
  259 |  uint32_t avail, head, next;
      |           ^~~~~

2)
In file included from .../build/include/rte_ring_elem.h:1093,
                 from .../build/include/rte_ring.h:405,
                 from .../app/test/test_ring_stress.h:13,
                 from .../app/test/test_ring_stress_impl.h:5,
                 from .../app/test/test_ring_peek_stress.c:5:
.../app/test/test_ring_peek_stress.c: In function ‘_st_ring_enqueue_bulk’:
.../build/include/rte_ring_peek.h:80:22:
    error: ‘free’ may be used uninitialized in this function
           [-Werror=maybe-uninitialized]
   80 |   *free_space = free - n;
      |                 ~~~~~^~~
.../build/include/rte_ring_peek.h:60:11: note: ‘free’ was declared here
   60 |  uint32_t free, head, next;
      |           ^~~~

The cases shouldn't be hit, and it looks like there is already logic
error if it has been hit, but assigning 'avail' & 'free' to '0' to fix
the build error.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-05-11 19:20:54 +02:00
David Marchand
dd6275a424 telemetry: fix error log output
Caught while running testpmd:
No telemetry legacy support- No legacy callbacks, legacy socket not createdInteractive-mode selected

Add missing \n.

Fixes: 6dd571fd07 ("telemetry: introduce new functionality")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-05-11 18:58:14 +02:00
David Marchand
a1d8f30925 telemetry: fix build for armv7
telemetry can not depend on EAL anymore but it still wants to get arch
headers.
We directly point at the right source directories by using the same logic
than EAL. However the special case of armv7 has been missed.

Fix this by defaulting ARCH_DIR to RTE_ARCH.

Caught on OBS:
[  162s]   SYMLINK-FILE include/rte_telemetry.h
[  162s]   CC telemetry.o
[  162s]   CC telemetry_data.o
[  162s]   CC telemetry_legacy.o
[  162s] .../lib/librte_telemetry/telemetry.c:15:10: fatal error:
 rte_spinlock.h: No such file or directory
[  162s]  #include <rte_spinlock.h>
[  162s]           ^~~~~~~~~~~~~~~~
[  162s] compilation terminated.

Fixes: 6dd571fd07 ("telemetry: introduce new functionality")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-05-11 17:44:13 +02:00
Bing Zhao
b341a09c1d mem: fix overflow on allocation
The size checking is done in the caller. The size parameter is an
unsigned (64b wide) right now, so the comparison with zero should be
enough in most cases. But it won't help in the following case.
If the allocating request input a huge number by mistake, e.g., some
overflow after the calculation (especially subtraction), the checking
in the caller will succeed since it is not zero. Indeed, there is not
enough space in the system to support such huge memory allocation.
Usually it will return failure in the following code. But if the
input size is just a little smaller than the UINT64_MAX, like -2 in
signed type.
The roundup will cause an overflow and then "reset" the size to 0,
and then only a header (128B now) with zero length will be returned.
The following will be the previous allocation header.
It should be OK in most cases if the application won't access the
memory body. Or else, some critical issue will be caused and not easy
to debug. So this issue should be prevented at the beginning, like
other big size failure, NULL pointer should be returned also.

Fixes: fdf20fa7be ("add prefix to cache line macros")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-05-11 17:44:13 +02:00
Phil Yang
205032bbfc service: relax barriers with C11 atomics
The runstate, comp_runstate and app_runstate are used as guard variables
in the service core lib. To guarantee the inter-threads visibility of
these guard variables, it uses rte_smp_r/wmb. This patch use c11 atomic
built-ins to relax these barriers.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-05-11 13:21:54 +02:00
Phil Yang
41e8227e20 service: optimize with C11 atomics
The num_mapped_cores is used as a statistics. Use c11 atomics with
RELAXED ordering for num_mapped_cores instead of rte_atomic ops which
enforce unnessary barriers on aarch64.

Replace execute_lock operations to spinlock_try_lock to avoid duplicate
code.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-05-11 13:21:54 +02:00
Phil Yang
6c8d14ffbb service: remove redundant code
The service id validation is duplicated, remove the redundant code
in the calling functions.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-05-11 13:21:54 +02:00
Phil Yang
7a0ad72f6e service: remove rte prefix from static functions
clean up rte prefix from static functions.
remove unused parameter for service_dump_one function.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-05-11 13:21:54 +02:00
Honnappa Nagarahalli
5c76111f06 service: fix identification of service running on other lcore
The logic to identify if the MT unsafe service is running on another
core can return -EBUSY spuriously. In such cases, running the service
becomes costlier than using atomic operations. Assume that the
application passes the right parameters and reduce the number of
instructions for all cases.

Cc: stable@dpdk.org
Fixes: 8d39d3e237 ("service: fix race in service on app lcore function")

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-05-11 13:17:05 +02:00
Honnappa Nagarahalli
18cae99cb9 service: fix race condition for MT unsafe service
A MT unsafe service might get configured to run on another core
while the service is running currently. This might result in the
MT unsafe service running on multiple cores simultaneously. Use
'execute_lock' always when the service is MT unsafe.

If the service is known to be mapped on a single lcore,
setting the service capability to MT safe will avoid taking
the lock and improve the performance.

Fixes: e9139a32f6 ("service: add function to run on app lcore")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-05-11 09:33:45 +02:00
Bruce Richardson
293c53d8b2 eal: add telemetry callbacks
EAL now registers commands to provide some basic info from EAL.

Example:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 20.05.0-rc0", "pid": 72662, "max_output_len": 16384}
--> /
{"/": ["/", "/eal/app_params", "/eal/params", "/ethdev/link_status", \
    "/ethdev/list", "/ethdev/xstats", "/help", "/info", "/rawdev/list", \
    "/rawdev/xstats"]}
--> /eal/app_params
{"/eal/app_params": ["-i"]}
--> /eal/params
{"/eal/params": ["./app/dpdk-testpmd"]}

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-11 00:37:16 +02:00
Ciara Power
e122b0bff9 eal: remove option registration infrastructure
As Telemetry no longer uses rte_option, and was the only user of this
infrastructure, it can now be removed.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-11 00:37:16 +02:00
Ciara Power
eeb486f3ba eal: add telemetry as dependency
This patch moves telemetry further down the build, and adds it as a
dependency for EAL. Telemetry V2 is now configured to build by default,
and the legacy support is built when the telemetry config flag is set.

Telemetry now has EAL flags, shown below:
"--telemetry" = Enables telemetry (this is default if no flags given)
"--no-telemetry" = Disables telemetry

When telemetry is enabled, it will attempt to open the new socket
version, and also the legacy support socket (this will depend on Jansson
external dependency and telemetry config flag, as before).

Signed-off-by: Ciara Power <ciara.power@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-11 00:37:16 +02:00
Ciara Power
63e7cb1bf1 telemetry: remove redundant code
This patch removes the existing telemetry files, which are now redundant
as the new version of telemetry has backward compatibility for their
functionality.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-11 00:37:16 +02:00
Ciara Power
b80fe1805e telemetry: introduce backward compatibility
The new telemetry will now open a socket using the old telemetry path,
to ensure backward compatibility. This is not yet initialised, as it
would clash with the existing telemetry, to be removed in a later patch.
This means that both old and new telemetry socket interfaces are
handled in a common way.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-11 00:37:15 +02:00
Ciara Power
b1ad0e1245 rawdev: add telemetry callbacks
The rawdev library now registers commands with telemetry, and
implements the corresponding callback functions. These allow a list of
rawdev devices and xstats for a rawdev port to be queried.

An example usage, with ioat rawdev driver instances, is shown below:

Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 20.05.0-rc0", "pid": 65777, "max_output_len": 16384}
--> /
{"/": ["/", "/ethdev/link_status", "/ethdev/list", "/ethdev/xstats", \
    "/help", "/info", "/rawdev/list", "/rawdev/xstats"]}
--> /rawdev/list
{"/rawdev/list": [0, 1, 2, 3, 4, 5]}
--> /rawdev/xstats,0
{"/rawdev/xstats": {"failed_enqueues": 0, "successful_enqueues": 0, \
    "copies_started": 0, "copies_completed": 0}}

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-11 00:37:09 +02:00
Bruce Richardson
c190daedb9 ethdev: add telemetry callbacks
The ethdev library now registers commands with telemetry, and
implements the callback functions. These commands allow the list of
ethdev ports and the xstats and link status for a port to be queried.

An example using ethdev commands is shown below:

Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 20.05.0-rc0", "pid": 64379, "max_output_len": 16384}
--> /
{"/": ["/", "/ethdev/link_status", "/ethdev/list", "/ethdev/xstats", \
    "/help", "/info"]}
--> /ethdev/list
{"/ethdev/list": [0, 1, 2, 3]}
--> /ethdev/link_status,0
{"/ethdev/link_status": {"status": "UP", "speed": 10000, "duplex": \
    "full-duplex"}}
--> /ethdev/xstats,0
{"/ethdev/xstats": {"rx_good_packets": 0, "tx_good_packets": 0, \
    <snip>
    "tx_priority7_xon_to_xoff_packets": 0}}

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-11 00:37:01 +02:00
Ciara Power
f38748736e telemetry: add default callback commands
The default commands are now added to provide the list of commands
available, help text for a specified command, and also information
about DPDK and telemetry.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:56:33 +02:00
Bruce Richardson
ed1bfad7d3 telemetry: add functions for returning callback data
The functions added in this patch will help applications build
up data in reply to a telemetry request.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:54:25 +02:00
Bruce Richardson
6dd571fd07 telemetry: introduce new functionality
This patch introduces a new telemetry connection socket and handling
functionality. Like the existing telemetry implementation (which is
unaffected by this change) it uses a unix socket, but unlike the
existing one it does not have a fixed list of commands - instead
libraries or applications can register telemetry commands and callbacks
to provide a full-extensible solution for all kinds of telemetry across
DPDK.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:53:57 +02:00
Bruce Richardson
52af6ccb2b telemetry: add utility functions for creating JSON
The functions added in this patch will make it easier for telemetry
to convert data to correct JSON responses to telemetry requests.
Tests are also  added for these json utility functions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:52:41 +02:00
Bruce Richardson
07c1b6925b telemetry: invert dependency on metrics library
Rather than having the telemetry library depend on the metrics
lib we invert the dependency so that metrics instead depends
on telemetry lib, and registers the needed functions with it
at init time. This prepares the way for a cleaner telemetry
architecture to be applied in later patches.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:52:00 +02:00
Ciara Power
bb8f5fc317 metrics: reduce telemetry code
The telemetry code that was moved into the metrics library can be
shortened, while still maintaining the same functionality.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:50:32 +02:00
Ciara Power
c5b7197f66 telemetry: move some functions to metrics library
This commit moves some of the telemetry library code to a new file in
the metrics library. No modifications are made to the moved code,
except what is needed to allow it to compile and run. The additional
code in metrics is built only when the Jansson library is  present.
Telemetry functions as normal, using the functions from the
metrics_telemetry file. This move will enable code be reused by the new
version of telemetry in a later commit, to support backward
compatibility with the existing telemetry usage.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:46:18 +02:00
Bruce Richardson
44dfb297af build: add arch-specific header path to global includes
The global include path, which is used by anything built before EAL,
points to the EAL header files so they utility macros etc. can be used
anywhere in DPDK. This path included the OS-specific EAL header files,
but not the architecture-specific ones. This patch moves the selection
of target architecture to the top-level meson.build file so that the
global include can reference that.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2020-05-10 23:45:02 +02:00
Kevin Laatz
dec44d4110 eal/x86: add more CPU flags
This patch adds CPU flags which will enable the detection of ISA
features available on more recent x86 based CPUs.

The CPUID leaf information can be found in
Table 1-2. "Information Returned by CPUID Instruction" of this document:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

The following CPU flags are added in this patch:
    - AVX-512 doubleword and quadword instructions.
    - AVX-512 integer fused multiply-add instructions.
    - AVX-512 conflict detection instructions.
    - AVX-512 byte and word instructions.
    - AVX-512 vector length instructions.
    - AVX-512 vector bit manipulation instructions.
    - AVX-512 vector bit manipulation 2 instructions.
    - Galois field new instructions.
    - Vector AES instructions.
    - Vector carry-less multiply instructions.
    - AVX-512 vector neural network instructions.
    - AVX-512 for bit algorithm instructions.
    - AVX-512 vector popcount instructions.
    - Cache line demote instructions.
    - Direct store instructions.
    - Direct store 64B instructions.
    - AVX-512 two register intersection instructions.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-05-07 14:51:06 +02:00
Pallavi Kadam
5ebf83784d eal/windows: support logging
Initialize logging on Windows to send log output
to the console.

Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Reviewed-by: Tasnim Bashar <tbashar@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Narcisa Vasile <navasile@linux.microsoft.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-05-07 12:18:18 +02:00
Pallavi Kadam
98e792a35c eal/windows: add fnmatch implementation
Fnmatch implementation is required on Windows to support
log level arguments specified with a globbing pattern.
The source file is with BSD-3-Clause license.
https://github.com/lattera/freebsd/blob/master/usr.bin/csup/fnmatch.c

Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Reviewed-by: Tasnim Bashar <tbashar@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-05-07 12:18:17 +02:00
Pavan Nikhilesh
a5f30c925b eventdev: fix probe and remove for secondary process
When probing event device in secondary process skip reinitializing
the device data structure as it is already done in primary process.

When removing event device in secondary process skip closing the
event device as it should be done by primary process.

Fixes: 322d0345c2 ("eventdev: implement PMD registration functions")
Cc: stable@dpdk.org

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-05-02 12:31:57 +02:00
Joyce Kong
3fc1d87c2a virtio: use one way barrier for split vring avail index
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for avail
index in split ring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-05 15:54:26 +02:00
Joyce Kong
ea5207c158 virtio: use one way barrier for split vring used index
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for used
index in split ring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-05 15:54:26 +02:00
Marvin Liu
faa9867c4d vhost: use binary search in address conversion
If Tx zero copy enabled, gpa to hpa mapping table is updated one by
one. This will harm performance when guest memory backend using 2M
hugepages. Now utilize binary search to find the entry in mapping
table, meanwhile set the threshold to 256 entries for linear search.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-05 15:54:26 +02:00
Marvin Liu
20fd2f91cf vhost: utilize dynamic memory allocator
Replace dynamic memory allocator with dpdk memory allocator.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-05 15:54:26 +02:00
Xuan Ding
715070ea10 vhost: prevent zero-copy with incompatible client mode
In server mode, virtio-user inits under the assumption that vhost-user
supports a list of features. However, this could be problematic when
in_order feature is negotiated but not supported by vhost-user when
enables dequeue_zero_copy later.

Add handling when vhost-user enables dequeue_zero_copy as client.

Fixes: 64ab701c3d ("vhost: add vhost-user client mode")
Cc: stable@dpdk.org

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-05 15:54:26 +02:00
Phil Yang
7ffe400019 vhost: optimize broadcast RARP sync with C11 atomic
The rarp packet broadcast flag is synchronized with rte_atomic_XX APIs
which is a full barrier, DMB, on aarch64. This patch optimized it with
c11 atomic one-way barrier.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-05 15:54:26 +02:00
Roland Qi
41f32b052c vhost: fix peer close check
In process_slave_message_reply(), there is a
possibility that receiving a peer close
message instead of a real message response.

This patch targeting to handle the peer close
scenario and report the correct error message.

Fixes: a277c71598 ("vhost: refactor code structure")
Cc: stable@dpdk.org

Signed-off-by: Roland Qi <roland.qi@ucloud.cn>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-05-05 15:54:26 +02:00
David Christensen
67889d1130 eal/ppc: fix build with gcc 9.3
Building DPDK on Ubuntu 20.04 with GCC 9.3.0 results in a "subscript is
outside array bounds" message in rte_memcpy function.  The build error
is caused by an interaction between __builtin_constant_p and
"-Werror=array-bounds" as described in this bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90387

Modify the code to disable the array-bounds check for GCC versions 9.0
to 9.3.

Cc: stable@dpdk.org

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
2020-05-06 18:12:57 +02:00
Olivier Matz
b2aa2c9723 kvargs: fix invalid token parsing on FreeBSD
The behavior of strtok_r() is not the same between GNU libc and FreeBSD
libc: in the first case, the context is set to "" when the last token is
returned, while in the second case it is set to NULL.

On FreeBSD, the current code crashes because we are dereferencing a NULL
pointer (ctx1). Fix it by first checking if it is NULL. This works with
both GNU and FreeBSD libc.

Fixes: ffcf831454 ("kvargs: fix buffer overflow when parsing list")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Zhimin Huang <zhiminx.huang@intel.com>
2020-05-06 15:22:19 +02:00
Phil Yang
b2f8a22e79 trace: fix build with gcc 10
Prevent from writing beyond the allocated memory.

GCC 10 compiling output:
eal_common_trace_utils.c: In function 'eal_trace_dir_args_save':
eal_common_trace_utils.c:290:24: error: '__builtin___sprintf_chk'   \
	may write a terminating nul past the end of the destination \
	[-Werror=format-overflow=]
  290 |  sprintf(dir_path, "%s/", optarg);
      |                        ^

Fixes: 8af866df8d ("trace: add trace directory configuration parameter")

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Lijian Zhang <lijian.zhang@arm.com>
Tested-by: Lijian Zhang <lijian.zhang@arm.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2020-05-06 15:07:18 +02:00
David Marchand
3df4282917 trace: remove string duplication
No need to duplicate an untouched string.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
2020-05-06 15:07:18 +02:00
David Marchand
970a407648 trace: remove limitation on patterns number
There is nothing performance sensitive in this list, use dynamic
allocations and remove the arbitrary limit on the number of trace
patterns a user can pass.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
2020-05-06 15:07:07 +02:00
David Marchand
d73b9f83cd trace: remove unneeded checks in internal API
The trace framework can be configured via 4 EAL options:
- --trace which calls eal_trace_args_save,
- --trace-dir which calls eal_trace_dir_args_save,
- --trace-bufsz which calls eal_trace_bufsz_args_save,
- --trace-mode which calls eal_trace_mode_args_save.

Those 4 internal callbacks are getting passed a non NULL value:
optarg won't be NULL since those options are declared with
required_argument (man getopt_long).

eal_trace_bufsz_args_save() already trusted passed value, align the other
3 internal callbacks.

Coverity issue: 357768
Fixes: 8c8066ea6a ("trace: add trace mode configuration parameter")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
2020-05-06 13:50:32 +02:00
David Marchand
b86aebcb6f trace: avoid confusion on optarg
Prefer a local name to optarg which is a global symbol from the C library.

Fixes: 8c8066ea6a ("trace: add trace mode configuration parameter")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
2020-05-06 13:50:32 +02:00
David Marchand
ebaee64097 trace: simplify trace point headers
Invert the current trace point headers logic by making
rte_trace_point_register.h include rte_trace_point.h.

There is no more need for a RTE_TRACE_POINT_REGISTER_SELECT special macro
since including rte_trace_point_register.h itself means we want to
register trace points.

The unexplained "provider" notion is removed from the documentation and
rte_trace_point_provider.h is merged into rte_trace_point.h.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-05-06 13:50:32 +02:00
David Marchand
b4f2fde1a5 cryptodev: fix trace points registration
Those trace points are defined but not registered.

Fixes: 4cf30e3f3c ("cryptodev: add tracepoints")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
2020-05-06 13:50:32 +02:00
Ori Kam
f5bf02df31 eal/ppc: fix bool type after altivec include
The AltiVec header file breaks boolean type. [1] [2]

Currently the workaround was located only in mlx5 device.
Adding the trace module caused this issue to appear again, due to
order of includes, it keeps overriding the local fix.

This patch solves this issue by resetting the bool type, immediately
after it is being changed.

[1] https://mails.dpdk.org/archives/dev/2018-August/110281.html

[2]
In file included from
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool_trace_fp.h:18:0,
                 from
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool.h:54,
                 from
dpdk/drivers/common/mlx5/mlx5_common_mr.c:7:
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point.h: In
function '__rte_trace_point_fp_is_enabled':
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point.h:226:2:
error: incompatible types when returning type 'int' but '__vector __bool
int' was expected
  return false;
  ^
In file included from
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point.h:281:0,
                 from
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool_trace_fp.h:18,
                 from
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool.h:54,
                 from
dpdk/drivers/common/mlx5/mlx5_common_mr.c:7:
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool_trace_fp.h:
In function 'rte_mempool_trace_ops_dequeue_bulk':
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point_provider.h:104:6:
error: wrong type argument to unary exclamation mark
  if (!__rte_trace_point_fp_is_enabled()) \
      ^
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point.h:49:2:
note: in expansion of macro '__rte_trace_point_emit_header_fp'
  __rte_trace_point_emit_header_##_mode(&__##_tp); \
  ^
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point.h:99:2:
note: in expansion of macro '__RTE_TRACE_POINT'
  __RTE_TRACE_POINT(fp, tp, args, __VA_ARGS__)
  ^
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool_trace_fp.h:20:1:
note: in expansion of macro 'RTE_TRACE_POINT_FP'
 RTE_TRACE_POINT_FP(
 ^
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool_trace_fp.h:
In function 'rte_mempool_trace_ops_dequeue_contig_blocks':
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point_provider.h:104:6:
error: wrong type argument to unary exclamation mark
  if (!__rte_trace_point_fp_is_enabled()) \
      ^
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point.h:49:2:
note: in expansion of macro '__rte_trace_point_emit_header_fp'
  __rte_trace_point_emit_header_##_mode(&__##_tp); \
  ^
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point.h:99:2:
note: in expansion of macro '__RTE_TRACE_POINT'
  __RTE_TRACE_POINT(fp, tp, args, __VA_ARGS__)
  ^
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool_trace_fp.h:29:1:
note: in expansion of macro 'RTE_TRACE_POINT_FP'
 RTE_TRACE_POINT_FP(
 ^
dpdk/ppc_64-power8-linux-gcc/include/rte_mempool_trace_fp.h:
In function 'rte_mempool_trace_ops_enqueue_bulk':
dpdk/ppc_64-power8-linux-gcc/include/rte_trace_point_provider.h:104:6:
error: wrong type argument to unary exclamation mark
  if (!__rte_trace_point_fp_is_enabled()) \

Fixes: 725f5dd0bf ("net/mlx5: fix build on PPC64")

Signed-off-by: Ori Kam <orika@mellanox.com>
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-06 11:45:13 +02:00
Kevin Traynor
b5b3ea803e eal/x86: ignore gcc 10 stringop-overflow warnings
stringop-overflow warns when it sees a possible overflow
in a string operation.

In the rte_memcpy functions different branches are taken
depending on the size. stringop-overflow is raised for the
branches in the function where it sees the static size of the
src could be overflowed.

However, in reality a correct size argument and in some cases
dynamic allocation would ensure that this does not happen.

For example, in the case below for key, the correct path will be
chosen in rte_memcpy_generic at runtime based on the size argument
but as some paths in the function could lead to a cast to 32 bytes
a warning is raised.

In function ‘_mm256_storeu_si256’,
inlined from ‘rte_memcpy_generic’
at ../lib/librte_eal/common/include/arch/x86/rte_memcpy.h:315:2,
inlined from ‘iavf_configure_rss_key’
at ../lib/librte_eal/common/include/arch/x86/rte_memcpy.h:869:10:

/usr/lib/gcc/x86_64-redhat-linux/10/include/avxintrin.h:928:8:
warning: writing 32 bytes into a region of size 1 [-Wstringop-overflow=]
  928 |   *__P = __A;
      |   ~~~~~^~~~~
In file included
from ../drivers/net/iavf/../../common/iavf/iavf_prototype.h:10,
from ../drivers/net/iavf/iavf.h:9,
from ../drivers/net/iavf/iavf_vchnl.c:22:

../drivers/net/iavf/iavf_vchnl.c:
In function ‘iavf_configure_rss_key’:

../drivers/net/iavf/../../common/iavf/virtchnl.h:508:5:
note: at offset 0 to object ‘key’ with size 1 declared here
  508 |  u8 key[1];         /* RSS hash key, packed bytes */
      |     ^~~

Ignore the stringop-overflow warnings for rte_memcpy.h functions.

Bugzilla ID: 394
Bugzilla ID: 421
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-05-06 11:45:10 +02:00
Nithin Dabilpuram
da4b610415 node: add packet drop
Add packet drop node process function for pkt_drop
rte_node. This node simply free's every object received as
an rte_mbuf to its rte_pktmbuf pool.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
2020-05-05 23:42:13 +02:00
Nithin Dabilpuram
f00708c2aa node: add IPv4 rewrite and lookup control
Add ip4_rewrite and ip4_lookup ctrl API. ip4_lookup ctrl
API is used to add route entries for LPM lookup with
result data containing next hop id and next proto.
ip4_rewrite ctrl API is used to add rewrite data for
every next hop.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
2020-05-05 23:41:30 +02:00
Kiran Kumar K
0d352661e0 node: add IPv4 rewrite
Add ip4 rewrite process function for ip4_rewrite
rte_node. On every packet received by this node,
header is overwritten with new data before forwarding
it to next node. Header data to overwrite with is
identified by next hop id passed in mbuf priv data
by previous node.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:41:11 +02:00
Pavan Nikhilesh
11afaa84d0 node: add IPv4 lookup for x86
Add IPv4 lookup process function for ip4_lookup
rte_node. This node performs LPM lookup using x86_64
vector supported RTE_LPM API on every packet received
and forwards it to a next node that is identified by
lookup result.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
2020-05-05 23:41:05 +02:00
Nithin Dabilpuram
16df6a2c66 node: add IPv4 lookup for arm64
Add arm64 specific IPv4 lookup process function
for ip4_lookup node. This node performs LPM lookup
on every packet received and forwards it to a next
node that is identified by lookup result.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2020-05-05 23:40:36 +02:00
Pavan Nikhilesh
0555f11c2e node: add generic IPv4 lookup
Add IPv4 lookup process function for ip4_lookup node.
This node performs LPM lookup using simple RTE_LPM API on every packet
received and forwards it to a next node that is identified by lookup
result.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:40:22 +02:00
Nithin Dabilpuram
947d7f682f node: add ethdev control
Add ctrl api to setup ethdev_rx and ethdev_tx node.
This ctrl api clones 'N' number of ethdev_rx and ethdev_tx
nodes with specific (port, queue) pairs updated in their context.
All the ethdev ports and queues are setup before this api
is called.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
2020-05-05 23:38:35 +02:00
Nithin Dabilpuram
77edede86f node: add ethdev Tx
Add rte_node ethdev_tx process function and register it to
graph infra. This node has a specific (port, tx-queue) as context
and it enqueue's all the packets received to that specific queue pair.
When rte_eth_tx_burst() i.e enqueue to queue pair fails, packets
are forwarded to pkt_drop node to be free'd.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
2020-05-05 23:38:31 +02:00
Nithin Dabilpuram
f0f804236f node: add ethdev Rx
Add source rte_node ethdev_rx process function and register
it. This node is a source node that will be called periodically
and when called, performs rte_eth_rx_burst() on a specific
(port, queue) pair and enqueue them as stream of objects to
next node.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
2020-05-05 23:38:10 +02:00
Nithin Dabilpuram
13fcf8aff7 node: add logging and null node
Add log infra for node specific logging.
Also, add null rte_node that just ignores all the objects
directed to it.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
2020-05-05 23:37:43 +02:00
Jerin Jacob
40d4f51403 graph: implement fastpath routines
Adding implementation for rte_graph_walk() API. This will perform a walk
on the circular buffer and call the process function of each node
and collect the stats if stats collection is enabled.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:32:02 +02:00
Jerin Jacob
af1ae8b6a3 graph: implement stats
Adding implementation for graph stats collection API. This API will
create a cluster for a specified node pattern and aggregate the node
runtime stats.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:31:53 +02:00
Jerin Jacob
85c0b5285f graph: implement debug routines
Adding implementation for graph specific API to dump the
Graph information to a file. This API will dump detailed internal
info about node objects and graph objects.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:31:47 +02:00
Jerin Jacob
618d05635e graph: implement Graphviz export
Adding API implementation support exporting the graph object to file.
This will export the graph to a file in Graphviz format.
It can be viewed in many viewers such as
https://dreampuf.github.io/GraphvizOnline/

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:31:41 +02:00
Jerin Jacob
c671b62cc5 graph: implement graph operations
Adding support for graph specific API implementation like
Graph lookup to get graph object, retrieving graph ID
From name and graph name from ID.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:30:19 +02:00
Jerin Jacob
a91fecc19c graph: implement create and destroy
Adding graph specific API implementations like graph create
and graph destroy. This detect loops in the graph,
check for isolated nodes and operation to verify the validity of
graph.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:29:43 +02:00
Jerin Jacob
d6bba625cd graph: populate fastpath memory for graph reel
Adding support to create and populate the memory for graph reel.
This includes reserving the memory in the memzone, populating the nodes,
Allocating memory for node-specific streams to hold objects.

Once it is populated the reel memory contains the following sections.

+---------------------+
|   Graph Header      |
+---------------------+
|   Fence             |
+---------------------+
|   Circular buffer   |
+---------------------+
|   Fence             |
+---------------------+
|   Node Object 0     |
+------------------- -+
|   Node Object 1     |
+------------------- -+
|   Node Object 2     |
+------------------- -+
|   Node Object n     |
+------------------- -+

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:28:24 +02:00
Jerin Jacob
32bd4751f9 graph: implement internal operation helpers
Adding internal graph API helpers support to check whether a graph has
isolated nodes and any node have a loop to itself and BFS
algorithm implementation etc.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:28:13 +02:00
Jerin Jacob
9027182689 graph: implement node debug routines
Adding node debug API implementation support to dump
single or all the node objects to the given file.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:28:09 +02:00
Jerin Jacob
c59dac2ca1 graph: implement node operations
Adding node-specific API implementation like cloning node, updating
edges for the node, shrinking edges of a node, retrieving edges of a
node.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-05-05 23:28:07 +02:00