7456 Commits

David Marchand
41f2f05574 ethdev: warn once when using port not ready
Warning continuously is a pain when developing or if a unit test
is or gets broken.

It could also be a problem if the application behaves badly only in some
corner cases and a DoS results from those logs being continuously displayed.

Let's warn once per port and per rx/tx.

Getting such a log is scary; let's make it more eye-catching by
dumping a backtrace with it.
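
For illustration, a minimal sketch of the warn-once idea (all names are
hypothetical, this is not the actual ethdev code):

#include <stdbool.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_log.h>

/* Remember, per port, whether the Rx warning was already emitted. */
static bool rx_warned[RTE_MAX_ETHPORTS];

static void
warn_rx_not_ready(uint16_t port_id)
{
    if (rx_warned[port_id])
        return;
    rx_warned[port_id] = true;
    RTE_LOG(ERR, EAL, "lcore %u called rx_pkt_burst for not ready port %u\n",
        rte_lcore_id(), port_id);
    /* Make it eye-catching: dump a backtrace with the first warning. */
    rte_dump_stack();
}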

Tested by introducing a bug in testpmd:
 static int
 eth_dev_start_mp(uint16_t port_id)
 {
-       if (is_proc_primary())
+       if (!is_proc_primary())
                return rte_eth_dev_start(port_id);

        return 0;

Then, running a basic null test:
$ ./devtools/test-null.sh
...
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support
  enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

lcore 0 called rx_pkt_burst for not ready port 0
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
lcore 0 called rx_pkt_burst for not ready port 1
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0

Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-10-27 19:28:45 +02:00
Olivier Matz
9bffc92850 mem: fix dynamic hugepage mapping in container
Since its introduction in 2018, the SIGBUS handler was never registered,
and all related functions were unused.

A SIGBUS can be received by the application when accessing hugepages,
even if mmap() was successful. This happens especially when running
inside containers when there are not enough hugepages. In this case, we
need to recover. A similar scheme can be found in eal_memory.c.
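
For reference, a condensed sketch of such a SIGBUS recovery scheme
(illustrative only, following the approach mentioned for eal_memory.c):

#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf huge_jmpenv;

static void
huge_sigbus_handler(int signo)
{
    (void)signo;
    siglongjmp(huge_jmpenv, 1);
}

/* Touch a freshly mapped hugepage; report failure instead of crashing. */
static bool
touch_hugepage_safely(void *addr)
{
    struct sigaction action, old_action;
    bool ok;

    memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_handler = huge_sigbus_handler;
    sigaction(SIGBUS, &action, &old_action);

    if (sigsetjmp(huge_jmpenv, 1) == 0) {
        *(volatile char *)addr = *(volatile char *)addr;
        ok = true;
    } else {
        /* SIGBUS received: the hugepage is not actually available. */
        ok = false;
    }
    sigaction(SIGBUS, &old_action, NULL);
    return ok;
}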

Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-05 15:28:55 +01:00
Ilyes Ben Hamouda
770d41bf33 malloc: fix allocation with unknown socket ID
When using rte_malloc() from a thread which is not bound to a NUMA
socket (the typical case is a control thread, but it can also happen
on a dataplane thread if its CPU affinity spans cores attached to
several sockets), the heap used is the one of NUMA socket 0, which
may not have available memory.

Fix this by selecting the first socket which has available memory.

Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.
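
Roughly the shape of that selection, using only public APIs (a sketch, not
the exact malloc_get_numa_socket() code):

#include <rte_lcore.h>
#include <rte_malloc.h>
#include <rte_memory.h>

static int
pick_heap_socket(void)
{
    int socket_id = (int)rte_socket_id();
    unsigned int idx;

    if (socket_id != SOCKET_ID_ANY)
        return socket_id;

    /* Unknown socket: take the first one which has memory in its heap. */
    for (idx = 0; idx < rte_socket_count(); idx++) {
        struct rte_malloc_socket_stats stats;
        int s = rte_socket_id_by_idx(idx);

        if (rte_malloc_get_socket_stats(s, &stats) == 0 &&
                stats.heap_totalsz_bytes > 0)
            return s;
    }
    return 0;
}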

Fixes: b94580d6887e ("malloc: avoid unknown socket id")
Cc: stable@dpdk.org

Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-05 15:28:49 +01:00
David Hunt
bb0bd346d5 eal: suggest using --lcores option
If the user requests to use an lcore above 128 using -l,
the EAL will exit with "EAL: invalid core list syntax" and
very little other useful information.

This patch adds some extra information suggesting the use of --lcores
so that physical cores above RTE_MAX_LCORE (default 128) can be
used. This is achieved with the --lcores option, which maps the
logical cores in the application to physical cores.

For example, if "-l 12-16,130,132" is used, we see the following
additional output on the command line:

EAL: lcore 132 >= RTE_MAX_LCORE (128)
EAL: lcore 133 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133

The same is added to -c option parsing.

For example, if "-c 0x300000000000000000000000000000000" is
used, we see the following additional output on the command line:

EAL: lcore 128 >= RTE_MAX_LCORE (128)
EAL: lcore 129 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@128,1@129

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-05 14:39:37 +01:00
David Marchand
f5fa0e110f eal: promote non-EAL lcore API as stable
This API has been around for more than a year (and is in LTS 20.11).
It did not receive negative feedback and will be used in an upcoming OVS
release.
Mark it stable.
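
As a reminder of its typical use, a minimal sketch for an application-created
(non-EAL) thread, assuming the promoted API includes rte_thread_register()
and rte_thread_unregister():

#include <rte_lcore.h>
#include <rte_log.h>

static void *
app_thread_body(void *arg)
{
    (void)arg;

    /* Get a free lcore id so per-lcore DPDK facilities work here. */
    if (rte_thread_register() != 0)
        return NULL;

    RTE_LOG(INFO, USER1, "running as lcore %u\n", rte_lcore_id());

    /* ... use APIs relying on rte_lcore_id() ... */

    rte_thread_unregister();
    return NULL;
}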

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-04 22:57:58 +01:00
Konstantin Ananyev
65d9b7c664 bpf: fix convert API when libpcap missing
The rte_bpf_convert() implementation depends on libpcap.
Right now it is defined only when this library is installed and
RTE_PORT_PCAP is defined.
Fix that by providing, for such a case, a stub rte_bpf_convert()
implementation that always returns an error.
To draw the user's attention, a warning is emitted at the meson
configure stage if the proper implementation is disabled.
Also move the stub for another function (rte_bpf_elf_load) into
the same place (bpf_stub.c).
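
The stub boils down to something like this (a sketch; the exact log and
error handling may differ):

#include <errno.h>
#include <rte_bpf.h>
#include <rte_errno.h>

struct bpf_program;     /* normally provided by the pcap headers */

/* Built only when libpcap support is not available. */
struct rte_bpf_prm *
rte_bpf_convert(const struct bpf_program *prog)
{
    (void)prog;
    rte_errno = ENOTSUP;
    return NULL;
}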

Fixes: 2eccf6afbea9 ("bpf: add function to convert classic BPF to DPDK BPF")

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 19:56:20 +01:00
Konstantin Ananyev
7b0a120157 bpf: fix doxygen comment
Fix typo in doxygen comments for rte_bpf_convert().

Fixes: 2eccf6afbea9 ("bpf: add function to convert classic BPF to DPDK BPF")

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 19:56:14 +01:00
David Marchand
54abd300d5 pipeline: remove unreachable branch
A previous change blamed it on the compiler/ASan, while this is a real
(yet minor) issue.

The return -EINVAL is never reached since we test all combinations of
the fidx and fcin booleans.
All branches end up with a return 0, so factorize them.

Fixes: 84f5ac9418ea ("pipeline: fix build with ASan")
Fixes: f38913b7fb8e ("pipeline: add meter array to SWX")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-11-04 18:11:08 +01:00
Yogesh Jangra
2ce3ccbe44 pipeline: fix dead code
Fix minor dead code issue reported by Coverity.

Coverity issue: 373653
Fixes: e9d870 ("pipeline: add SWX pipeline tables")

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-11-04 16:43:27 +01:00
Wojciech Liguzinski
44c730b0e3 sched: add PIE based congestion management
Implement PIE-based congestion management based on RFC 8033.

The Proportional Integral Controller Enhanced (PIE) algorithm works
by proactively dropping packets randomly.
PIE is implemented because more advanced queue management is required
to address the bufferbloat problem and provide the desired quality of
service to users.

Tests for the PIE code are added to the test application, and
PIE-related information is added to the documentation.
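
As a conceptual aid only (the RFC 8033 idea in miniature, not the DPDK PIE
API; the alpha/beta/target values below are the RFC defaults):

#include <stdbool.h>
#include <stdint.h>
#include <rte_random.h>

#define PIE_ALPHA     0.125   /* weight of (delay - target), RFC 8033 */
#define PIE_BETA      1.25    /* weight of (delay - previous delay) */
#define PIE_TARGET_MS 15.0    /* reference queuing delay */

static double pie_drop_prob;
static double pie_old_delay_ms;

/* Run periodically with the measured queuing delay. */
static void
pie_update_prob(double cur_delay_ms)
{
    pie_drop_prob += PIE_ALPHA * (cur_delay_ms - PIE_TARGET_MS) +
            PIE_BETA * (cur_delay_ms - pie_old_delay_ms);
    if (pie_drop_prob < 0.0)
        pie_drop_prob = 0.0;
    else if (pie_drop_prob > 1.0)
        pie_drop_prob = 1.0;
    pie_old_delay_ms = cur_delay_ms;
}

/* Run on enqueue: proactively drop at random with the current probability. */
static bool
pie_drop(void)
{
    return (double)rte_rand() / (double)UINT64_MAX < pie_drop_prob;
}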

Signed-off-by: Wojciech Liguzinski <wojciechx.liguzinski@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2021-11-04 15:41:49 +01:00
David Marchand
5633173341 eal/linux: fix device hotplug
The device event interrupt handler was always freed.

Bugzilla ID: 845
Fixes: c2bd9367e18f ("lib: remove direct access to interrupt handle")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yan Xia <yanx.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-11-04 15:13:41 +01:00
David Marchand
4847122aab eal/linux: fix uevent message parsing
Caught with ASan:
==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address
  0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578
READ of size 1 at 0x7f0daa2fc0d0 thread T1
    #0 0x7f0daeefacb1  (/lib64/libasan.so.5+0xbacb1)
    #1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167
    #2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248
    #3 0x1169b91 in eal_intr_process_interrupts
  ../lib/eal/linux/eal_interrupts.c:1026
    #4 0x116a3a2 in eal_intr_handle_interrupts
  ../lib/eal/linux/eal_interrupts.c:1100
    #5 0x116a7f0 in eal_intr_thread_main
  ../lib/eal/linux/eal_interrupts.c:1172
    #6 0x112640a in ctrl_thread_init
  ../lib/eal/common/eal_common_thread.c:202
    #7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159)
    #8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72)

Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192
  in frame
    #0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226

  This frame has 2 object(s):
    [32, 48) 'uevent'
    [96, 4192) 'buf' <== Memory access at offset 4192 overflows this
  variable
HINT: this may be a false positive if your program uses some custom
  stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
Thread T1 created by T0 here:
    #0 0x7f0daee92ea3 in __interceptor_pthread_create
  (/lib64/libasan.so.5+0x52ea3)
    #1 0x1126542 in rte_ctrl_thread_create
  ../lib/eal/common/eal_common_thread.c:228
    #2 0x116a8b5 in rte_eal_intr_init
  ../lib/eal/linux/eal_interrupts.c:1200
    #3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044
    #4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105
    #5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802)

Bugzilla ID: 792
Fixes: 0d0f478d0483 ("eal/linux: add uevent parse and process")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yan Xia <yanx.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-11-04 15:13:41 +01:00
Jim Harris
628bac7df1 eal/linux: remove unused variable for socket memory
clang-13 rightfully complains that the total_mem variable in
eal_parse_socket_arg is set but not used, since the final
accumulated total_mem result isn't used anywhere.
So just remove the total_mem variable.

Fixes: 0a703f0f36c1 ("eal/linux: fix parsing zero socket memory and limits")

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-04 13:27:18 +01:00
Vladimir Medvedkin
11c5b9b51a fib: add RIB extension size parameter
This patch adds a new parameter to the FIB configuration to specify
the size of the extension for internal RIB structure.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
2021-11-04 12:38:03 +01:00
Xueming Li
fc382022c6 eal: fix device iterator when no bus is selected
The devargs used in device iterator initialization weren't set to zero;
random data such as the bus string led to invalid address accesses.

This patch initializes the devargs.

Bugzilla ID: 862
Fixes: c99a2d4c6b7f ("eal: implement device iteration initialization")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2021-11-04 11:44:49 +01:00
Vladimir Medvedkin
adeca6685f hash: fix use after free in Toeplitz hash
This patch fixes a use-after-free in the thash library, reported by ASan.

Bugzilla ID: 868
Fixes: 28ebff11c2dc ("hash: add predictable RSS")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-04 11:43:20 +01:00
Vladimir Medvedkin
d27e2b7e9c hash: enable GFNI Toeplitz hash implementation
This patch enables the new GFNI Toeplitz hash in the
predictable RSS library.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 11:19:10 +01:00
Vladimir Medvedkin
31d7c06947 hash: add bulk Toeplitz hash implementation
This patch adds a bulk version of the Toeplitz hash implemented
with Galois Field New Instructions (GFNI).

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 11:19:10 +01:00
Vladimir Medvedkin
4fd8c4cb0d hash: add new Toeplitz hash implementation
This patch adds a new Toeplitz hash implementation using
Galois Field New Instructions (GFNI).

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 11:19:10 +01:00
Dmitry Kozlyuk
9790fc2149 eal/freebsd: fix IOVA mode selection
FreeBSD EAL selected the PA IOVA mode even in --no-huge mode,
where PAs are not available. Memory zones were created with IOVA
equal to RTE_BAD_IOVA, with no indication that this field is not usable.

Change IOVA mode detection:
1. Always allow forcing --iova-mode=va.
2. In --no-huge mode, disallow forcing --iova-mode=pa, and select VA.
3. Otherwise select IOVA mode according to bus requests, default to PA.
In case contigmem is inaccessible, memory initialization will fail
with a message indicating the cause.
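
In code form, the detection order above looks roughly like this (a sketch,
not the actual FreeBSD eal.c logic):

#include <stdbool.h>
#include <rte_bus.h>
#include <rte_eal.h>

static enum rte_iova_mode
select_iova_mode(enum rte_iova_mode forced, bool no_huge)
{
    if (forced == RTE_IOVA_VA)
        return RTE_IOVA_VA;
    if (no_huge)
        return RTE_IOVA_VA;    /* forcing PA here is rejected earlier */
    if (forced == RTE_IOVA_PA)
        return RTE_IOVA_PA;
    /* Follow bus requests, defaulting to PA. */
    if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
        return RTE_IOVA_VA;
    return RTE_IOVA_PA;
}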

Fixes: c2361bab70c5 ("eal: compute IOVA mode based on PA availability")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-03 18:32:19 +01:00
Feifei Wang
6b70c6b31f distributor: use wait until scheme
Instead of polling for bufptr64 to be updated, use the
wait-until scheme for this case.
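
The general shape of these conversions, shown here with the pre-existing
rte_wait_until_equal_64() helper (the series itself adds a more generic
macro):

#include <stdint.h>
#include <rte_pause.h>

static inline void
wait_for_handshake(volatile uint64_t *addr, uint64_t expected)
{
    /* Before: an explicit busy-wait loop.
     *
     *     while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
     *             rte_pause();
     */

    /* After: one call; arm64 can use WFE instead of spinning. */
    rte_wait_until_equal_64(addr, expected, __ATOMIC_ACQUIRE);
}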

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
388bee69a5 bpf: use wait until scheme for Rx/Tx iteration
Instead of polling for cbi->use to be updated, use the wait-until scheme.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
4ed4e554ac mcslock: use wait until scheme for unlock
Instead of polling for the mcslock to be updated, use the wait-until
scheme for this case.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
41902d2468 pflock: use wait until scheme for read lock
Instead of polling for the read pflock to be updated, use the wait-until
scheme for this case.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
875f350924 eal: add a new helper for wait until scheme
Add a new generic helper, a macro implementing the wait-until scheme.

Furthermore, to prevent the following compilation warning on Arm:
----------------------------------------------
'warning: implicit declaration of function ...'
----------------------------------------------
delete the 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and '__WFE',
and add an '__RTE_ARM' prefix to these macros to fix the namespace.
This is because the original macros are undefined at the end of the file;
if the new macro calls them from other files, they will be seen as
'not defined'.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Konstantin Ananyev
53caecb844 pdump: fix freeing statistics memzone
rte_pdump_init() always allocates a new memzone for pdump_stats,
but rte_pdump_uninit() never frees it.
So the following combination will always fail:
rte_pdump_init(); rte_pdump_uninit(); rte_pdump_init();
The issue was caught by the pdump_autotest UT:
while the first test run is successful, any consecutive run
of this test case will fail.
Fix the issue by calling rte_memzone_free() for the statistics memzone.

Fixes: 10f726efe26c ("pdump: support pcapng and filtering")

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-03 12:53:03 +01:00
Stephen Hemminger
b2be63b55a pdump: fix packet snapshot length initialization
If packet dump was enabled via pdump_enable_by_deviceid,
the packet snapshot length was not being set.

Bugzilla ID: 840
Fixes: 10f726efe26c ("pdump: support pcapng and filtering")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-01 00:36:29 +01:00
Stephen Hemminger
ae1702fffe pcapng: use new ethdev namespace
The RTE_ prefix was added by
commit 295968d17407 ("ethdev: add namespace").

Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-31 23:25:02 +01:00
Zhihong Peng
6cc51b1293 mem: instrument allocator for ASan
This patch adds necessary hooks in the memory allocator for ASan.

This feature is currently available in DPDK only on Linux x86_64.
If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be
defined and RTE_MALLOC_ASAN must be set accordingly in meson.

Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-10-29 16:25:03 +02:00
Zhihong Peng
84f5ac9418 pipeline: fix build with ASan
Code changes to avoid the following build error:
"Control reaches end of non-void function".

Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-10-29 15:25:34 +02:00
Anatoly Burakov
ab910a8068 vfio: fix partial unmap
Partial unmap support was introduced in commit c13ca4e81cac
("vfio: fix DMA mapping granularity for IOVA as VA"), and with it
was added a check that dereferenced the IOMMU type to determine whether
partial unmapping is supported for the currently configured IOMMU type. In
certain circumstances (such as when VFIO is supported, but no devices
were bound to the VFIO driver), the IOMMU type pointer can be NULL.

However, dereferencing of IOMMU type was guarded by access to the user
maps list - that is, we were always checking the user map list first,
and then, if we found a memory region that encloses the one we're trying
to unmap, we would have performed the IOMMU type check.

This ensured that the IOMMU type check will not cause any NULL pointer
dereferences, because in order for an IOMMU type check to have been
performed, there necessarily must have been at least one memory region
that was previously mapped successfully, and that implies having a
defined IOMMU type.

When commit 56259f7fc010 ("vfio: allow partially unmapping adjacent
memory") was introduced, the IOMMU type check was moved to
before we were traversing the user mem maps list, thereby introducing a
potential NULL dereference, because the IOMMU type access was no longer
guarded by the user mem maps list traversal.

Fix the issue by moving the IOMMU type check to after the user mem maps
traversal, thereby ensuring that by the time the check happens, the
IOMMU type is always valid.

Fixes: 56259f7fc010 ("vfio: allow partially unmapping adjacent memory")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Xuan Ding <xuan.ding@intel.com>
2021-10-28 09:51:55 +02:00
Honnappa Nagarahalli
705356f081 eal: simplify control thread creation
Remove the usage of the pthread barrier and replace it with
synchronization using an atomic variable.
This also removes the use of the reference count required to synchronize
freeing the memory.
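
Conceptually, the barrier is replaced by a handshake of this shape (names
illustrative, not the exact eal_common_thread.c code):

#include <stdint.h>
#include <rte_pause.h>

enum ctrl_status {
    CTRL_THREAD_LAUNCHING,    /* setup still in progress */
    CTRL_THREAD_RUNNING,      /* setup succeeded */
    CTRL_THREAD_ERROR,        /* setup failed */
};

struct ctrl_params {
    uint32_t status;    /* the atomic replacing barrier + refcount */
    /* ... start routine, argument, return value ... */
};

/* New control thread, once its setup (affinity, name) is done: */
static void
ctrl_thread_ack(struct ctrl_params *p, int setup_ret)
{
    __atomic_store_n(&p->status,
            setup_ret == 0 ? CTRL_THREAD_RUNNING : CTRL_THREAD_ERROR,
            __ATOMIC_RELEASE);
}

/* Creating thread, in rte_ctrl_thread_create(): */
static uint32_t
ctrl_thread_wait(const struct ctrl_params *p)
{
    uint32_t status;

    while ((status = __atomic_load_n(&p->status, __ATOMIC_ACQUIRE)) ==
            CTRL_THREAD_LAUNCHING)
        rte_pause();
    return status;
}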

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-25 21:43:10 +02:00
Harman Kalra
8cb5d08db9 interrupts: extend event list
Dynamically allocate the efds and elist arrays of the intr_handle
structure, based on a size provided by the user, e.g. the number of
MSI-X interrupts supported by a PCI device.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
99e6c7e316 interrupts: rename device specific file descriptor
VFIO/UIO are mutually exclusive, so storing the file descriptor in a
single field is enough.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
73d844fd08 interrupts: make interrupt handle structure opaque
Move the interrupt handle structure definition inside an EAL-private
header to make its fields totally opaque to the outside world.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
d61138d4f0 drivers: remove direct access to interrupt handle
Remove direct access to the interrupt handle structure fields;
instead, use the respective get/set APIs.
Make changes to all the drivers that access the interrupt handle fields.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
c2bd9367e1 lib: remove direct access to interrupt handle
Remove direct access to the interrupt handle structure fields;
instead, use the respective get/set APIs.
Make changes to all the libraries that access the interrupt handle fields.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
90b13ab8d4 alarm: remove direct access to interrupt handle
Remove direct access to the interrupt handle structure fields;
instead, use the respective get/set APIs.
Make changes to all the libraries that access the interrupt handle fields.

Implement an alarm cleanup routine where the memory allocated
for the interrupt instance can be freed.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
bbbac4cd6e interrupts: remove direct access to interrupt handle
Change the interrupt framework to use the interrupt handle
APIs to get/set any field.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
b7c9842916 interrupts: add allocator and accessors
Prototype and implement get/set APIs for the interrupt handle fields.
Users won't be able to access any of the interrupt handle fields
directly and should instead use these get/set APIs to access and
manipulate them.

The internal interrupt header, i.e. rte_eal_interrupt.h, is rearranged:
the APIs defined there are moved to rte_interrupts.h and the
epoll-specific definitions are moved to a new header, rte_epoll.h.
Later in the series rte_eal_interrupt.h will be removed.
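
A short usage sketch of the new accessors (assuming names such as
rte_intr_instance_alloc() and rte_intr_fd_set() from this series; see
rte_interrupts.h for the exact set):

#include <rte_interrupts.h>

static struct rte_intr_handle *
make_intr_handle(int fd)
{
    struct rte_intr_handle *handle;

    /* Allocation replaces embedding the struct directly in drivers. */
    handle = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_PRIVATE);
    if (handle == NULL)
        return NULL;

    if (rte_intr_fd_set(handle, fd) != 0 ||
            rte_intr_type_set(handle, RTE_INTR_HANDLE_EXT) != 0) {
        rte_intr_instance_free(handle);
        return NULL;
    }
    return handle;
}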

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Dmitry Kozlyuk
0c8fc83a71 eal/windows: fix IOVA mode detection and handling
Windows EAL did not detect the IOVA mode and worked incorrectly
if physical addresses could not be obtained
(if the virt2phys driver was missing or inaccessible).
In this case, rte_mem_virt2iova() reported RTE_BAD_IOVA for any address.
Inability to obtain the IOVA, be it PA or VA, should cause a failure
for the DPDK allocator, but it was hidden by the implementation,
so allocations did not fail when they should have.
The mode in which DPDK cannot obtain PAs but can still work is
IOVA-as-VA mode. However, rte_eal_iova_mode() always returned RTE_IOVA_DC
(while it should only ever return RTE_IOVA_PA or RTE_IOVA_VA),
because IOVA mode detection was not implemented.

Implement IOVA mode detection:
1. Always allow forcing --iova-mode=va.
2. Allow forcing --iova-mode=pa only if virt2phys is available.
3. If no mode is forced and virt2phys is available,
   select the mode according to bus requests, default to PA.
4. If no mode is forced but virt2phys is unavailable, default to VA.
Fix rte_mem_virt2iova() by returning VA when using IOVA-as-VA.
Fix rte_eal_iova_mode() by returning the selected mode.

Fixes: 2a5d547a4a9b ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org

Reported-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2021-10-25 20:59:40 +02:00
Harman Kalra
e6732d0d6e mem: add telemetry infos
Register new telemetry callbacks to list named (memzone) and
unnamed (malloc) reserved memory and return information
based on the arguments provided by the user.

Example:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384}
Connected to application: "dpdk-testpmd"
-->
--> /eal/memzone_list
{"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}
-->
-->
--> /eal/memzone_info,0
{"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data",    \
"Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \
"Hugepage_size": 536870912, "Hugepage_base": "0x120000000",   \
"Hugepage_used": 1}}
-->
-->
--> /eal/memzone_info,6
{"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0",  \
"Length": 669918336, "Address": "0x15811db80", "Socket": 0,  \
"Flags": 0, "Hugepage_size": 536870912, "Hugepage_base": "0x140000000", \
"Hugepage_used": 2}}
-->
-->
--> /eal/memzone_info,14
{"/eal/memzone_info": null}
-->
-->
--> /eal/heap_list
{"/eal/heap_list": [0]}
-->
-->
--> /eal/heap_info,0
{"/eal/heap_info": {"Head id": 0, "Name": "socket_0",     \
"Heap_size": 1610612736, "Free_size": 927645952,          \
"Alloc_size": 682966784, "Greatest_free_size": 529153152, \
"Alloc_count": 482, "Free_count": 2}}

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-25 19:39:54 +02:00
Vladimir Medvedkin
97e2ae4c58 rib: fix IPv6 depth mask
Fixes: 03b8372a9a73 ("rib: fix max depth IPv6 lookup")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-10-25 19:13:12 +02:00
Vladimir Medvedkin
b16ac53657 lpm6: fix buffer overflow
This patch fixes a buffer overflow reported by ASan; please see
https://bugs.dpdk.org/show_bug.cgi?id=819

The rte_lpm6 library keeps routing information for control plane purposes
inside an rte_hash table, which uses rte_jhash() as its hash function.
From the rte_jhash() documentation: if the input key is not aligned to
a four-byte boundary or is not a multiple of four bytes in length,
the memory region just after it may be read (but not used in the
computation).
rte_lpm6 uses 17-byte keys consisting of an IPv6 address (16 bytes) +
depth (1 byte).

This patch increases the size of the depth field to uint32_t
and sets the alignment to 4 bytes.
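
The resulting key is then a multiple of four bytes and 4-byte aligned, so
rte_jhash() never reads past it (a sketch of the layout; the actual field
names in the lpm6 code may differ):

#include <stdint.h>
#include <rte_common.h>
#include <rte_lpm6.h>

struct rule_key {
    uint8_t ip[RTE_LPM6_IPV6_ADDR_SIZE];    /* 16 bytes */
    uint32_t depth;                         /* widened from uint8_t */
} __rte_aligned(4);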

Bugzilla ID: 819
Fixes: 86b3b21952a8 ("lpm6: store rules in hash table")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-25 19:08:16 +02:00
Vladimir Medvedkin
45523f494c hash: fix Doxygen comment of Toeplitz file
Fixes: 7574c3ef7428 ("hash: add toeplitz algorithm used by RSS")
Cc: stable@dpdk.org

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2021-10-25 19:06:07 +02:00
Honnappa Nagarahalli
3596537005 eal: fix memory ordering around lcore task accesses
Ensure that the memory operations before the call to
rte_eal_remote_launch are visible to the worker thread.
Use the pointer to the function to be executed in the worker thread
as the guard variable.

Ensure that the memory operations in worker thread, that happen
before it returns the status of the assigned function, are
visible to the main thread. Use the variable containing the
lcore's state as the guard variable.
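
In outline (a conceptual sketch, not the exact lcore_config code):

#include <rte_launch.h>
#include <rte_pause.h>

struct worker_slot {
    lcore_function_t *fn;    /* guard: NULL means nothing to run */
    void *arg;
    int ret;
    int state;               /* guard for the result */
};

/* Main thread, in rte_eal_remote_launch(): */
static void
launch(struct worker_slot *w, lcore_function_t *fn, void *arg)
{
    w->arg = arg;
    /* Release: 'arg' is visible to the worker once it observes 'fn'. */
    __atomic_store_n(&w->fn, fn, __ATOMIC_RELEASE);
}

/* Worker thread main loop: */
static void
run_once(struct worker_slot *w)
{
    lcore_function_t *fn;

    /* Acquire: pairs with the release store in the launcher. */
    while ((fn = __atomic_load_n(&w->fn, __ATOMIC_ACQUIRE)) == NULL)
        rte_pause();
    w->ret = fn(w->arg);
    /* Release: 'ret' is visible to whoever observes the state change. */
    __atomic_store_n(&w->state, WAIT, __ATOMIC_RELEASE);
}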

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli
f6c6c686f1 eal: remove FINISHED lcore state
The FINISHED state seems to be used to indicate that the worker's update
of the 'state' is not visible to other threads. There seems to be no
requirement to have such a state.

Since the FINISHED state is removed, the API rte_eal_wait_lcore
is updated to always return the status of the last function that
ran in the worker core.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli
33969e9c61 eal: reset lcore task callback and argument
In the rte_eal_remote_launch function, the lcore function
pointer is checked for NULL. However, the pointer is never
reset to NULL. Reset the lcore function pointer and argument
after the worker has completed executing the lcore function.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Eli Britstein
6de430b707 eal/x86: avoid cast-align warning in memcpy functions
Functions and macros in the x86 rte_memcpy.h may cause cast-align warnings
when using the strict cast-align flag with a supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

For example:
In file included from main.c:24:
/dpdk/build/include/rte_memcpy.h: In function 'rte_mov16':
/dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases
required alignment of target type [-Wcast-align]
  306 |  xmm0 = _mm_loadu_si128((const __m128i *)src);
      |                         ^

As the code assumes correct alignment, first add a (void *) or
(const void *) cast to avoid the warnings.
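
The fix pattern, in short (a sketch of the described change, not the full
rte_memcpy.h):

#include <stdint.h>
#include <emmintrin.h>

static inline void
mov16(uint8_t *dst, const uint8_t *src)
{
    __m128i xmm0;

    /* Going through (const void *) keeps -Wcast-align=strict quiet,
     * since the code already assumes suitable alignment. */
    xmm0 = _mm_loadu_si128((const __m128i *)(const void *)src);
    _mm_storeu_si128((__m128i *)(void *)dst, xmm0);
}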

Fixes: 9484092baad3 ("eal/x86: optimize memcpy for AVX512 platforms")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
2021-10-25 17:28:12 +02:00
Eli Britstein
da0333c879 mbuf: avoid cast-align warning in data offset macro
In the rte_pktmbuf_mtod_offset macro, there is a cast from char * to type
't', which may cause a cast-align warning when using the strict cast-align
flag with a supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

main.c: In function 'l2fwd_mac_updating':
/dpdk/build/include/rte_mbuf_core.h:719:3: warning: cast increases
required alignment of target type [-Wcast-align]
  719 |  ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
      |   ^
/dpdk/build/include/rte_mbuf_core.h:733:32: note: in expansion of macro
'rte_pktmbuf_mtod_offset'
  733 | #define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
      |                                ^~~~~~~~~~~~~~~~~~~~~~~

As the code assumes correct alignment, first add a (void *) cast to
avoid the warning.
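
After the described change, the macro looks roughly like this (sketch):

/* The extra (void *) hop prevents -Wcast-align=strict from firing on the
 * final cast to 't'; alignment is guaranteed by the mbuf layout. */
#define rte_pktmbuf_mtod_offset(m, t, o) \
    ((t)(void *)((char *)(m)->buf_addr + (m)->data_off + (o)))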

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-25 17:27:48 +02:00